Yield components detection and image-based indicators for non-invasive grapevine yield prediction at different phenological phases

Forecasting vineyard yield with accuracy is one of the most important trends of research in viticulture today. Conventional methods for yield forecasting are manual, require a lot of labour and resources and are often destructive. Recently, image-analysis approaches have been explored to address this issue. Many of these approaches encompass cameras deployed on ground platforms that collect images in proximal range, on-the-go. As the platform moves, yield components and other image-based indicators are detected and counted to perform yield estimations. However, in most situations, when image acquisition is done in non-disturbed canopies, a high fraction of yield components is occluded. The present work’s goal is twofold. Firstly, to evaluate yield components’ visibility in natural conditions throughout the grapevine’s phenological stages. Secondly, to explore single bunch images taken in lab conditions to obtain the best visible bunch attributes to use as yield indicators. In three vineyard plots of red (Syrah) and white varieties (Arinto and Encruzado), several canopy 1 m segments were imaged using the robotic platform Vinbot. Images were collected from winter bud stage until harvest and yield components were counted in the images as well as in the field. At pea-sized berries, veraison and full maturation stages, a bunch sample was collected and brought to lab conditions for detailed assessments at a bunch scale. At early stages, all varieties showed good visibility of spurs and shoots, however, the number of shoots was only highly and significantly correlated with the yield for the variety Syrah. Inflorescence and bunch occlusion reached high percentages, above 50 %. In lab conditions, among the several bunch attributes studied, bunch volume and bunch projected area showed the highest correlation coefficients with yield. 
In field conditions, using non-defoliated vines, the bunch projected area of visible bunches presented high and significant correlation coefficients with yield, regardless of the fruit’s occlusion. Our results show that counting yield components with image analysis in non-defoliated vines may be insufficient for accurate yield estimation. On the other hand, using bunch projected area as a predictor can be the best option to achieve that goal, even with high levels of occlusion.


INTRODUCTION
Vineyard yield forecasting is a fundamental research subject in viticulture today. If done early in the season, it can bring many advantages, such as knowing in advance the amount of machinery and manpower needed for harvest, allocating cellar space and equipment, managing wine stock and grape prices, and better developing marketing strategies (Dunn and Martin, 2004). However, predicting grapevine yield is very challenging because, in general, it presents high inter-annual and spatial variability due to the effects of several factors such as soil and climate conditions, grapevine variety, biotic and abiotic stresses and vineyard management practices, among others (Clingeleffer, 2001; Bramley and Hamilton, 2004; Taylor et al., 2018).
Several methods have been proposed for vineyard yield estimation. For them to be effectively advantageous for the grower, their relative error should range between 5 % and 10 % of the actual yield (Carrillo et al., 2016). At the regional level, agroclimatic models (Fraga and Santos, 2017; Sirsat, Mendes-Moreira, Ferreira and Cunha, 2019) and aeropalynological methods (Besselat, 1987; Cunha et al., 2016) have been explored with relative success. The latter, based on grapevine pollen, are widely used across the world. However, such methods are limited to large areas or regions because climatic data has a low spatial resolution and pollen grains transported by the wind can come from a highly unpredictable range of places and distances, and are therefore not site-specific. At the vineyard level, Tarara et al. (2014) proposed the use of tension sensors connected to the trellis for measuring the tension caused by the increasing bunch weight. The obtained results presented strong linear relationships with final yield. However, this method needs a high density of sensors and a high-quality, well-maintained supporting wire infrastructure.
Regardless of alternative methods, yield estimation is still widely performed in conventional ways, based on the manual sampling of yield components (YC) from vine segments in the field. The YC are determined in both the previous and the current season (Tassie and Freeman, 1992) and, when multiplied together, they give the total yield of a single vine, as shown in Equation 1 (Coombe and Dry, 2001). The yield per ha is the product of the yield per vine and the planting density.
$$\text{Yield/vine} = \text{shoots/vine} \times \text{bunches/shoot} \times \text{berries/bunch} \times \text{berry weight} \qquad \text{(Eq. 1)}$$
The number of shoots per vine is largely determined by the number of buds left at winter pruning. The relationship between the number of buds and shoots with yield is highly variable and dependent on vineyard management and variety. The number of bunches per shoot is defined by the number of inflorescences initiated in the buds during the previous growing season (Vasconcelos et al., 2009). Berry number depends on the percentage of flowers that set into fruit (Keller, 2010). After fruit set, berries grow until a lag phase, just before the veraison stage (BBCH stage 81, Lorenz et al., 1995), and then again until harvest (Coombe and McCarthy, 2000). This growth is largely dependent on climatic conditions and water availability (Ojeda et al., 2001). The final mass of each bunch is determined by the number of berries and their individual mass. If the harvest is performed manually, then the rachis weight should also be considered. The number of bunches and the number of berries alone explain about 60 % and 30 % of seasonal yield variation, respectively (Clingeleffer, 2001).
Although some of the above mentioned YC are only defined at more advanced phenological stages, yield predictions can be performed at earlier stages, if historical data is used to fulfil the remaining variables of the equation. As an example, a yield prediction can be made at budburst by knowing the real number of buds and shoots and multiplying them with previous season's average number of bunches per shoot, berries per bunch and berry weight. Early forecasts can help vineyard managers to timely adapt their strategies to adjust their production in that season, for example by planning bunch thinning with reliable information (Dunn and Martin, 2004). However, a forecast close to harvest has a smaller chance of being inaccurate due to the negative effects of climatic events and other biotic and abiotic stresses, as the time window for them to occur is smaller. Furthermore, the closer the forecast is to harvest, the lesser need for historical data to be used, as grapes reach their full development.
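The multiplicative yield-component calculation and the early forecast described above can be illustrated with a minimal Python sketch. All numeric values below are hypothetical placeholders, not data from this study; the function names are illustrative, and the planting density is simply derived from a 1.0 m x 2.5 m vine spacing.

```python
def yield_per_vine_kg(shoots_per_vine, bunches_per_shoot,
                      berries_per_bunch, berry_weight_g):
    """Multiply the yield components together; result in kg per vine."""
    grams = (shoots_per_vine * bunches_per_shoot
             * berries_per_bunch * berry_weight_g)
    return grams / 1000.0


def yield_per_ha_kg(per_vine_kg, vines_per_ha):
    """Yield per hectare = yield per vine x planting density."""
    return per_vine_kg * vines_per_ha


# Early forecast at budburst: the shoot count is observed this season,
# the remaining components come from historical averages.
observed_shoots = 14                      # hypothetical field count
historical = {                            # hypothetical previous-season means
    "bunches_per_shoot": 1.5,
    "berries_per_bunch": 100,
    "berry_weight_g": 1.6,
}

per_vine_kg = yield_per_vine_kg(observed_shoots, **historical)

# Planting density for vines spaced 1.0 m within and 2.5 m between rows.
vines_per_ha = 10_000 / (1.0 * 2.5)
per_ha_kg = yield_per_ha_kg(per_vine_kg, vines_per_ha)

print(f"{per_vine_kg:.2f} kg/vine, {per_ha_kg:.0f} kg/ha")
```

As the season advances, each historical value can be replaced by its measured counterpart, which is why later forecasts depend less on historical data.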
Classical yield estimates are often done around berry lag phase (Clingeleffer, 2001), which can occur 1 to 2 months before harvest. Such methods involve collecting samples of bunches from vine segments and weighing them to extrapolate the information across the whole vineyard. Then, bunch weight at harvest is estimated using Equation 2 (Clingeleffer, 2001).
$$\text{Weight/bunch (harvest)} = \text{weight/bunch (veraison)} \times \text{weight gain} \qquad \text{(Eq. 2)}$$
This equation requires an extra variable (weight gain), which is defined as the ratio of the average bunch weight at harvest (historical data) to the average bunch weight at lag phase. The simplicity of this method makes it achievable by any producer, and it is still today one of the most used in commercial vineyards (Hacking et al., 2019). However, besides being costly and labour-dependent, this methodology has an accuracy that depends heavily on the weight gain factor, the number of samples and the analyzed area (larger vineyards require more samples). In many cases, growers and vineyard managers are not willing to commit enough resources for statistically correct sampling, and yield estimations are often inaccurate (Dunn and Martin, 2004). Today, new sampling strategies are often applied which can increase the quality of this method's results (Araya-Alman et al., 2019; Uribeetxebarria et al., 2019); however, they still do not completely solve its destructive and laborious nature.
With the recent development of image analysis technologies, there has been a big research effort in developing proximal sensor-based methodologies to address the challenge of yield estimation. These methods can potentially assess large areas with high image resolution, non-destructively and with georeferenced output (Gatti et al., 2016). Such systems acquire large amounts of data, mostly in the format of digital images, which need to be thoroughly processed. This is today possible due to the development of powerful image analysis systems that can automatically recognize objects in images using digital image processing and machine learning methods (Cristianini and Shawe-Taylor, 2000; Jensen, 1996; Lecun et al., 2015). In several of these works, YC such as shoots, flowers (Millan et al., 2017), bunches (Dunn and Martin, 2004; Reis et al., 2012) or berries (Nuske et al., 2011) were detected automatically, while others search for image traits (bunch projected area, bunch volume, etc.) that can be used as proxies to estimate YC such as bunch weight (Hacking et al., 2019). These modern methodologies are feasible and becoming increasingly accurate at detecting yield components (Aquino et al., 2018a; Di Gennaro et al., 2019; Nuske et al., 2014; Seng et al., 2018).
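For comparison with the image-based approaches, the classical weight-gain extrapolation of Equation 2 can be sketched as follows. The bunch weights are hypothetical and the function names are illustrative; the last lines only show how an error in the historical weight-gain factor carries over proportionally to the estimate.

```python
def weight_gain(hist_harvest_bunch_g, hist_lag_bunch_g):
    """Historical ratio of mean bunch weight at harvest to lag phase."""
    return hist_harvest_bunch_g / hist_lag_bunch_g


def harvest_bunch_weight_g(current_lag_bunch_g, gain):
    """Equation 2: extrapolate the current lag-phase weight to harvest."""
    return current_lag_bunch_g * gain


# Hypothetical historical means (g) and a hypothetical current sample mean.
gain = weight_gain(hist_harvest_bunch_g=300.0, hist_lag_bunch_g=150.0)
estimate_g = harvest_bunch_weight_g(current_lag_bunch_g=120.0, gain=gain)

# A 10 % overestimate of the gain factor gives a 10 % overestimate of
# the harvest bunch weight, since the relationship is purely multiplicative.
biased_g = harvest_bunch_weight_g(120.0, gain * 1.10)
relative_error = (biased_g - estimate_g) / estimate_g

print(f"gain={gain:.2f}, estimate={estimate_g:.0f} g, "
      f"error with biased gain={relative_error:.0%}")
```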
Just like classical manual methods, proximal image-based yield estimation also relies on counting YC and can be performed at different phenological stages. However, image-based methods are dependent on the visibility of these YC, especially when performed non-invasively, as YC are commonly less exposed to the sensors. YC visibility can be influenced by many factors such as the size and quantity of the YC, the canopy development (nº shoots/m, leaf area/shoot), the position of the YC in the canopy and its colour and shape. These factors are largely influenced by the grape variety (vigour, internode length, leaf size, bunch size and compactness, yield potential), the training system and the phenological stage.
A very early yield forecast was attempted by Liu et al. (2017), where yield maps were computed based on automatic shoot counts from ground-based grapevine images. Images were collected between the beginning of budburst and two leaves unfolded (BBCH phenological stages 7 and 12, respectively), the phases when shoot visibility was highest. Other variables, besides the ones mentioned in Equation 1, were used to convert the number of recorded shoots into yield, including the proportion of rachis weight per bunch and the harvester efficiency factor. The authors obtained a very early yield prediction with an error ranging from 1 % (Shiraz plot) to 36 % (Chardonnay plot).
Other works approach yield estimation similarly but at more advanced phenological stages and at a smaller scale. Liu et al. (2018) and Tello et al. (2019) developed automatic algorithms that successfully detect the number of flowers per inflorescence. Because of the small size of this YC, they had to collect the images at close range (~50 cm distance between the camera and the YC) and with black cardboard in the background while holding the inflorescence. Such methods need further development to allow swift image acquisition if they are to be applied in a fully automatic system at a vineyard scale. Rudolph et al. (2019) collected high-definition images at vine level (1 m range from the grapevine) and applied machine learning algorithms to automatically detect the number of visible inflorescences and the number of flowers in each inflorescence. Flowers were detected with high precision within the previously detected inflorescences; however, the authors underlined problems with the occlusion of some inflorescences by vegetation. Abdelghafour et al. (2019) also performed inflorescence detection at close range with high accuracy. The authors used a mobile ground platform for image collection and a Bayesian framework with a colour-based pixel-wise algorithm for image classification. In these works, the authors focused on evaluating the detection algorithm and did not perform yield estimation; however, they serve as an important first step towards that goal.
Regarding berries, this YC can be detected right after fruit set. However, without high-resolution images, it can be challenging to detect berries at this initial stage because they are too small. As berries grow, their detection becomes easier; however, as they become larger, they also start occluding neighbouring berries, thus diminishing overall berry visibility. Similarly to the works on flowers, Aquino et al. (2018b) developed a smartphone application for counting visible berries on single-bunch images using a homogeneous background. With similar technology, Aquino et al. (2018a) performed berry counting on field images, at a vine level, and obtained accurate yield estimates on defoliated vines. This is similar to what was previously achieved by Nuske et al. (2014) but performed at earlier stages (between pea-sized berries and berry touch).
Berry detection has been reported as being easier to achieve automatically than bunch detection because berries have a less variable size and are clustered together (Zabawa et al., 2019). However, considering that bunch number is a YC that explains a higher percentage of yield variation (Clingeleffer, 2001; Pérez-Zavala et al., 2018), several researchers tried to use image analysis for general grape pixel detection and full bunch segmentation (Luo et al., 2016, 2018; Milella et al., 2019; Škrabánek and Majerík, 2017; Škrabánek, 2018; Xiong et al., 2018). When doing so, to perform yield estimation, segmented bunches (in pixels) have to be converted into weight (Lopes et al., 2016). Hacking et al. (2019) attempted to estimate bunch weight from bunch projected area and volume, with 2D and 3D images, respectively. The experiment was performed at the bunch and vine level, in both lab and field conditions. Bunch volume was reported as the best explanatory variable of bunch weight on single bunch images, in lab conditions (full bunch visibility). However, when the same approach was attempted in field conditions, results showed that bunch projected area in 2D images outperformed the 3D alternative.
Regardless of which YC is to be detected, YC visibility is also dependent on the image collecting system and methodology. Some of the researchers reported above (e.g., Hacking et al., 2019; Nuske et al., 2011; Rudolph et al., 2019; Škrabánek and Majerík, 2017) based their image collection methods on static systems in field or lab conditions. Other works used moving platforms (Kicherer et al., 2015; Lopes et al., 2017; Millan et al., 2019; Zabawa et al., 2019). In all cases, when images were collected at a vine level and in non-disturbed canopies (not defoliated), authors mentioned YC occlusion, mainly caused by leaves, as the main challenge for yield estimation (Aquino et al., 2018b; Hacking et al., 2019; Nuske et al., 2014; Rudolph et al., 2019). Only in the case of shoot detection, when the grapevine vegetative development is still at an early stage, is this occlusion not stated as a problem. To increase YC exposure, authors that performed yield estimation at later stages opted for partial or total leaf removal prior to image acquisition. However, although leaf removal is generally used in cool climate viticulture, in warmer regions this technique has to be performed with caution as high temperatures can induce berry sunburn, damaging grapes and decreasing yield and must quality (Krasnow et al., 2010).
The main objective of this work is to contribute to the decision regarding which variables to look for when approaching grapevine yield estimation using image analysis with a ground platform. For that, two specific goals were set. Firstly, to study the magnitude of YC occlusion in natural vineyard conditions to evaluate the degree of visibility of each YC throughout the growing cycle. Secondly, to explore image-based attributes, after berry set, to study the best predictors of bunch weight in single bunch images and of yield in non-defoliated grapevine field images. Gonçalo Victorino et al.

Plant material and growth conditions
The experiment was carried out in two adult experimental vineyard plots located at Tapada da Ajuda, Lisbon (38°42'24.61" N, 9°11'05.53" W). Both vineyards have spur-pruned vines trained on a vertical shoot positioning trellis system with two pairs of movable wires. The first vineyard plot consists of two drip-irrigated white varieties ('Encruzado' and 'Arinto') grafted onto 1103 Paulsen rootstock, planted in 2006 and spaced 1.0 m within and 2.5 m between north-south oriented rows. Full irrigation (~100 % crop evapotranspiration) was ensured until veraison and was then replaced by mild water stress conditions (~50 % crop evapotranspiration) until harvest. The second vineyard plot is rainfed and consists of the cultivar Syrah (grafted onto 140 Ruggeri rootstock) planted in 1999 and spaced 1.2 m within and 2.5 m between north-south oriented rows. In this vineyard, after mid ripening, some basal leaf senescence was observed, a consequence of the typical mild water stress of these ecological conditions. Data were collected during the 2018 ('Encruzado' and 'Syrah') and 2019 ('Encruzado', 'Syrah' and 'Arinto') seasons. To encompass as much spatial variability as possible within the vineyard plot, four (2019) to six (2018) smart points were chosen and labelled along each vineyard plot. A total of 240 one-meter vine segments (approximately equivalent to one vine canopy length) were labelled and assessed in both seasons.
On both plots, the soil is a clay loam with approximately 1.6 % organic matter and a pH of 7.8 (Teixeira et al., 2018). Regarding climatic conditions the 2018 season was characterized by a wet spring (~300 mm of precipitation from March to June) and a warm and dry summer (~15 mm of precipitation and average mean temperature of 22.1°C, from June to September), whereas the 2019 season presented a drier spring (~119 mm of precipitation from March to June) and a dry and warm summer (~17 mm of precipitation and average mean temperature of 20.6°C, from June to September). In both seasons, all vineyard plots were subject to similar standard cultural practices during the growing cycle, with canopy management consisting in de-suckering, shoot positioning and shoot trimming. No defoliation was performed.

Image acquisition and processing
For the online data collection, a robot platform developed within the framework of the EU project Vinbot (http://www.vinbot.eu/) was used. The robot carries an RGB-D Kinect v2 camera (Kinect v2.0, Microsoft) and two 2D laser range finders, one for navigation and one to obtain data regarding canopy shape (Guzman et al., 2016; Lopes et al., 2017). All images were collected from non-manipulated vines using the robot's camera, mounted in a lateral perspective (Figure 1). Images were taken of the sunlit side of the canopy, at a distance varying from 0.70 to 1.0 m. To collect images from the fruiting zone, the robot camera was positioned at approximately 1 m above the ground.
FIGURE 1. View of the Vinbot platform in action in field conditions.
The number of spurs and yield components were manually assessed in each image (estimated data) at each phenological stage. Furthermore, at pea-sized berries, veraison and harvest stages, the total bunch projected area per vine segment was computed. Ground-truth (observed data) was assessed visually in the field at the same phenological stages: YC were manually determined for each of the canopy segments previously imaged. At harvest, all bunches per vine segment were picked, counted and weighed, per meter. As spurs had about the same number of buds (2 buds/spur), for simplicity the number of spurs was analyzed as a yield component, as it is a proxy for the number of nodes per vine.
At pea size, veraison and harvest, after data collection in the field, a sample of ~80 bunches per variety was collected from outside the smart points and taken to the lab for further assessments. In the lab, two images were collected per bunch, from perpendicular perspectives, using a standard commercial camera (Nikon D5200) (Figure 4A). These images were then used to compute the projected area, perimeter and maximum length of each bunch. Total bunch weight was measured using a table scale (KERN FCB v1.4). Bunch volume was acquired using the water displacement method. Berries were then destemmed and placed separately on a table for image collection (Figure 4B). ImageJ's built-in option for analyzing particles was then used to automatically count the number of berries. Furthermore, at the same phenological stages, bunch contours in the field images were manually labelled using standard labelling software (ImageJ®) to estimate the visible bunch area in field conditions.
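To illustrate the kind of measurement involved, the sketch below counts separated objects in a binary mask and converts a pixel count into a projected area. It is a generic connected-component example written for illustration only; it is not ImageJ's implementation nor the exact procedure used in this work, and the toy mask, the 4-connectivity choice and the `cm_per_pixel` scale are assumptions.

```python
from collections import deque


def count_particles(mask):
    """Count 4-connected groups of foreground cells in a 2D binary mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                count += 1                      # new particle found
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:                    # flood-fill the particle
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x),
                                   (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def projected_area_cm2(mask, cm_per_pixel):
    """Projected area = number of foreground pixels x pixel area."""
    pixels = sum(1 for row in mask for value in row if value)
    return pixels * cm_per_pixel ** 2


# Toy mask: 1 = berry/bunch pixel, 0 = background (hypothetical data).
toy_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
]

n_particles = count_particles(toy_mask)
area_cm2 = projected_area_cm2(toy_mask, cm_per_pixel=0.1)
print(n_particles, round(area_cm2, 4))
```

Counting only works this simply when the objects do not touch, which is why the berries were placed separately on the table before imaging.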

Data analysis
Correlation analysis was used to evaluate the relationships between yield components and final yield, as well as between bunch and berry data and bunch weight and vine yield. Student's t-test was performed to compare ground-truth and estimated data. The mean absolute percentage error (MAPE) was used to evaluate the error between the counts made on the images and the ground truth for each YC. Throughout the text, this error will be referred to as occlusion (Eq. 3), indicating the degree of visibility of each YC on images when compared to ground-truthed data.
$$\text{Occlusion (\%)} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \qquad \text{(Eq. 3)}$$
where $\hat{y}_i$ represents the estimated mean YC count (YC manually counted in the image), $y_i$ represents the observed mean YC count (ground-truthed data, i.e. YC counted manually in the field) and $n$ the number of pairs. Image analysis was performed using ImageJ® (v1.52e, National Institutes of Health, USA). Correlation and regression analysis were performed using SAS® statistical software.
Table 1 shows the correlation coefficients between observed yield components and the final yield, for 2019 data.
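As a worked illustration of the occlusion metric, the following sketch computes the mean absolute percentage error between field counts and image counts. The counts are hypothetical and the function name is illustrative; only the standard MAPE definition is assumed.

```python
def occlusion_percent(observed, estimated):
    """MAPE (%) between observed (field) and estimated (image) counts."""
    if len(observed) != len(estimated) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs((obs - est) / obs)
                for obs, est in zip(observed, estimated))
    return 100.0 * total / len(observed)


# Hypothetical bunch counts for four one-meter vine segments.
field_counts = [10, 12, 8, 20]   # counted manually in the field
image_counts = [4, 6, 4, 5]      # counted manually on the images

occlusion = occlusion_percent(field_counts, image_counts)
print(f"occlusion = {occlusion:.2f} %")
```

Because image counts can only miss components that are hidden, a value near 60 % means that, on average, roughly six out of ten components in a segment were not visible.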

Correlations between yield components and final yield
The number of spurs presents a very low and positive r with yield, with Arinto being the only variety that shows a significant relationship. Regarding the number of shoots, while Arinto and Encruzado showed a lower and non-significant r, Syrah presents a high and significantly positive r with yield. In all varieties, the number of inflorescences and bunches was significantly correlated with yield, with the highest r being observed for the number of bunches of the variety Syrah.
Figure 5 shows the occlusion (Eq. 3) between ground-truth and estimated YC counts for the three varieties, at different phenological stages. The error increased from the winter bud stage until the inflorescences visible stage. Then, the error remained relatively stable until veraison, and dropped slightly from that stage until harvest. This drop was considerably higher for the Syrah variety, while the Arinto variety presented a second drop at the flowers separated stage.
[...] varied slightly throughout the three stages, especially for the Encruzado variety, while bunch maximum length remained relatively stable for all cases. As for the bunch perimeter, it decreased from pea-sized berries to veraison for the Arinto and Syrah varieties and then remained stable until harvest. Table 4 shows the correlation coefficients between bunch attributes and bunch weight, assessed in lab conditions. Data were collected using image analysis, except for bunch volume.

Image-based bunch attributes
Bunch projected area, number of berries per bunch and bunch volume all showed positive, high and significant correlation coefficients with bunch weight at all stages and varieties. The same happened with bunch maximum length, except for the Encruzado variety at the veraison stage, which presented a slightly lower r-value. Bunch perimeter was also positively correlated with bunch weight with a high range of r values, especially high for the Arinto variety at veraison and near harvest. Table 5 presents the average values for visible bunch area (estimated by manual labelling using the ImageJ software) and yield, in field conditions, per vine segment (one linear meter of canopy length), as well as the determination coefficients (R²) and the Root Mean Square Error (RMSE) of the linear regression between yield (dependent variable) and visible bunch area (independent variable). Table 5 also presents the average bunch occlusion by leaves, calculated as the ratio between the visible bunch area before and after full defoliation.
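The regression statistics reported in Table 5 can be illustrated with a minimal ordinary least squares sketch. The data points are hypothetical, the function names are illustrative, and RMSE is computed here as the square root of the mean squared residual; statistical packages may instead divide by the residual degrees of freedom, so values can differ slightly.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2_and_rmse(x, y, slope, intercept):
    """Determination coefficient and root mean square error of the fit."""
    n = len(x)
    mean_y = sum(y) / n
    ss_res = sum((yi - (intercept + slope * xi)) ** 2
                 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot, sqrt(ss_res / n)


# Hypothetical values per one-meter vine segment:
visible_area = [1.0, 2.0, 3.0, 4.0]   # visible bunch area (arbitrary units)
yield_kg = [2.1, 3.9, 6.2, 7.8]       # yield (kg/m)

slope, intercept = fit_line(visible_area, yield_kg)
r2, rmse = r2_and_rmse(visible_area, yield_kg, slope, intercept)
print(f"yield = {intercept:.2f} + {slope:.2f} * area, "
      f"R2 = {r2:.3f}, RMSE = {rmse:.3f} kg/m")
```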
Average visible bunch area in field conditions increased from pea-sized berries until harvest for all varieties. On the other hand, the average bunch occlusion by leaves remained relatively stable throughout all stages for the varieties Encruzado and Arinto, with slightly higher values at veraison. As for the Syrah variety, this occlusion decreased near harvest. Visible bunch area showed significant determination

Correlations between yield components and final yield
The high and significant correlation coefficients obtained indicate that some of these components, if accurately detected, can be used as good predictors of grapevine final yield. Regarding vine spurs, although their correlation with yield was low, it was still significant for the Arinto variety, meaning that this variable can potentially provide relevant information very early in the season.
The number of shoots presented varying correlation coefficients across the three varieties, with a positive, high and significant r value only for the Syrah case. These results are in accordance with Liu et al. (2017), where yield was successfully predicted from shoot detection exclusively on the Syrah variety, while for the Chardonnay variety results were not as good. The authors stated that the Chardonnay variety presented a higher number of patches containing more than one shoot, while the variety Syrah had more single shoots distanced from one another. Our results show that the significance of r values between the number of shoots and yield is variety dependent.
As expected the number of inflorescences and bunches presented a high and significant correlation with yield as also widely proven by other authors (Clingeleffer, 2001;Coombe and Dry, 2001). The Arinto variety presented a positive correlation coefficient between the number of bunches and yield, but lower than the remaining varieties. This can be related to the large size of some of Arinto's bunches (average weight of 427 g; Table 3).

Yield components occlusion throughout the growing cycle
Yield component occlusion analysis (Figure 5) shows that the most visible YC along the growing cycle were vine spurs and early-stage shoots, which is mostly explained by the lack of vegetation early in the season. As mentioned before, these traits are poorly correlated with grapevine yield and are, in general terms, poor variables for predicting the final yield on their own. However, as they can be detected with relatively high accuracy, they might be important variables for very early yield estimation attempts (e.g., Syrah variety) or to adjust later yield estimation algorithms. Moreover, shoot visibility, although stable among varieties (average occlusion ranging from 30 % in the variety Encruzado to 42 % in the variety Arinto), appears not to be as good as reported in Liu et al. (2017), where shoot detection achieved higher accuracy rates than our manual labelling. This might be related to the image collection timing (phenological stage) and the variability of grapevine development within grapevine segments. In Liu et al. (2017), the authors collected images around the phenological stages of budburst and two leaves unfolded. In the present work, a very high phenological variability was encountered around these stages across and within vine segments, making it challenging to collect images that were representative of the optimal stage for this assessment. Instead, some images presented shoots with 3 and 4 leaves unfolded while some buds had barely burst. Grapevine phenological development has been proven to be spatially variable (Verdugo-Vásquez et al., 2016) as it is affected by many factors such as cooling hours, winter pruning timing and growing degree days (Jones and Davis, 2000). This variability, common in many vineyards, is especially visible at earlier stages, stabilizing later in the season (e.g., Victorino et al., 2017), and was an important limitation during our data collection phase, which may make the approach described in Liu et al. (2017) not adaptable to other conditions.
Regarding inflorescences, it is important to mention that, in highly vigorous, non-defoliated plants, they can be very hard to detect even by the human eye because of their similarity to the surrounding canopy, a challenge encountered during manual labelling. As for the results, inflorescences were more visible in the Syrah variety (occlusion = 59 %) than in the Arinto (68 %) and Encruzado (64 %) cases. This can potentially be explained by the fact that this variety's shoots do not cluster together as much as those of the other varieties. Another potential explanation is that the Syrah plot is older (and not irrigated) and presented a more heterogeneous spur height relative to the cordon, a consequence of many years of pruning. This caused the shoots to grow at different heights and could consequently have influenced inflorescence occlusion.
At the stage of flowers separated, inflorescence occlusion decreased by almost 20 % for the Arinto variety and remained stable for the remaining varieties. This difference can possibly be explained by the size of Arinto's inflorescences, which, at this stage, are wide open with several blank spaces within, and are thus easier to detect than those of other varieties.
Further on, bunch occlusion was highly variable (with high standard errors) for all varieties, results that are in agreement with Aquino et al. (2018a). At the stage of pea-sized berries, the occlusion remained similar to the previous stage for the varieties Encruzado and Syrah while significantly increasing for the variety Arinto, due to this variety's previous occlusion drop. From this stage until harvest, the visibility error remained stable across all varieties, with a slight decrease near harvest. This decrease at harvest was more evident for the Syrah variety, which can be explained by the fact that this plot was not irrigated and was subject to higher levels of water stress. Consequently, some leaf senescence was observed for this variety, leaving bunches more exposed. Furthermore, Arinto presented higher occlusion values from pea-sized berries until harvest when compared to the remaining varieties (values close to 70 %, while the other varieties presented values close to 60 %). This can again be related to Arinto's large bunch size and low bud fruitfulness (IVV, 2011), which can cause a higher probability of occlusion. In addition, the Arinto variety is characterized by large and broad leaves that might cause an increased occlusion at more advanced stages.

Image-based bunch attributes
High occlusion values at relevant phenological stages hint that new variables, besides YC counts, should be considered for yield estimation. Out of all analyzed bunch attributes, the increase in bunch area, volume and weight throughout the growing stages was evident and expected as it is caused by bunch and berry size changes from pea-sized berries until harvest. Bunch perimeter decreased from pea-sized berries to veraison, which can be explained by the increased blank spaces within the bunches, at pea-sized berries. Having smaller berries, and because the image analysis method highlights the totality of berry pixels, the perimeter estimation will be higher as it will include the contour of each individual berry instead of a cluster of berries. Thus, this decrease of bunch perimeter is solely due to the image analysis method and not to the size of the bunch.
Regarding the correlation of bunch attributes with bunch weight, as previously stated by Lopes et al. (2016) and Hacking et al. (2019), bunch projected area and bunch volume measured in lab conditions (of mature bunches) were highly correlated with bunch weight. Results shown in Table 4 confirm this for all three varieties and show that it is also true at the three studied phenological stages. As for the number of berries, it is widely known that it explains a significant percentage of bunch weight (Clingeleffer, 2001). In fact, several authors have used this variable as a predictor of grapevine yield (Diago et al., 2015; Grimm et al., 2018; Millan et al., 2018; Nuske et al., 2014; Zabawa et al., 2019). Results obtained in this work (Table 3) confirm that the number of berries can be used as a good predictor of bunch weight if all berries are visible. With on-the-go yield estimation systems used in natural conditions, not all berries are visible, and thus their number needs to be estimated with the use of auxiliary variables or algorithms such as the Boolean model described in Millan et al. (2018).
For all varieties and phenological stages, bunch projected area presented a higher or equal correlation coefficient with bunch weight when compared to the correlation with the number of berries. This contrasts with several works (e.g., Aquino et al., 2018; Nuske et al., 2014; Grimm et al., 2018; Zabawa et al., 2019) where the berry number is used over bunch projected area. According to our results, the only argument that can still be valid for using berries instead of bunch area is the ease with which they can be segmented in the image, which was not explored in our work. The bunch projected area is the average area measured on images taken from two perspectives, which is an optimistic approach, as it would hardly be achieved in field conditions. However, the variable number of berries is the real total number of berries per bunch and not just the visible ones, making this approach also an optimistic one. Thus, both variables are on even ground in terms of their expected applicability in field conditions. In the future, bunch area and berry number could be considered together for explaining grapevine yield. Not only would each one contribute highly to it individually, but together they could also make future algorithms less dependent on variety differences, for example, those related to bunch compactness. An index able to combine visible berries and visible bunch area into berries per unit of bunch area could be an asset for the robustness of grapevine yield estimation algorithms in the future.
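As a minimal sketch of the kind of index suggested above, the snippet below divides a visible berry count by the visible bunch area. The function name berries_per_unit_area, the choice of cm² as the unit and the input values are assumptions made for illustration; they are not measurements or definitions from this study.

```python
def berries_per_unit_area(visible_berries, visible_bunch_area_cm2):
    """Hypothetical index: visible berries per cm2 of visible bunch area."""
    if visible_bunch_area_cm2 <= 0:
        raise ValueError("visible bunch area must be positive")
    return visible_berries / visible_bunch_area_cm2


# Illustrative values only: two bunches with the same visible area
# but a different number of visible berries.
loose_bunch = berries_per_unit_area(40, 50.0)
compact_bunch = berries_per_unit_area(60, 50.0)

print(loose_bunch, compact_bunch)
```

With equal visible area, a higher index value points to more berries packed into the same region, which is how such an index might capture variety differences in bunch compactness.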
As previously explored by Hacking et al. (2019), bunch volume is the variable that shows the highest correlation coefficient with bunch weight in lab conditions. However, the same authors showed that in field conditions, this is not the case, and the bunch projected area presents better results. Furthermore, for direct volume estimations, 3D imaging is needed, which requires more demanding technology and image processing.
Regarding the correlation between bunch maximum length and bunch weight, r values were also positive, high and significant. However, in field conditions this variable will most likely not be suitable, as only portions of bunches are commonly detected, whereas full bunch detection would be required to compute their maximum length. As for bunch perimeter, correlation coefficients with bunch weight were also positive and significant in lab conditions. However, very similar technology is required to compute both bunch perimeter and bunch area, the latter having considerably better results, as shown in Table 4.
Of all bunch attributes, bunch projected area seems to be the most relevant variable for yield estimation, considering both its correlation with yield and the ease of data collection, and so it was tested in field conditions. With a fully developed canopy and without performing defoliation, average visible bunch projected area increased from pea-sized berries until veraison, likely because bunch area increased (Table 3). However, the same was not true for the Syrah variety, which showed no increase in average single-bunch weight in lab conditions while still showing an increase of visible area in field conditions. The average bunch occlusion by leaves decreased slightly at harvest for all cases, with a more evident difference for the Syrah variety (from 79 % to 61 % of bunch area occlusion). Again, this can be explained by the variety's plot not being irrigated and suffering more severe water stress, which consequently led to increased leaf senescence. The increased exposure at the cluster zone, in particular for this variety, also explains why visible bunch area increased in the field while bunch area in lab conditions did not show the same trend. This means that, for this variety, the increase in visible bunch area is mainly due to an increase of exposed bunches and not to an increase of bunch area resulting from growth.
Determination coefficients of the linear regression between visible bunch area and yield were higher at harvest for all varieties. At pea-sized berries, results were not as satisfactory, showing that, although it would be interesting to have a prediction at this stage, as previously explained, it might be more challenging when using visible bunch area as the main predictor.
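For readers less familiar with the statistic, the determination coefficient of a simple linear regression between visible bunch area and yield can be computed as sketched below. This is a generic ordinary least squares fit, not the statistical software used in this work; the function name linear_fit_r2, the variable names, the units and the numbers are placeholders chosen for illustration only.

```python
from statistics import mean


def linear_fit_r2(x, y):
    """Ordinary least squares fit y = intercept + slope * x, returning R2 as well."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Placeholder data: visible bunch area per canopy segment (cm2)
# and the yield of the same segment (kg). Not data from this study.
visible_area = [120.0, 180.0, 240.0, 310.0, 400.0]
segment_yield = [1.1, 1.9, 2.2, 3.0, 3.6]

slope, intercept, r2 = linear_fit_r2(visible_area, segment_yield)
print(f"slope={slope:.4f} intercept={intercept:.3f} R2={r2:.3f}")
```

A value of R2 close to 1 means that visible bunch area explains most of the yield variation between segments, which is the sense in which the harvest-stage regressions performed better than those at pea-sized berries.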
As our experiment was performed only in two vineyard plots, the results should not be generalized to different agro-pedo-climatic conditions. In different locations, standard cultural practices can also change, further increasing the possible variability of, for example, YC occlusion and the relationships between bunch weight and other bunch attributes.

CONCLUSION
In this work, the visibility of several yield components was explored in field conditions along the grapevine growing cycle. Furthermore, several image-based bunch attributes were explored in lab and field conditions aiming at finding the best predictors for yield estimation. Regarding yield components' visibility, at dormancy and early shoot growth stages, all varieties showed good visibility of spurs and shoots. Although these components did not show significant correlation coefficients with final yield (except for Syrah), results still suggest that they could be used as auxiliary variables in later estimation algorithms. The number of shoots was highly and significantly correlated with yield only for the variety Syrah, indicating that shoot number might be too dependent on the variety and vine management to be used as the only predictor of yield at budburst.
Both inflorescence and bunch count occlusion rates reached high values, well above 50 %; thus, other variables besides their counts should be considered as proxies of final yield. Bunch volume and bunch projected area had the highest correlation coefficients with yield in lab conditions, with bunch area being the easiest variable to obtain. In field conditions, bunch area occlusion was high (above 60 %) at all studied phenological stages, similar to what was observed for bunch counts. Lower occlusion values were observed for mildly water-stressed Syrah vines, which presented some leaf senescence near harvest. Regardless of these high occlusion rates, the visible bunch projected area of non-manipulated grapevine canopies showed significant determination coefficients with yield, showing that, even in dense canopies, bunch area remains a promising explanatory variable of vine yield. Further work is needed to explore these relations in other conditions, as YC visibility can vary widely with vineyard characteristics. Our current research efforts are directed towards performing grapevine yield estimations using visible bunch projected area and other auxiliary variables related to vegetative and reproductive traits that can help to estimate the portion of occluded yield components.