^{ 1 }ITAP, Université de Montpellier, Montpellier SupAgro, INRAE, France

^{ 2 }MISTEA, Université de Montpellier, Montpellier SupAgro, INRAE, France

^{ 3 }Fruition sciences, France

^{ 4 }LIRMM, Université de Montpellier, CNRS, France

^{ * }corresponding author: baptiste.oger@hotmail.com

Precise knowledge of field yields is critical for the wine industry, in particular for the logistical organisation of the harvest, among other reasons (

This within-field variability affects the quality of estimates resulting from sampling. Indeed, when estimating yield by sampling, the estimation error (and the resulting confidence associated with the estimate) is a function of the number “N” of observations and of the variance of the sampled (observed) variables. For a given field (with a given yield variance), the higher the “N”, the more confident the estimate, but the longer the time required to carry out the sampling. Note that the sampling time may vary considerably depending on the location of the observations over the field: it depends directly on “N”, but also on the time needed to travel from one observation site to another. Optimising sampling time therefore requires optimising both the number “N” of observations according to the field variability and the location of the observation sites to limit travel time.
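The link between N, variance and estimation error can be made concrete with the classical formula for the mean of independent observations. The relative standard error is CV/√N, so keeping the relative error below E at a confidence level with normal quantile z requires N ≥ (z·CV/E)². The sketch below is only an illustration under that independence assumption. It ignores both the spatial autocorrelation and the walking time that the rest of this paper is about. The function names and the 95 % quantile (z = 1.96) are our choices, not the authors'.

```python
import math


def relative_standard_error(cv: float, n: int) -> float:
    """Relative standard error of a mean of n independent observations (cv as a fraction)."""
    return cv / math.sqrt(n)


def required_sample_size(cv: float, rel_error: float, z: float = 1.96) -> int:
    """Smallest N for which z * cv / sqrt(N) <= rel_error.

    Assumes independent observations (simple random sampling); spatial
    autocorrelation between nearby vines is ignored.
    """
    return math.ceil((z * cv / rel_error) ** 2)


# CV of 30 %, as used for the simulated fields in this study.
for target in (0.10, 0.05):
    print(f"±{target:.0%} at 95 %: N = {required_sample_size(0.30, target)}")
```

With a CV of 30 %, this gives N = 35 for ±10 % and N = 139 for ±5 %. Halving the tolerated error roughly quadruples N, which is why the sampling effort grows quickly as precision requirements tighten.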

In the scientific literature, very few papers have focused on optimising the location of sampling sites for yield estimation in viticulture. Most studies focused on: I) the number “N” of observations to be considered to reach a reliable estimation and to minimise the estimation error (

As a result, existing studies rarely take into account the two conflicting components of an optimal sampling: minimisation of the sampling effort (time, which includes the measurement time and the travel time/distance) and minimisation of the estimation error. For the wine industry, both components are essential to produce the best possible estimate in the shortest possible time. Without reliable references, sampling protocols are often based on rules of thumb, and the same protocol is always applied whatever the field,

Recent papers (

Theoretical yield data were generated through a simulation process. This simulation process, described in

For the simulations, the values tested for the range and the nugget effect were determined from within-field yield observations obtained from yield monitoring systems in precision viticulture (

Theoretical fields were generated by varying only one parameter at a time, with the other two parameters taking their default values. The initial resolution is 1 pixel/m². Yield values are then extracted along the rows, assuming a trellised structure with 2.5 m between rows and 1 m between vines along the row (4000 vines/ha). Simulations were run with a Gaussian yield distribution with an average yield of around 1000 g/vine and a coefficient of variation of 30 %. For each parameter combination, 10 different fields were simulated. The final theoretical dataset consists of 60 (6 × 10) simulated fields.

Theoretical field yield parameters

| Parameter | Values |
|---|---|
| Range (m) | 25, 75 |
| Nugget effect (%) | 50 |
| Row length (m) × field width (m) | 50 × 200, 200 × 50 |
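The field simulation described above can be sketched as an unconditional Gaussian simulation at the vine positions. The text does not state the covariance model or the algorithm used; these are given in the cited reference. The sketch therefore assumes a spherical covariance with a nugget and uses a Cholesky factorisation. This approach is exact but only practical for small fields: a full 1 ha field means a 4000 × 4000 matrix. Simulating directly at the vine positions gives the same joint distribution at those points as simulating on a 1 m grid and then extracting the rows. `spherical_covariance`, `simulate_field` and their defaults are hypothetical.

```python
import numpy as np


def spherical_covariance(h, sill, nugget_frac, range_m):
    """Spherical covariance plus nugget (hypothetical model choice).

    sill: total variance; nugget_frac: share of the sill that is spatially
    unstructured; range_m: distance beyond which structured correlation is zero.
    """
    c0 = nugget_frac * sill          # nugget (erratic) variance
    c1 = sill - c0                   # spatially structured variance
    r = np.minimum(h / range_m, 1.0)
    structured = c1 * (1.0 - 1.5 * r + 0.5 * r**3)
    # The nugget only contributes at zero distance, i.e. on the diagonal.
    return structured + np.where(h == 0.0, c0, 0.0)


def simulate_field(row_length=20.0, n_rows=8, row_spacing=2.5, vine_spacing=1.0,
                   mean=1000.0, cv=0.30, nugget_frac=0.20, range_m=50.0, seed=0):
    """Return vine coordinates (x across rows, y along rows) and yields (g/vine), shape (rows, vines)."""
    rng = np.random.default_rng(seed)
    x = np.arange(n_rows) * row_spacing
    y = np.arange(0.0, row_length, vine_spacing)
    xx, yy = np.meshgrid(x, y, indexing="ij")
    coords = np.column_stack([xx.ravel(), yy.ravel()])
    # Pairwise Euclidean distances between all vines.
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
    cov = spherical_covariance(d, (cv * mean) ** 2, nugget_frac, range_m)
    chol = np.linalg.cholesky(cov)   # cov = chol @ chol.T
    yields = mean + chol @ rng.standard_normal(len(coords))
    return coords, yields.reshape(xx.shape)


coords, yields = simulate_field(seed=1)
print(yields.shape, round(float(yields.mean())))
```

The default field is deliberately small so that the example runs quickly. A positive nugget keeps the covariance matrix strictly positive definite, so the Cholesky factorisation does not fail.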

For each field, the optimal sampling route was obtained by applying the approach described in

Distances were expressed as walking time (min). Walking times do not consider additional field-specific constraints that could alter walking speed (grass, slope, soil surface conditions, etc.); they only take into account vineyard specificities associated with the trellised structure. It is not possible to cross from one inter-row to another within the field: moving to another inter-row requires reaching one of the field edges. Each measurement site can be accessed from two different inter-rows (one on each side of the row). The distance also accounts for a starting point, located in the southwest corner of each field, where the sampling route must begin and end. The distance optimised by the solver is the length of this walk, which passes through each measurement site and returns to the starting point. This favours measurement sites close to the starting point. Using a common starting point allows the sampling routes obtained to be compared easily.
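The trellis constraint can be written as a distance function. The version below is a simplified, hypothetical model, not the solver used in this study (that approach is described in the cited work). It makes four simplifications:

- positions are reduced to an inter-row index and a coordinate along the row;
- the short lateral step from the inter-row to the vine is ignored;
- changing inter-row means walking to the south or north edge, whichever is shorter, then along the headland;
- the walker uses one inter-row per site, since the vine itself cannot be crossed.

A brute-force search over visiting orders stands in for the solver, which is only feasible for a handful of sites. All names are hypothetical. The row spacing (2.5 m) and walking speed (0.9 m/s) come from the text.

```python
import itertools

ROW_SPACING = 2.5   # m between rows (from the text)
WALK_SPEED = 0.9    # m/s (from the text)
START = (0, 0.0)    # hypothetical: inter-row 0 at the south edge, i.e. the south-west corner


def access_interrows(row):
    """Inter-row k lies between rows k-1 and k, so row r is reachable from k = r or r+1."""
    return (row, row + 1)


def leg(k1, y1, k2, y2, row_length):
    """Walking distance from position y1 in inter-row k1 to y2 in inter-row k2."""
    if k1 == k2:
        return abs(y1 - y2)
    # The trellis cannot be crossed: exit by the south (y = 0) or north (y = row_length) edge.
    along = min(y1 + y2, 2.0 * row_length - y1 - y2)
    return along + abs(k1 - k2) * ROW_SPACING


def tour_length(sites, row_length):
    """Closed walk START -> sites (in the given order) -> START; sites are (row, y) pairs.

    One access inter-row is chosen per site, optimised by dynamic programming
    over the two possible choices per site.
    """
    best = {START[0]: 0.0}  # inter-row -> shortest distance to reach the current point
    prev_y = START[1]
    for row, y in sites:
        best = {k: min(d + leg(pk, prev_y, k, y, row_length) for pk, d in best.items())
                for k in access_interrows(row)}
        prev_y = y
    return min(d + leg(k, prev_y, START[0], START[1], row_length) for k, d in best.items())


def best_tour(sites, row_length):
    """Exhaustive search over visiting orders (stand-in for the solver, small N only)."""
    order = min(itertools.permutations(sites), key=lambda o: tour_length(o, row_length))
    return list(order), tour_length(order, row_length)


def walking_time_min(distance_m):
    return distance_m / WALK_SPEED / 60.0


route, dist = best_tour([(0, 90.0), (1, 90.0), (6, 5.0)], row_length=100.0)
print(route, dist, round(walking_time_min(dist), 2))
```

For example, in a field with 100 m rows, two vines 90 m into the first two rows can be measured from the single inter-row between them. The route is then 92.5 m out and 92.5 m back, 185 m in total.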

To clarify the presentation of the results, two types of sampling routes were considered. The first corresponds to what is assumed to be the most common practice: an empirical sampling protocol in which measurements are carried out during one round trip within the field across two or more representative rows. Rows are therefore walked from one end to the other, forming a sampling route joining the two sides of the field. This type of route is hereafter called

Sampling routes obtained with the solver were characterised using three criteria. I) The type of sampling route: RBSR or EBSR. II) The walking time required to get from one observation site to another, regardless of the protocol chosen to carry out the measurements and of the time associated with these measurements. The time required to make observations (number of clusters, average cluster weight, etc.) at a sampling site may vary with the protocol used. However, it was assumed in this work that, for a given situation, the measurement protocol was the same for every sampling site. As a result, for the same number of observation sites, the sampling time was only influenced by the travel distance between observation sites. The walking time therefore depends only on the distance to be covered and on the walking speed of the practitioner, assumed here to be constant at 0.9 m/s. III) The estimation error, which corresponds to the difference between the value predicted from the measurement sites along the sampling route and the actual average yield of the field. The predicted value is the average of the N yield observations made during sampling. The actual yield value is the average of all the simulated yield values of the field. The estimation error, expressed as a percentage, is defined by Equation 1.
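Equation 1 itself did not survive in this version of the text. A common definition consistent with the description above is the absolute difference between the predicted and actual mean yields, relative to the actual mean and expressed in percent. The sketch below assumes that form, which may differ from the authors' exact equation (for instance, a signed rather than absolute difference). The function name is hypothetical.

```python
from statistics import fmean


def estimation_error_pct(field_yields, sampled_yields):
    """Relative estimation error (%) of the sampled mean against the true field mean.

    Assumed form of Equation 1 (not confirmed by the text):
        100 * |mean(sampled) - mean(field)| / mean(field)
    """
    actual = fmean(field_yields)       # average of all simulated vine yields
    predicted = fmean(sampled_yields)  # average of the N observations along the route
    return 100.0 * abs(predicted - actual) / actual


print(estimation_error_pct([900, 1000, 1100, 1000], [1100, 1000]))
```

In the usage line, the field mean is 1000 g/vine and the two sampled vines average 1050 g/vine. The error is therefore 5 %.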

Figures 1.A, 1.B and 1.C show the results of optimal sampling routes, either EBSR or RBSR, expressed as estimation errors and walking times for the different field characteristics. Figure 1.D shows results obtained with simple random sampling, in which sites were chosen randomly among all the measurement sites without any path optimisation. All the curves share the same logical trend: the estimation errors decrease as the number N of samples increases. However, improving the quality of the estimate has a cost, since the sampling effort, estimated by the walking time, increases with the number of measurements. A comparison between Figures 1.A and 1.D shows the value of the optimal sampling approach proposed in this study compared with simple random sampling. For the examples considered, sampling optimisation simultaneously improves the estimation error by 5 % to 9 % and halves the walking time. For random sampling, only results for simulated fields with different ranges are presented (Figure 1.D), but very similar results (not shown) were obtained for the other simulated fields (row length, nugget effect).
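The simple random sampling baseline of Figure 1.D is easy to reproduce for the error criterion. The sketch below draws N sites without replacement among all vines and averages the relative error over many draws. It assumes the error definition 100·|sample mean - field mean|/field mean and does not compute walking time. The stand-in field in the usage lines is independent normal noise (mean 1000 g/vine, CV 30 %). It therefore has no spatial structure and does not reproduce the paper's simulated fields. The function name is hypothetical.

```python
import numpy as np


def random_sampling_error(field_yields, n_sites, n_draws=2000, seed=0):
    """Mean relative estimation error (%) of simple random sampling without replacement.

    Sites are drawn uniformly among all vines, with no route optimisation.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(field_yields, dtype=float).ravel()
    actual = values.mean()
    errors = np.empty(n_draws)
    for i in range(n_draws):
        sample = rng.choice(values, size=n_sites, replace=False)
        errors[i] = 100.0 * abs(sample.mean() - actual) / actual
    return errors.mean()


# Stand-in field: 4000 independent vines, no spatial autocorrelation (illustration only).
field = np.random.default_rng(42).normal(1000.0, 300.0, size=4000)
for n in (3, 5, 10, 20):
    print(n, round(random_sampling_error(field, n), 1))
```

For independent vines, the mean error is close to √(2/π)·CV/√N, about 7.6 % for N = 10. Spatial autocorrelation in a real field changes this picture, which is what route optimisation exploits.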

Figure 1 shows that the characteristics of the fields do not affect the optimal sampling route in the same way. The range (Figure 1.A) and, to a lesser extent, the row length (Figure 1.B) significantly affect the optimal sampling route, while the proportion of erratic variance in the total yield variance of the field (nugget effect) has a small effect (Figure 1.C). For clarity, Figure 1 does not show the variability resulting from the ten simulations; on average, the standard deviation of the results is 1.7 % for the estimation error and 0.6 min for the walking time.

Regarding the range (Figure 1.A), fields with lower ranges (25 m) show lower estimation errors for a given walking time. On average, for a range of 25 m, an estimation error below 2 % can be achieved with an optimal three-minute sampling route with 7 sampling sites, whereas more than 5 minutes (with 9 sampling sites) are needed to achieve the same estimation error for fields with larger ranges (50 m and 75 m). In general, the lower the range, the shorter the sampling route and the walking time. Row length (Figure 1.B) also affects the optimal sampling route: for short rows, lower estimation errors are achieved with less sampling effort. The influence of the nugget effect (Figure 1.C) is less obvious, although larger nugget effects (50 %) are associated with slightly larger estimation errors than low nugget effects (20 %).
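At the 0.9 m/s walking speed assumed in this work, the walking times quoted above translate directly into distances. The short sketch below does the conversion and adds a simple lower bound for a row-based route. Since an RBSR walks one row end to end and returns along another, it covers at least twice the row length. If the default fields have 100 m rows, as in the example fields of Figure 2, an RBSR takes at least about 3.7 min of walking. The three-minute routes reported for 25 m ranges can therefore not be row-based. The function names are hypothetical, and headland moves are ignored in the bound.

```python
WALK_SPEED_M_S = 0.9  # constant walking speed assumed in the text


def minutes_to_metres(minutes: float) -> float:
    """Distance covered in a given walking time at the assumed constant speed."""
    return minutes * 60.0 * WALK_SPEED_M_S


def min_rbsr_minutes(row_length_m: float) -> float:
    """Lower bound on the walking time of a row-based route (two full rows, headland ignored)."""
    return 2.0 * row_length_m / WALK_SPEED_M_S / 60.0


for minutes in (3.0, 5.0):
    print(f"{minutes:.0f} min ~ {minutes_to_metres(minutes):.0f} m")  # 162 m, 270 m
print(f"RBSR lower bound, 100 m rows: {min_rbsr_minutes(100.0):.1f} min")  # 3.7 min
```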

Focusing on the range effect, Figure 2 shows examples of sampling routes for three fields with different range values. The three examples share some common features: sampling routes are optimised from the starting point located in the southwest corner of the field (coordinates X = 0, Y = 0), and in each field the sampling points (and the resulting sampling route) clearly tend to minimise the distance to this starting point. Figure 2 also shows the two types of sampling routes described previously (EBSR and RBSR). The field with the shortest range is associated with an edge-based sampling route (EBSR), while the fields with longer ranges (50 m and 75 m) are associated with a row-based sampling route (RBSR). In this example, for the same number of sampling sites, the optimal route changes with the range.
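One way to see why the range matters is to look at how fast the correlation between two vines fades with distance. The sketch below assumes a spherical variogram with a 20 % nugget; the model actually used in the simulations is not stated here. In this model, the structured correlation drops to exactly zero at the range, and the nugget contributes no correlation between distinct vines. With a 25 m range, two vines 25 m apart are already uncorrelated. With a 75 m range, they remain correlated at about 0.41, so a route must go further from the starting point to reach spatially independent sites. The function name is hypothetical.

```python
import numpy as np


def spherical_correlation(h, range_m, nugget_frac):
    """Correlation between yields of two distinct vines h metres apart (h > 0).

    Hypothetical spherical model: the structured share (1 - nugget_frac)
    decays to zero at h = range_m; the nugget adds no correlation.
    """
    r = np.minimum(np.asarray(h, dtype=float) / range_m, 1.0)
    return (1.0 - nugget_frac) * (1.0 - 1.5 * r + 0.5 * r**3)


for range_m in (25.0, 50.0, 75.0):
    rho = spherical_correlation(25.0, range_m, nugget_frac=0.2)
    print(f"range {range_m:.0f} m: correlation at 25 m = {rho:.2f}")
```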

Figure 2. Field A: Range = 25 m, Nugget effect = 20 %, Row length = 100 m; Field B: Range = 50 m, Nugget effect = 20 %, Row length = 100 m; Field C: Range = 75 m, Nugget effect = 20 %, Row length = 100 m


Figure 3. Fields A and B: Range = 75 m, Nugget effect = 20 %, Row length = 100 m


However, Figure 3 shows that for large ranges, EBSR may also be the optimal sampling route. In this case, a large range (compared to the dimensions of the field) makes the spatial variability of yield tend to follow a trend (gradient). In practice, this type of spatial distribution may be observed when yield is driven by a factor varying along a given direction, such as slope, a soil depth gradient, water access, etc. The optimal sampling route then depends on the direction of the rows relative to the yield gradient. When the yield gradient and the rows have roughly the same direction (Figure 3.B), RBSR is promoted. Indeed, the yield gradient in Figure 3.B follows the direction of the row:

Figure 4. Field A: Range = 50 m, Nugget effect = 20 %, Row length = 50 m, N = 5


Figure 4 shows the effect of row length on optimal sampling routes across three examples. Fields with short rows are associated with RBSR even with a limited number of sampling sites (N = 5) (Figure 4.A). Conversely, long rows promote EBSR, where entire rows are never explored (Figures 4.B and 4.C).

For the different field parameters, Figure 5 gives the proportion of optimal sampling routes corresponding to RBSR versus EBSR as a function of the number of sampling sites. Each point of the figure corresponds to the average result over ten simulated fields.

Figure 5.A clearly shows that for fields with short ranges, the optimal sampling route is in most cases an EBSR. As seen before, the range has a significant effect on the choice of the optimal sampling strategy, and this result is confirmed here over several fields. However, for fields with ranges of 50 m and 75 m, the effect is lessened and the proportion of full-row sampling routes (RBSR) reaches a limit: the proportion curve associated with large ranges (75 m) never reaches 100 % RBSR. This is explained by simulated fields whose yield gradient is roughly perpendicular to the row direction, which promotes EBSR over RBSR (Figure 3).
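A deterministic toy example shows why the direction of the gradient relative to the rows matters. The sketch below is a hypothetical field of 40 rows of 100 m whose yield follows a pure linear trend, with no random component. It therefore only mimics the large-range case. The trend runs either along or across the rows. The sketch compares six edge-based sites (5 m from the south edge, spread across the field width) with six row-based sites (spread along two adjacent rows). It uses the assumed relative error definition and does not compute walking distance. Field, site choices and function names are all illustrative.

```python
import numpy as np


def trend_field(n_rows=40, row_length=100, along_rows=True, low=800.0, high=1200.0):
    """Yield (g/vine) varying linearly from `low` to `high`, indexed [row, vine].

    Hypothetical pure trend; vines are 1 m apart along the rows.
    """
    if along_rows:
        t = np.arange(row_length) / (row_length - 1)   # 0 at the south edge, 1 at the north edge
        t = np.tile(t, (n_rows, 1))
    else:
        t = np.arange(n_rows) / (n_rows - 1)           # 0 on the west row, 1 on the east row
        t = np.tile(t[:, None], (1, row_length))
    return low + (high - low) * t


def error_pct(field, sites):
    """Relative error (%) of the mean of the sampled vines against the field mean."""
    sample = np.array([field[row, vine] for row, vine in sites])
    return 100.0 * abs(sample.mean() - field.mean()) / field.mean()


# Edge-based: 6 vines 5 m from the south edge, spread across the field width.
ebsr_sites = [(row, 5) for row in range(2, 40, 7)]
# Row-based: 6 vines spread along two adjacent rows.
rbsr_sites = [(0, 10), (0, 50), (0, 90), (1, 90), (1, 50), (1, 10)]

for along in (True, False):
    f = trend_field(along_rows=along)
    label = "gradient along rows " if along else "gradient across rows"
    print(f"{label}: EBSR {error_pct(f, ebsr_sites):5.1f} %  RBSR {error_pct(f, rbsr_sites):5.1f} %")
```

When the trend follows the rows, the edge-based sites all sit at the low end and miss the gradient, while the row-based sites span it. When the trend is perpendicular to the rows, the roles are reversed.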

Figure 5.B also clearly shows the influence of row length on the best possible sampling route. Long rows always promote EBSR, while short-row fields always promote RBSR. This confirms the results of Figure 4: when rows get longer, the optimal sampling strategy always avoids walking the full length of the rows. The opposite is true for short rows, which is why RBSR is systematically proposed for short rows in this case.

Finally, Figure 5.C shows that a higher proportion of erratic variance (nugget effect) tends to promote EBSR when the number of sampling sites is greater than 6.

Figure 5. Results are frequencies over ten simulations.


In the wine industry, the same sampling route is commonly adopted for all fields. However, based on a posteriori knowledge of the yield distribution, the results presented in this paper show that the optimal strategy for designing a sampling route for grape yield estimation may vary from one field to another depending on field characteristics. The optimal sampling route seeks to minimise the effort needed to find sites that are representative of the distribution of yield values. Logically, a shorter yield range reduces the minimum distance to be covered to find two spatially independent sites. Low ranges therefore make it possible to find a higher variability of yield values in the direct vicinity of the starting point, which explains why EBSR is promoted in this case. This also explains why the travel distance is shorter for fields with shorter yield ranges (Figure 1), EBSR being generally shorter since it does not require walking the length of the rows twice to find relevant observation sites. The extreme case would be a field with no spatial autocorrelation of yield values (

Note that hybrid sampling routes also arise for fields with more complex configurations. In this case, the sampling route is largely based on RBSR, i.e. a round trip across two rows, with one or more measurement sites added from a third, incompletely covered row (Figure 2.C and Figure 3.B, which show the same field). In contrast, Figure 3.A shows three incompletely covered rows.

It should be kept in mind that the results of this study are based on simulated data, which represent a simplified version of reality. The estimation errors reported here are not indicative of what can be found in practice: the context is a purely theoretical framework in which the spatial distribution of yield is fully known. For example, it was assumed that the yield at each measurement site was known exactly, as if all bunches of the plant had been weighed. Such a destructive approach is not realistic in a commercial situation because of measurement time and yield loss. In practice, the yield estimate at a site is itself the result of sampling one or two bunches chosen and weighed by the operator. This introduces an error in the yield estimate at each site and, as a result, an error in the estimate of the average field yield that is necessarily higher than that reported in this work. In this study, uncertainty in the representativeness of the sampling sites is accounted for by the nugget effect, which corresponds to erratic variance caused, among other things, by the error in estimating the yield at each site. However, it remains unclear whether the range of values chosen for the nugget effect in this work adequately represents the impact of the diversity of yield estimation methods at the level of the measurement site.
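The way site-level measurement error feeds into the nugget effect can be made explicit under a simplifying assumption. Suppose each site estimate equals the true vine yield plus an independent error. The error variance then adds to both the nugget and the total variance (sill), so the apparent nugget share grows. The sketch below computes this share. The error standard deviations are hypothetical and are not estimates from this study. With a 20 % nugget and a CV of 30 % around 1000 g/vine, an error standard deviation of 150 g/vine raises the apparent nugget to 36 %. A standard deviation of 300 g/vine raises it to 60 %, above the largest nugget value simulated here (50 %).

```python
def observed_nugget_fraction(nugget_frac, sill, measurement_var):
    """Nugget share of the total variance of site-level yield estimates.

    Assumes site estimate = true vine yield + independent error with variance
    `measurement_var` (hypothetical); the error adds to the nugget and the sill.
    """
    nugget = nugget_frac * sill
    return (nugget + measurement_var) / (sill + measurement_var)


sill = (0.30 * 1000.0) ** 2           # CV of 30 % around 1000 g/vine -> 90 000 g^2
for sd_error in (0.0, 150.0, 300.0):  # hypothetical per-site error standard deviations (g/vine)
    frac = observed_nugget_fraction(0.20, sill, sd_error**2)
    print(f"error sd {sd_error:5.0f} g: apparent nugget {100 * frac:.0f} %")
```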

Considerations discussed in this study are based on simple field characteristics. This simplified framework makes it possible to identify the impacts of different parameters on the sampling route. However, the characteristics of a real field are often more complex. For example, rows can have irregular lengths, and fields can have irregular shapes, different sizes, etc. Other elements can also affect travel time, such as slopes or a discontinuity in the row structure allowing the practitioner to pass from one row to another without having to walk all along it. Logistical issues may also matter in the design of the sampling route. The intention of this paper is therefore not to give set values to be respected but rather guidelines for optimising yield sampling at lower cost and effort when information about the yield spatial structure is available. Note also that simple and quick field observations, such as row length, can guide the choice between an EBSR and an RBSR strategy. Row length is simple, readily available information that can be considered at no additional cost. Moreover, this can be done without interfering with the trade-off between estimation error and sampling time, which is left to the practitioner’s discretion. The starting point corresponds here to the fixed entry point of a given field. Its position influences the distance to be covered to reach certain sites; when possible, adjusting it could reduce the total sampling time. Note that the total sampling time also depends on the measurement time, which is not discussed here as this work focuses on minimising walking time. A proper sampling strategy should consider both walking time minimisation and a suitable measurement protocol.

Thus, based on the study of yield spatial structure, results shed light on some generic considerations when sampling for grape yield estimation. However, yield spatial structure is generally not known before sampling. Ancillary data

The results obtained in this study depend on the sampling strategy used. Here, this strategy aimed at selecting measurement sites that are representative of the yield distribution. This choice may explain why the proposed optimal routes are strongly influenced by the spatial structure of the yield and its organisation with respect to row orientation. However, most targeted sampling methods aim to take into account the distribution of the variable to be estimated and may well lead to similar results. This study does not allow us to demonstrate this; however, the proposed methodology could be used to evaluate sampling methods by simultaneously taking into account the quality of the estimate and the sampling effort.
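One way to compare sampling methods on both criteria at once, without fixing a trade-off in advance, is Pareto dominance. A method is kept only if no other method is at least as good on both estimation error and walking time and strictly better on one. This is a standard multi-criteria device, not a procedure from this study. The function name and the numbers in the usage lines are purely illustrative.

```python
def pareto_front(results):
    """Return the (error, time, label) tuples not dominated by any other, sorted by error.

    A result dominates another if it is no worse on both estimation error and
    walking time and strictly better on at least one of them.
    """
    front = []
    for e, t, label in results:
        dominated = any(e2 <= e and t2 <= t and (e2 < e or t2 < t)
                        for e2, t2, _ in results)
        if not dominated:
            front.append((e, t, label))
    return sorted(front)


# Illustrative (error %, walking time min, method) triples, not results from this study.
results = [(4.0, 6.0, "method A"), (2.0, 3.0, "method B"),
           (1.5, 5.5, "method C"), (3.0, 2.0, "method D")]
for error, minutes, label in pareto_front(results):
    print(f"{label}: error {error} %, walking time {minutes} min")
```

Here method A is dominated by method B, which is both more accurate and faster. The remaining methods each represent a different compromise, and choosing among them is left to the practitioner.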

Finally, these considerations on optimal routes for yield sampling may be applied to other variables of interest such as fruit maturation (

This work shows that to be optimal, a sampling route must be tailored to the characteristics of the field. The row length, as well as the spatial organization of the within-field yield variability, are factors that determine the optimality of a sampling route. This work opens up interesting perspectives. Indeed, the approach could be used to identify whether other factors affect the optimal definition of a sampling route (

This work was supported by the French National Research Agency under the Investments for the Future Program, referred to as ANR-16-CONV-0004 (#Digitag).