Two-stage automatic diagnosis of Flavescence Dorée based on proximal imaging and artificial intelligence: a multi-year and multi-variety experimental study

“Flavescence dorée” (FD) is a grape vine disease caused by the bacterial agent “ Candidatus Phytoplasma vitis” and spread by the leafhopper Scaphoideus titanus Ball (Hemiptera: Cicadellidae). The disease is very closely monitored in Europe, as it reduces vine productivity and causes vine death and is also highly transmissible. Currently, the control method used against this disease is a two-pronged approach: i) the spraying of insecticide on a regular basis to kill the vector, and ii) a survey of each row in a vineyard by experts in this disease. Unfortunately, these experts are not able to carry out such a task every year on every vineyard and need an aid for planning their survey. In this study, we propose and evaluate an original automatic method for the detection of FD based on computer vision and artificial intelligence algorithms applied to images acquired by proximal sensing. A two-step approach was used, mimicking an expert’s scouting in the vine rows: (i) the three known isolated symptoms (red or yellow leaves depending on variety, together with a lack of shoot lignification and the presence of desiccated bunches) were detected, (ii) isolated detections were combined to make a diagnosis at image scale; i.e., vine scale. A detection network was used to detect and classify non-healthy leaves into three classes: ‘FD symptomatic leaf’, ‘Esca leaf’ and ‘Confounding leaf’; while a segmentation network was used for the retrieval of FD symptomatic shoots and bunches. Finally, the association of detected symptoms was performed by a RandomForest classifier for diagnosis at the image scale. The experimental evaluation was conducted on more than 1000 images collected from 14 blocks planted with five different grape varieties. The detection of the isolated symptoms achieved a precision of between 0.67 and 0.82 and a recall of between 0.39 and 0.59. 
The classification at the image scale obtained very good results when applied to images acquired under the same conditions, with the same grape varieties as the training images (precision and recall of more than 0.89). The results of the tests on the other grape varieties show the importance of having some of them in the training base in these AI-based approaches.


INTRODUCTION
Flavescence dorée (FD) is a disease that is very closely monitored in Europe: it was classified as a quarantine disease at the European level in 1993 (European Directive 2000/29/ EC) and is subject to mandatory reporting. This means that when an outbreak of this disease is detected, the farmer must inform the competent institutions; i.e., sanitary services of the ministry of agriculture or other organisations in charge of the sanitary surveillance of crops. A compulsory control perimeter can then be defined, with a compulsory insecticide control. The main vector of this disease is the leafhopper Scaphoideus titanus Ball, which transmits the phytoplasma "Candidatus Phytoplasma vitis" during phloem feeding (Lefol et al., 1993). The symptoms expressed by the affected vine will appear in summer, usually one year after contamination, and will be present on three organs: the leaves, shoots and bunches (Caudwell, 1964). The leaves undergo a red discolouration for the red varieties and a yellow discolouration for the white varieties, as well as a possible rolling. Symptomatic shoots are characterised by the absence of lignification; i.e., they do not undergo the browning process that makes them resistant to frost ( Figure 1). Finally, at bunch level, the berries will wilt and the inflorescence will dry out. The consequences of FD can be very significant, such as yield losses or plant dieback with important economic consequences for the winegrower: "infected plants showed a drastic reduction in the yield, corresponding to a decrease of between 51 % and 92 % compared to healthy plants." (Oliveira et al., 2020), "in 2005, 34 million Euro was given to Italian vine growers to compensate losses due to the disease." (Chuche and Thiéry, 2014). Without control measures, the disease can spread rapidly and affect the entire vineyard in a few years, depending on the grape variety and populations of leafhoppers present in the vineyard. 
A plant affected by FD cannot be saved, because the phytoplasma cannot be attacked directly.
Currently the only way to effectively fight FD is to survey each vineyard block several times a year in order to detect infected vines as soon as possible. Experts of this disease are thus commissioned to scout the vineyards in order to identify the vine carrying the phytoplasma. Expertise is indeed necessary to diagnose FD, because there are many phytosanitary diseases that express visual symptoms very similar to those of FD ( Figure 1). Moreover, it is impossible to visually differentiate between a plant affected by FD and a plant affected by bois noir disease (Tessitori et al., 2018); a laboratory analysis of a sample is therefore necessary to obtain the right diagnosis (Mirchenari et al., 2015). As we could not afford a laboratory analysis for each vine showing symptoms of FD or bois noir, in the rest of the article we will not differentiate between them and we will only refer to FD. Unfortunately, the scouting experts are not able to carry out their task every year in every vineyard and need decision support tools when planning their survey. Scouters need to choose the vineyard blocks in which they will pass according to their history, which leaves time for the FD to develop in the other blocks. A scouting aid seems to be essential to increase the efficiency of this practice, and recent technological advances (better acquisition devices, more powerful computers, artificial intelligence and drones) have allowed scientific research to develop further in this direction.
The popularisation of drones, or unmanned aerial vehicles (UAVs), has provided an extremely practical means of acquiring images of blocks due to their speed of execution. Some studies have shown that images acquired from drones allow the biomass, canopy temperature, size and nitrogen consumption of crops to be estimated correctly (Holman et al., 2016; Ludovisi et al., 2017; Madec et al., 2017). Based on these successes, the diagnosis of crop diseases by drone has also been tried, in particular to detect vine diseases. For example, some studies (Kerkech et al., 2020a; Kerkech et al., 2020b) used a segmentation approach to assign one of four classes to each pixel: soil, shadow, healthy vine or diseased vine. The results of these studies are good, with the first study having an accuracy rate of 0.92 for the diseased vine class. It is also possible to calculate vegetation indices from UAV images, such as Excess Green (ExG) and Green-Red Vegetation Index (GRVI), to detect diseased vineyard areas (Kerkech et al., 2018). In that study, the classification of patches (groups of 16x16, 32x32 or 64x64 pixels) even reached 0.95.
The results of these three studies are extremely positive, which may be explained by the fact that only the discrimination between healthy and diseased vines is made and not between diseases, nor between diseases and other abiotic stresses. In Albetis et al. (2017), univariate and multivariate classification approaches were used to classify vines between vines affected by FD and healthy vines, using 20 variables computed from UAV images (spectral bands, vegetation indices and biophysical parameters). The results were good for the red varieties (severely infected by the FD); however, they were not convincing for the white varieties. In another study, Albetis et al. (2018) tried to differentiate between FD and trunk diseases (black dead arm and Esca) using 24 variables (five spectral bands, 15 vegetation indices and four biophysical parameters). Seven vineyards covering five different red grape varieties were photographed by drone. Promising results were obtained for the discrimination between vines affected by FD and healthy vines, but the results for the discrimination between FD and trunk diseases were not convincing. To our knowledge, no scientific studies have obtained convincing results (good results for red and white grapes with a large number of data and confounding diseases) when diagnosing vine diseases by drone imaging. The explanation for this may be that during the acquisition of images by drone, the bunches and shoots cannot be seen. Only the upper leaves are available for making the distinction between diseases. Moreover, the resolution of drone images (1-pixel equivalent to several centimetres) does not allow certain symptoms to be detected, such as small spots on leaves. This acquisition method seems to offer real advantages during data collection; however, these data seem very limited for tasks such as distinguishing between diseases, which requires much more detail.
A possible way of obtaining these details is by proximal sensing using either standard or multispectral cameras. The camera is used to photograph the leaves close-up, either in the field with the foliage in the background, or in the laboratory with the leaf placed on a plain background. In this way the symptoms on the leaves can be detected much more precisely. Many studies use this type of image to detect diseases, as well as to differentiate between them. In a study carried out by Al Saddik (2019), a spectral and textural analysis allowed a healthy leaf to be differentiated from a diseased leaf with more than 0.85 accuracy, the degree of infection with more than 0.74 accuracy, and the distinction of FD from bois noir and Esca with more than 0.75 accuracy. A classifier can also be used as shown by Pantazi et al. (2016), with upstream colour space changes, texture operator applications and parameter extractions. This method achieved over 0.93 accuracy in classifying leaves with symptoms of three diseases: powdery mildew, downy mildew and black rot. However, the most widely used and best performing approach for classifying symptomatic leaves photographed in close-up is the use of deep learning and more particularly CNNs (Convolutional Neural Networks). In work done by Ji et al. (2020), more than 0.99 accuracy was achieved when classifying leaf into four classes: healthy leaves, black rot, Esca and isariopsis leaf spots. The use of CNNs also obtained 0.97 accuracy when classifying vine leaves into six classes: leaves showing symptoms of anthracnose, brown spot, moths, black rot, downy mildew and leaf blight (Liu et al., 2020). A comparison of two approaches, SIFT (Scale-Invariant Feature Transform) and transfer learning, for the classification of Esca symptomatic leaves was performed (Rancon, 2019), giving very good results (about 99 % of good predictions for advanced stage Esca leaves with both methods).
There are few studies using images of vine photographed from 50 to 200 cm to diagnose grapevine diseases. One study evaluates the effectiveness of a vehicle-mounted device to characterise vine foliage, but only vegetation indices were calculated (Bourgeon, 2015). In Abdelghafour et al. (2019), a computer vision approach using joint colour and texture analysis with extended structure tensors was applied in order to differentiate vine organs on images acquired in the field. Then, in a second step, an evaluation was undertaken of the potential of high-resolution embedded imagery for epidemiological monitoring with, as a case study, downy mildew (Abdelghafour et al., 2020). The results obtained for this second step were promising and show that it is possible to estimate the sanitary state at the block level, without the need for high-precision information for each vine. Detection of Esca symptomatic leaves using a detection network (RetinaNet), as well as comparison of RGB and hyperspectral images to detect early stages of Esca, were tried in Rancon (2019), without giving better results with multispectral imaging.
Finally, a recent study (Boulent, 2020) obtained a true positive rate of 0.98 when classifying images of grapevines affected by FD using deep learning methods such as CNNs and FCNs (Fully Convolutional Networks). Images of healthy and diseased grapevines were acquired by a camera at a distance of about 100 cm. This result demonstrates the ability of neural networks to detect grapevine diseases other than by taking close-ups of leaves, but it must be nuanced. Indeed, while a true positive rate of 0.98 was reached on the Chardonnay grape variety, it decreased to 0.08 for the Ugni blanc grape variety. This suggests that the strong differences in the expression of symptoms between grape varieties are an essential consideration in the detection of vine diseases. Moreover, only symptoms on leaves were used in this study to deliver the diagnosis. Few, if any, studies address the issue of confusing the symptoms of the disease of interest with the manifestation of other diseases or abiotic stress. A literature review (GDON du Sauternais et des Graves, 2014) and thorough discussions with professional FD experts revealed that some diseases or phytosanitary problems cause exactly the same visual symptoms on leaves and that the only way to reach a reliable diagnosis is to combine the various visual symptoms; i.e., discoloured leaves, unlignified shoots and desiccated bunches. The aim of this paper is therefore to propose an artificial intelligence method that considers all three symptoms simultaneously on the same vine for diagnosing the disease.
In this study, we propose multiple contributions:
• A complete experimental protocol including: image acquisitions with establishment of ground truths by prospectors, expert annotations of symptoms and confounding factors on the images, an evaluation of the symptom detection methods and the symptom association method.
• A choice of algorithms for the detection of different unit symptoms at the organ level (leaf, shoot and bunch).
• A machine-learning method for the association of these symptoms in order to make a decision at the plant level.
We will propose and compare different approaches (computer vision algorithm and artificial intelligence) for the detection of isolated symptoms of FD. Finally, an approach combining the isolated symptoms for diagnosis support at the image level will be studied.

Data collection protocol and ground truths
With the help of our partners (Groupement de Défense contre les Organismes Nuisibles (GDON) 1 from Bordeaux and Bureau National Interprofessionnel du Cognac (BNIC) 2 ), who are experts in FD diagnosis, we set up an image acquisition protocol ensuring a reliable labelling of our data.
Our acquisition device was composed of an RGB camera and an industrial flash, ensuring constant luminosity throughout the day, whatever the climatic conditions during the acquisition. A more detailed description of the acquisition device is available in the paper of Abdelghafour et al. (2020).
The blocks on which we went to collect data were identified beforehand by our partners as containing numerous cases of FD. Once on site, the scouts indicated the vines of interest to photograph (vines affected by FD, Esca, mildew, deficiencies, other diseases or phytosanitary problems causing symptoms easily confused with FD). During the acquisition of the image, an annotation file was completed indicating the disease identified at the vine level and also other symptoms present on the vine (such as non-lignified shoots, desiccated bunches, burnt leaves and nutrient deficiencies).

Image annotation protocol
Once the images were acquired, the same experts were asked to annotate, on the computer screen, the isolated symptoms on leaves and bunches, by using bounding boxes (see Figure 2). Such annotations have the advantage of being fast and easy to perform by FD experts. The leaves were separated into three classes during annotation: 'FD symptomatic leaf', 'Esca leaf' and 'Confounding leaf'; the latter class included all leaves visually different from a healthy leaf. Placing them in a separate class aimed to help the algorithms differentiate between them. The shape of the shoots did not facilitate their annotation by bounding boxes, so it was decided to annotate them with a broken line along the symptomatic shoot. The Labelme software (Wada, 2021), which annotates both bounding box and broken line, was chosen to perform this work. "Leaf-by-leaf" annotation proved very time-consuming: up to 20 minutes is needed for an image loaded with symptoms. However, tests have confirmed that such annotation is necessary. The class 'Confounding leaf' was also important. Removing this class significantly reduced the accuracy of prediction results.

Algorithms
An artificial neural network, first developed by McCulloch and Pitts (1943) and popularised by Rumelhart (1986), is a succession of algorithms that attempts to learn hidden relationships in a dataset by a method that mimics the human brain. First, a set of digital "neurons" is created and connected together, so that they can send messages to each other. Next, the network is asked to solve a problem, which it attempts to do repeatedly, each time strengthening the connections that result in success and weakening those that lead to failure.
A major advance in the construction of models for image processing came with the discovery that a CNN introduced by Fukushima (1980) could be used to progressively extract ever-higher level representations of image content. Instead of pre-processing the data to obtain derived features, such as texture and shape, a convolutional neural network uses only the raw pixel data as input, "learns" how to extract these features and, finally, deduces the object they constitute.
There are three main categories of artificial intelligence algorithms applied to images:
• Classification networks: categorise the entire image into a group; for example, "healthy vine" or "vine affected by FD". More information and applications are available in Rawat and Wang (2017).
• Object detection networks: detect objects within an image and draw a rectangle around them; for example, a leaf affected by FD. Zhao et al. (2019) produced a review of possible methods and applications.
• Segmentation networks: assign a class to each pixel in the image to identify the structure and/or objects in the image; for example, a two-class classification "symptomatic shoot" and "rest" can be used to identify the symptomatic shoots in images. Ajmal et al. (2018) reviewed 13 methods which use CNNs for image segmentation.

Detection of unitary symptoms
Three types of algorithms were tested for the detection of isolated FD symptoms. First, a segmentation algorithm called ResUnet (Diakogiannis et al., 2020) was tested for the segmentation of unlignified shoots. This algorithm was then extended to the segmentation of symptomatic and healthy bunches. A Resnet34 was used as a feature extractor, and the following settings were tested: depth of the Unet algorithm (1, 2, 3, 4 or 5 levels), input image size (64x64, 128x128, 256x256 and 512x512 pixels), resolution degradation (division of the number of pixels by 4 and 8), convolution kernel size (3, 5, 7), data augmentation (one or more of the following operations applied randomly: image rotation, pixel dropout, contrast and brightness change) and loss function (categorical cross-entropy, dice loss, Tversky loss, Tversky focal loss). The ResUnet algorithm is known to give good results for object segmentation, but it is time-consuming to compute. To reduce this prediction time, as well as to obtain more training images, each image (2048 x 2448 pixels) was divided into thumbnails. The best results were obtained by dividing an image into 20 thumbnails of 512 x 512 pixels and then removing one row and one column out of two from each thumbnail, thus transforming an image of 2048 x 2448 pixels into 20 thumbnails of 256 x 256 pixels.
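The thumbnail preparation described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the 2448-pixel side, which is not a multiple of 512, was covered, so the sketch assumes evenly spaced tiles that overlap slightly. The names `tile_starts` and `split_and_subsample` are hypothetical.

```python
import numpy as np

TILE = 512  # thumbnail side before subsampling (value taken from the text)


def tile_starts(length, tile=TILE):
    """Evenly spaced start offsets so that tiles of side `tile` cover `length`.

    Assumption: when `length` is not a multiple of `tile`, neighbouring
    tiles overlap slightly instead of the image being padded or cropped.
    """
    n = -(-length // tile)  # ceiling division: number of tiles on this axis
    if n == 1:
        return [0]
    step = (length - tile) / (n - 1)
    return [int(round(i * step)) for i in range(n)]


def split_and_subsample(image):
    """Cut an image into 512 x 512 thumbnails, then keep one row and one
    column out of two, which gives 256 x 256 thumbnails."""
    thumbnails = []
    for r in tile_starts(image.shape[0]):
        for c in tile_starts(image.shape[1]):
            thumb = image[r:r + TILE, c:c + TILE]
            thumbnails.append(thumb[::2, ::2])
    return thumbnails


image = np.zeros((2048, 2448, 3), dtype=np.uint8)  # placeholder image
thumbnails = split_and_subsample(image)
print(len(thumbnails), thumbnails[0].shape)  # 20 (256, 256, 3)
```

Dropping every other row and column divides the number of pixels by four, which is the "division by 4" resolution degradation mentioned among the tested settings.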
Once the pixels of the thumbnails had been predicted, a bilinear interpolation allowed the thumbnails to recover their original sizes and the 2048x2448 image was then reconstituted. The Tversky loss was chosen as the loss function for our network. This function penalises false positives and false negatives, and gives sound results in the case of unbalanced class distribution, which is our case. The Tversky loss is calculated as follows:

$$TL = 1 - \frac{TP}{TP + \alpha \cdot FN + \beta \cdot FP}$$

with TP, FN and FP the numbers of true positive, false negative and false positive pixels respectively of the symptomatic shoots, and α and β the weights given to the false negatives and false positives.
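As an illustration, the Tversky loss can be sketched in NumPy, assuming the common formulation TL = 1 - TP / (TP + α·FN + β·FP) and the values α = β = 0.8 reported for the ResUnet training. The function name `tversky_loss` and the small stabilising constant `eps` are choices made for this sketch; the TensorFlow code of the study is not shown in the text.

```python
import numpy as np


def tversky_loss(y_true, y_pred, alpha=0.8, beta=0.8, eps=1e-7):
    """Tversky loss for one class.

    y_true: binary ground-truth mask (1 = symptomatic shoot pixel).
    y_pred: predicted probabilities in [0, 1], same shape as y_true.
    alpha weights the false negatives and beta the false positives
    (assumed convention; with alpha == beta the order does not matter).
    eps is a hypothetical stabiliser that avoids a division by zero.
    """
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    tp = np.sum(y_true * y_pred)            # soft true positives
    fn = np.sum(y_true * (1.0 - y_pred))    # soft false negatives
    fp = np.sum((1.0 - y_true) * y_pred)    # soft false positives
    return 1.0 - (tp + eps) / (tp + alpha * fn + beta * fp + eps)


# One true positive, one false negative and one false positive:
loss = tversky_loss([1, 1, 0, 0], [1, 0, 1, 0])
print(round(loss, 4))  # 0.6154, i.e. 1 - 1 / 2.6
```

With α = β = 0.5 this expression reduces to the dice loss, which is one way to see how the two weights trade off the two kinds of error.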
The results of the segmentation of the symptomatic shoots were then compared to those obtained via an algorithm using the structure tensor, which has the advantage of having a fast execution time.
The calculation of the structure tensor and its eigenvalues provided an index of the degree of anisotropy of the gradient at each pixel of the image (Budde and Frank, 2012). This index, between 0 and 1, is very close to 1 when the gradient has a preferred direction. This algorithm calculated the local anisotropy of the gradient in each image and its use was based on the fact that the shoots were more anisotropic than the leaves or bunches, because they followed a specific direction. A thresholding on the green channel allowed only the symptomatic shoots to be recovered. Hysteresis thresholding operations (Pridmore, 2001) were then applied to the results of the structure tensor algorithm to refine them, and morphological operations (does the shape resemble a shoot; i.e., a continuous elongated shape, or not?) were used to remove any false positives.
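A minimal NumPy sketch of a structure-tensor anisotropy map is given below. It uses the coherence measure (λ1 - λ2) / (λ1 + λ2), which is one common index between 0 and 1; the exact index used in the study is not given in this section, so this is an assumption. The default sigmas follow the values reported for the structure tensor (derivative filter sigma = 2, Gaussian smoothing sigma = 8); the helper names are hypothetical.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel truncated at 3 sigma."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def smooth(img, sigma):
    """Separable Gaussian smoothing with edge padding (output keeps the input shape)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    padded = np.pad(img, radius, mode="edge")
    cols_done = np.apply_along_axis(np.convolve, 0, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 1, cols_done, kernel, mode="valid")


def anisotropy(gray, sigma_d=2.0, sigma_s=8.0, eps=1e-12):
    """Coherence of the structure tensor at each pixel, in [0, 1]."""
    g = smooth(np.asarray(gray, dtype=float), sigma_d)
    gy, gx = np.gradient(g)
    # Smoothed components of the 2 x 2 structure tensor.
    jxx = smooth(gx * gx, sigma_s)
    jyy = smooth(gy * gy, sigma_s)
    jxy = smooth(gx * gy, sigma_s)
    # lambda1 - lambda2 = sqrt((jxx - jyy)^2 + 4 jxy^2); lambda1 + lambda2 = trace.
    root = np.sqrt((jxx - jyy) ** 2 + 4.0 * jxy ** 2)
    return root / (jxx + jyy + eps)


# Vertical stripes have a single gradient direction; noise has none.
stripes = np.tile(np.sin(np.arange(64) / 3.0), (64, 1))
noise = np.random.default_rng(0).random((64, 64))
aniso_stripes = anisotropy(stripes)
aniso_noise = anisotropy(noise)
print(aniso_stripes[32, 32], aniso_noise.mean())
```

An elongated structure such as a shoot behaves like the striped image and scores close to 1, whereas textured regions without a dominant direction score lower.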
To study the symptomatic shoot predictions of the algorithms, two metrics were used: the pixel metric and the object metric. Using these two metrics it was possible to compare the two algorithms used to predict the unlignified shoots, namely the structure tensor algorithm and the ResUnet segmentation algorithm. For both metrics, precision and recall were calculated. Precision corresponds to the proportion of correct predictions among all positive predictions. Recall corresponds to the proportion of the elements that the algorithm should have predicted as positive that were actually predicted as positive. The calculations used for these two indices are as follows:

$$Precision = \frac{TP}{TP + FP} \qquad Recall = \frac{TP}{TP + FN}$$

with TP, FP and FN the numbers of true positives, false positives and false negatives respectively.

A YOLOv4-tiny algorithm, the fastest variant of YOLOv4 (Bochkovskiy et al., 2020), itself the fourth version of the YOLO algorithm (Redmon et al., 2016), was used to predict symptomatic leaves and bunches. YOLO is a deep-learning algorithm known to be fast and accurate for this type of task. The speed of this algorithm comes from the fact that the image is "looked at" only once (YOLO: You Only Look Once) to predict both the bounding boxes and their associated classes.
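For the pixel metric, the precision and recall described in the text can be sketched as below, assuming binary masks of the same shape for the prediction and the ground truth. The function name `precision_recall` and the convention of returning 0.0 when a denominator is zero are choices made for this sketch.

```python
import numpy as np


def precision_recall(pred_mask, true_mask):
    """Pixel-level precision and recall for one class.

    pred_mask, true_mask: arrays of the same shape, non-zero = positive.
    Returns 0.0 for a ratio whose denominator is zero (assumed convention).
    """
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    tp = int(np.sum(pred & true))    # predicted positive, really positive
    fp = int(np.sum(pred & ~true))   # predicted positive, really negative
    fn = int(np.sum(~pred & true))   # missed positives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


true_mask = np.array([[1, 1, 0], [0, 0, 0]])
pred_mask = np.array([[1, 0, 1], [0, 0, 0]])
print(precision_recall(pred_mask, true_mask))  # (0.5, 0.5)
```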
The algorithms used for the detection of leaves, shoots and bunches are summarised in Table 1.

Combination of symptoms for disease diagnosis
The association of the unit symptoms predicted by these algorithms was performed by a RandomForest classifier (Ho, 1995). The RandomForest algorithm was chosen for its speed of execution and the simplicity with which it was possible to visualise the importance of each parameter in the final decision. An image was then represented by a vector of 16 parameters resulting from the first step of symptom detection, which constituted the input of the classifier:
• Eight parameters were calculated from the leaf predictions. The first three were the numbers of leaves belonging to the classes 'FD symptomatic leaf', 'Esca leaf' and 'Confounding leaf'. The detection algorithm also assigns to each prediction a confidence score of between 0 and 100. The average of the confidence scores for each class was thus calculated, attesting to the degree of certainty of the network in its detections. Finally, the number of leaves of the same class that were "spatially close" was evaluated. These measures seemed important, because when a plant is affected by FD or Esca, the symptomatic leaves are located along the symptomatic shoot and are thus spatially close.
• Four parameters were calculated from the shoot and bunch predictions: number of symptomatic shoots, number of symptomatic bunches, number of healthy bunches and maximum thickness of the detected symptomatic shoots. The latter parameter was able to distinguish between a shoot really symptomatic of FD on the one hand, and the petioles or the ends of the shoots in the process of lignification on the other.
• By associating the two algorithms of detection of isolated symptoms, four parameters were defined:
o minimum distances between a symptomatic shoot and i) a leaf symptomatic of FD, and ii) a leaf symptomatic of Esca.
o minimum distances between a symptomatic shoot and iii) a symptomatic bunch, and iv) a healthy bunch.
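The construction of the 16-parameter vector can be sketched as follows. Several details are assumptions made for illustration and are not specified in the text: objects are reduced to centre points, "spatially close" is defined by a hypothetical radius `CLOSE_RADIUS`, the closeness counts are computed for the FD and Esca classes only (so that the leaf parameters total eight), and an undefined distance is encoded by the placeholder `NO_DISTANCE`.

```python
import math

LEAF_CLASSES = ("FD", "Esca", "Confounding")  # short names used in this sketch
CLOSE_RADIUS = 300.0  # pixels; hypothetical definition of "spatially close"
NO_DISTANCE = -1.0    # hypothetical placeholder when no pair exists


def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def count_close(points, radius=CLOSE_RADIUS):
    """Number of points having at least one other point within `radius`."""
    return sum(
        any(i != j and dist(p, q) <= radius for j, q in enumerate(points))
        for i, p in enumerate(points)
    )


def min_distance(shoots, targets):
    pairs = [dist(s, t) for s in shoots for t in targets]
    return min(pairs) if pairs else NO_DISTANCE


def build_features(leaves, shoots, sympt_bunches, healthy_bunches):
    """leaves: list of (class_name, confidence, (x, y));
    shoots: list of ((x, y), thickness); bunches: lists of (x, y)."""
    by_class = {c: [leaf for leaf in leaves if leaf[0] == c] for c in LEAF_CLASSES}
    counts = [len(by_class[c]) for c in LEAF_CLASSES]
    mean_conf = [
        sum(leaf[1] for leaf in by_class[c]) / len(by_class[c]) if by_class[c] else 0.0
        for c in LEAF_CLASSES
    ]
    close = [count_close([leaf[2] for leaf in by_class[c]]) for c in ("FD", "Esca")]
    shoot_pts = [s[0] for s in shoots]
    max_thickness = max((s[1] for s in shoots), default=0.0)
    shoot_bunch = [len(shoots), len(sympt_bunches), len(healthy_bunches), max_thickness]
    distances = [
        min_distance(shoot_pts, [leaf[2] for leaf in by_class["FD"]]),
        min_distance(shoot_pts, [leaf[2] for leaf in by_class["Esca"]]),
        min_distance(shoot_pts, sympt_bunches),
        min_distance(shoot_pts, healthy_bunches),
    ]
    return counts + mean_conf + close + shoot_bunch + distances  # 3 + 3 + 2 + 4 + 4


leaves = [("FD", 90.0, (100, 100)), ("FD", 70.0, (150, 120)),
          ("Confounding", 60.0, (900, 900))]
shoots = [((120, 110), 9.0)]
features = build_features(leaves, shoots, [], [(400, 110)])
print(len(features), features)
```

A vector of this kind, one per image, is what a classifier such as RandomForest would take as input.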

Implementation
All the algorithms in this study were developed in Python language, version 3.8. The artificial intelligence algorithms used the Tensorflow 2.5.0 library.
Only algorithms that could be embedded on low-cost hardware, such as Nvidia Jetson Xavier (512-core NVIDIA Volta GPU, 64-bit-8-core NVIDIA Carmel CPU, 16GB RAM, 32GB storage) were considered in this study.

Collected datasets
The image acquisition campaign started in August 2020. For this first year of data acquisition, it was agreed to focus our study on two grape varieties: i) Cabernet-Sauvignon, the most common red grape variety in the Bordeaux region, and ii) Ugni blanc, a white grape variety. Sixty-seven images of Cabernet-Sauvignon and 215 images of Ugni blanc were annotated by experts and form the two data sets for the detection algorithm, which are called "CS20EXP" and "UB20EXP" respectively.
An image set for the segmentation of symptomatic shoots and bunches, as well as of healthy bunches, was created for this study. This set contained 132 images, from previous campaigns (four images) and from the 2020 acquisition campaign (78 images of Cabernet-Sauvignon and 50 images of Ugni blanc), together with their associated masks. Eighteen images (4 of Cabernet-Sauvignon and 14 of Ugni blanc) were common to the detection and segmentation sets. The masks were created using the GIMP software by cropping the objects of interest and by setting the pixels of objects of the same class to the same value (Figure 1SD).
All acquisitions and annotations made in the years 2020 and 2021 are summarised in Table 2.
Four different datasets were studied in the final image-scale diagnosis by the RandomForest algorithm. The first two, called "WGV20" (White Grape Variety 2020) and "RGV20" (Red Grape Variety 2020), contained the images acquired in 2020 that were not annotated for the detection and segmentation networks. The other two, "WGV21" and "RGV21", contained images acquired in 2021; see Table 1SD. The different image acquisition conditions between blocks, as well as the differences in FD symptom expression between grape varieties, are shown in Figure 3.
We identified some images in the "confounding" class as displaying symptoms extremely similar to those of FD; these form the 'Confounding+' subclass. These symptoms may be due to the buffalo treehopper for red grape varieties, to undiagnosed phytosanitary problems called "all red" for red grape varieties and "all yellow" for white grape varieties by FD experts, or to shoot or petiole breakage, as shown in Figure 1. The WGV20, RGV20, WGV21 and RGV21 datasets contained 10, 56, 8 and 30 such images respectively. Table 3 summarises the distribution of images for these four datasets. The two latter datasets (WGV21 and RGV21) were used to test the robustness of the method when applied to different grape varieties, and thus to different symptom expressions.

Table 4 compares the best average precision and recall obtained by the structure tensor (with hysteresis thresholding and morphological operations) and by ResUnet on the same test set (20 images). The images were separated in the following way: 75 % in the training set, 10 % in the validation set and 15 % in the test set, while making sure that the proportions of pixels of each class were similar in each set. The calculation time of the structure tensor was high due to the hysteresis thresholding and the morphological operations, which required each detection to be reviewed. On the same test set and for the pixel metric, the structure tensor alone gave a precision of 0.19 and a recall of 0.24, and the structure tensor with hysteresis thresholding gave a precision of 0.17 and a recall of 0.44. The optimisation of the computation time of this algorithm was not attempted, given that ResUnet gave better results and could accommodate the addition of new object classes. The ResUnet algorithm appeared to be the most suitable for the detection of symptomatic shoots, in terms of both results and computation time. Examples of the predictions of the ResUnet algorithm are shown in Figure 2SD.

Symptomatic and healthy bunches
The YOLOv4-tiny detection algorithm was used for the initial attempts at detecting symptomatic bunches, which were annotated at the same time as the leaves. Unfortunately, their very low number (126 symptomatic bunches annotated for the red grape variety, 55 for the white grape variety) did not yield correct results.
An attempt was made to segment them using the ResUnet algorithm, giving better results when the classes were unbalanced. Moreover, the addition of a class did not change the prediction time of an image, nor the accuracy of the existing classes when the classes were very different. The masks of the image set for the segmentation of the shoots were then modified to also contain the symptomatic bunches. By keeping the same distribution of the training and test sets and by keeping the same parameters of the shoot segmentation network, the results presented in Table 4 for the segmentation of symptomatic shoots and bunches were obtained.
We can see in Table 4 that the precision was high, but the recall not so good. This can be explained by the fact that i) the symptomatic bunches that had lost all their berries were very complicated to predict because of their very small size (a few pixels of thickness), and ii) the division of the resolution by 2 of the original images (to reduce the prediction time) reduced the size of the symptomatic bunches even further.
The low presence of symptomatic bunches in our images, often very small and possibly already fallen or hidden by leaves, led us to believe that their detection would not play a big role in the final diagnosis. On the other hand, the presence or not of healthy bunches seemed to be more of a determining factor for the presence or absence of the disease. The 'healthy bunch' class was therefore added to the segmentation masks. By keeping the same parameters for the training of the network as for the symptomatic shoots and bunches, the addition of this new class allowed the last row of Table 4 to be added.

Symptomatic leaves
The YOLOv4-tiny detection network was tested for the detection of FD symptomatic leaves, Esca leaves and confounding leaves. In the same way as for the ResUnet segmentation network, the images were cut into thumbnails of 608 x 608 pixels, a resolution known to give good results and which made it possible to increase the number of training images. The number of images available for this study was very low compared to that recommended in the literature for properly training this type of network (Alexey, 2021). Data augmentation was therefore applied, randomly performing one or more of the following operations on each image: horizontal flip, image resize, rotation, crop, and horizontal and/or vertical translation. This step artificially enlarged our dataset: twelve images were created from a single image for the red grape set and four for the white grape set. Moreover, when cutting the training images, an overlap was made so that a new thumbnail was created at each intersection of two thumbnails. A box that would have been cut in two during the division into thumbnails was thus intact in this new overlapping thumbnail. The images were split into 75 % for training, 15 % for testing and 10 % for validation. The best results for each dataset on their respective testing set are shown in Table 5. As can be seen, these results were not very high, due to the very limited number of images at our disposal for training the network, but they constituted a basis on which to build the diagnostic method at the image scale. Examples of good and bad predictions of the algorithm are presented in Figure 3SD and Figure 4SD. While the computation time may seem high for a real-time prediction, the tests performed by dividing the number of pixels of the image by four (deletion of one row and one column out of two) gave prediction results almost identical to those shown in Table 5. This method results in a considerable reduction in the prediction time of an image.

An object was classified as a true positive when at least 20 % of its pixels had been correctly classified. The parameters of the structure tensor were the following: derivative filter: sigma = 2; Gaussian smoothing: sigma = 8; threshold by hysteresis: low threshold = 0.5, high threshold = 0.9; morphological operations: only the detections verifying length of the major axis > 90 pixels and 6 pixels < length of the minor axis < 17 pixels were retained. The ResUnet algorithm was trained with 256 x 256 pixels images, four levels of depth, α = β = 0.8 for the Tversky loss function and batches of 30 images during 300 epochs. The prediction time was that of a 2048 x 2448 pixels image on the Nvidia Jetson Xavier card.
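The cutting of images into overlapping thumbnails and the pixel decimation described in this section can be sketched as follows. This is only a minimal illustration, assuming NumPy arrays and a stride of half a thumbnail, so that an extra thumbnail straddles each boundary between two neighbouring thumbnails. The function names `tile_origins`, `cut_thumbnails` and `decimate` are hypothetical, and the handling of image borders (a last thumbnail aligned to the edge) is an assumption rather than a detail taken from the study.

```python
import numpy as np

TILE = 608  # thumbnail side in pixels, as stated in the text


def tile_origins(length, tile=TILE, stride=TILE // 2):
    """Start offsets of thumbnails along one image axis, with overlap.

    Assumptions: a stride of half a thumbnail, and a final thumbnail
    aligned to the image border so that no pixel is left uncovered.
    """
    if length <= tile:
        return [0]
    origins = list(range(0, length - tile + 1, stride))
    if origins[-1] != length - tile:
        origins.append(length - tile)
    return origins


def cut_thumbnails(image, tile=TILE):
    """Yield (row, col, thumbnail) for every overlapping thumbnail."""
    height, width = image.shape[:2]
    for r in tile_origins(height, tile):
        for c in tile_origins(width, tile):
            yield r, c, image[r:r + tile, c:c + tile]


def decimate(image):
    """Keep one row and one column out of two (four times fewer pixels)."""
    return image[::2, ::2]
```

With these assumptions, a box lying across the seam between two adjacent thumbnails is contained in the thumbnail that starts half a tile earlier, provided the box is not larger than half a thumbnail. Applied to a 2048 x 2448 pixels image, `decimate` returns a 1024 x 1224 pixels image.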

Diagnosis at the image scale
To study the results from the first two datasets (WGV20 and RGV20), a cross-validation was performed: each dataset was divided into four folds, the training was performed on the images of the first three folds and the test on the last fold. The results were saved, then the assignment of the training and test folds was changed, the algorithm re-trained and the results saved again. In this way, each fold served once as the test fold, with the training performed on the other three. The mean of the accuracies and of the recalls for image classification was computed (Table 6). Then, to evaluate the robustness of our method, the symptom detection algorithms and the classifier trained on the 2020 images were used to predict symptoms and classify the 2021 images (WGV21 and RGV21).
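The rotation of training and test subsets described above amounts to a four-fold cross-validation, which can be sketched as follows. This is a minimal illustration in plain Python: the helper names `make_folds` and `cross_validate` are hypothetical, the shuffling with a fixed seed is an assumption, and `train_and_score` stands for whatever training and evaluation routine is used (it is not defined by the study).

```python
import random


def make_folds(items, n_folds=4, seed=0):
    """Shuffle the items and deal them into n_folds disjoint folds."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # assumption: random assignment
    return [shuffled[i::n_folds] for i in range(n_folds)]


def cross_validate(items, train_and_score, n_folds=4, seed=0):
    """Use each fold once as the test set and average the scores.

    `train_and_score(train, test)` is a user-supplied (hypothetical)
    callable that trains on `train` and returns a tuple of metrics
    measured on `test`.
    """
    folds = make_folds(items, n_folds, seed)
    scores = []
    for k in range(n_folds):
        test = folds[k]
        train = [x for j, fold in enumerate(folds) if j != k for x in fold]
        scores.append(train_and_score(train, test))
    n_metrics = len(scores[0])
    # Element-wise mean of the metrics over the folds.
    return tuple(sum(s[m] for s in scores) / n_folds for m in range(n_metrics))
```

Each image appears in exactly one test fold, so every image contributes once to the averaged metrics.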
The results for both the 2020 and the 2021 white grapevine datasets were excellent. The algorithm managed to correctly classify most of the images despite the uncertain leaf predictions by the YOLOv4-tiny algorithm.
For the red grape variety datasets, the results were a little disappointing. This was due to the higher proportion of images showing symptoms easily confused with those of FD in the red variety sets (18 % of images for RGV20 and 10 % for RGV21) than in the white variety sets (5 % for WGV20 and 3 % for WGV21). These percentages do not seem to correspond to those that can be found at the plot level. Tests were performed by removing these images from RGV20, and the precision and recall for the classification of FD images increased to 0.95 and 0.92, respectively. We can see how the images belonging to the 'Confounding+' class influence the importance of the RF parameters in the final diagnosis (Figure 4). Without the 'Confounding+' images, by far the most important parameters were the number and location of detected FD and Esca symptomatic leaves. The algorithm obtained a very good classification score with these four parameters alone. When adding the 'Confounding+' images, we realised that the parameters did not have the same importance. Here the algorithm needed additional information to make the right diagnosis and it took into account all the other parameters to successfully differentiate 'Confounding+' images from 'FD' images.

The parameters of RandomForest were: number of trees: 500, maximum size of the input subset of a tree: 70 % of the dataset. All other settings were the default settings. There were no Esca precision and recall for the white grapevine 2020 set, because the number of images classified as Esca was too low to obtain significant results. They were thus annotated as part of the confounding class.
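As an illustration of the classifier settings reported in this section (500 trees, each built from at most 70 % of the dataset) and of how importance scores can be read from a trained forest, a minimal sketch is given below. The study does not name its software library, so the use of scikit-learn is an assumption, as is the mapping of the 70 % subset onto the `max_samples` argument and the use of impurity-based importances. The data are random placeholders with 16 columns (the number of parameters per image given in the Conclusions) and do not reproduce the study's features or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Placeholder data: 200 images x 16 symptom-derived parameters.
# Random values only; they do NOT reproduce the study's features.
X = rng.random((200, 16))
y = rng.integers(0, 3, size=200)  # hypothetical coding: 0 = FD, 1 = Esca, 2 = Confounding

clf = RandomForestClassifier(
    n_estimators=500,  # number of trees, as stated in the text
    max_samples=0.7,   # assumption: each tree draws a bootstrap sample of 70 % of the data
    random_state=0,    # assumption: fixed seed for reproducibility
)
clf.fit(X, y)

# Impurity-based importance scores, one per input parameter (they sum to 1).
importances = clf.feature_importances_
ranking = np.argsort(importances)[::-1]  # parameter indices, most important first
```

Comparing `importances` between a forest trained with and without a given subset of images is one way to observe the kind of shift in parameter importance discussed above.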
For the WGV21 dataset, the results were less impressive than for WGV20. While the images of Ugni blanc 2021 were very well predicted, those of Sauvignon blanc were less so. This was because the leaf detection algorithm had never been trained on Sauvignon blanc images. The results were nevertheless very promising and they suggest that, by adding some Sauvignon blanc images to the YOLOv4-tiny training set, results similar to those obtained for Ugni blanc could be reached.
For the RGV21 dataset, the model was confronted with a difficulty: the images of Cabernet-Sauvignon 2021 were acquired from further away (see Figure 3); the leaves being diagnosed were thus much smaller than those on which YOLOv4-tiny was trained. The detection results of FD symptomatic leaves were therefore very poor for this set of images. As a consequence, the classifier failed to perform well, since the parameters derived from the FD symptomatic leaf detections carry the most weight in its final decision (Figure 4). The results for Cabernet franc were very good, even though none of the algorithms were trained on this variety. Finally, the results for the Merlot images were acceptable, despite the fact that the hue of FD symptomatic leaves was very different from that of Cabernet-Sauvignon symptomatic leaves and that the Merlot grape variety weakly expresses symptoms of FD.
Without 'Confounding+' images, the detection results for FD and Esca symptomatic leaves alone were sufficient for an easy diagnosis at the image scale. With the addition of 'Confounding+' images, these parameters were no longer sufficient and the importance scores for shoot and bunch features were higher.

Response to the issue
In this study, we have shown that despite a reduced dataset and disappointing results in the detection of FD symptomatic leaves, it is possible, by association with other symptoms, to achieve very accurate results in the classification of an image into three groups: 'FD', 'Esca' and 'Confounding'. Moreover, our method seems to be robust when encountering different expressions of FD symptoms in different grape varieties, even when the unitary symptom detection algorithms, as well as the final classification algorithm, had not been trained for these varieties. The importance scores of the RandomForest parameters highlight the relevance of detecting other symptoms of FD for its diagnosis. Indeed, in the absence of symptoms that are easily confused with those of FD, information about leaves alone provides correct classification results. On the other hand, as soon as images displaying such confounding symptoms are encountered, the information on the leaves is no longer sufficient and the algorithm uses the information of the other symptoms to differentiate them. Finally, the computation times of the algorithms used in this study seem to be in line with the objective of future real-time image processing and diagnosis of FD.

Comparison with other studies
Our results can be compared with those obtained by Boulent (2020), who reported a true positive rate of 0.98 for the prediction of proximal-sensing images of Chardonnay grapevines affected by FD. Our results are slightly worse when training and testing on the same grape variety, but our image-level diagnostic datasets contain many images of vines with symptoms easily confused with those of FD, which is not necessarily representative of the distribution of symptoms at the block level. Our datasets also contain only images of plants with visual symptoms, so including images of completely healthy vines would certainly improve our results.
This study demonstrates the robustness of this new methodology when used on different grape varieties and thus for different expressions of disease symptoms. This contrasts with the results of the study carried out by Boulent (2020), in which the rate of images correctly classified as showing FD symptoms was about 0.98 for Chardonnay, a grape variety that strongly expresses the symptoms of FD, but fell to 0.8 for Ugni blanc.

Limitations of the study
Few annotated images were available during this study, especially for the training set of the leaf detection algorithm, which was also only trained on two different grape varieties.
Although the final test was performed on five grape varieties and supports the robustness of our method when used on different grape varieties, the results would certainly be better if the training set of the YOLOv4-tiny algorithm were more diverse.
We processed only one image at a time, without considering symptoms on adjacent vines or the other side of a symptomatic vine. No measurement of the results at the block level was possible, because our acquisitions in the year 2020 were only for vines suffering from phytosanitary problems.

Future work
The images acquired in 2021 will be annotated and will allow us to enrich our datasets for the symptom detection algorithm, providing various expressions of symptoms, distance and acquisition conditions.
First diagnostic results at the block scale will be possible thanks to three acquisitions carried out in 2021: each vine of the block was photographed by fixing our acquisition device on a grape harvester or a quad bike, the block having been previously scouted and the FD symptomatic vines photographed and geolocated. These data will also allow us to consider the spatial distribution of the detected symptoms, as well as to study both sides of the same vine.

CONCLUSIONS
The methodology developed in this paper shows promise for the automatic diagnosis of FD. Basing the diagnosis not only on the symptoms expressed on the leaves, as is usually done in the literature, but on a combination of all the symptoms seems to be more robust.
The first step of symptom detection is performed via two deep-learning algorithms: the first one, a segmentation algorithm called ResUnet, is used to automatically detect symptomatic shoots (obtaining 0.82 in precision and 0.59 in recall for the object metric), symptomatic bunches (0.8 in precision and 0.38 in recall), as well as healthy bunches (0.86 in precision and 0.7 in recall); the second, a detection algorithm (YOLOv4-tiny), obtains a precision of 0.58 and a recall of 0.5 in the detection of FD symptomatic leaves of the Cabernet-Sauvignon grape variety, and a precision of 0.67 and a recall of 0.49 for the Ugni blanc grape variety.
The second step of associating the detected symptoms to produce a diagnosis at the image scale is performed by a RandomForest classifier. For each image, 16 parameters from the symptom detection results are computed and used as input for the classifier. Precision and recall of 0.92 and 0.89 are obtained for the Ugni blanc image set acquired in 2020, and of 0.76 and 0.75 for the Cabernet-Sauvignon image set acquired in the same year. The study of the CS20 set with and without the images showing symptoms that are easily confused with those of FD ('Confounding+' subclass) shows that without these images, the detection of FD symptomatic leaves alone can provide very good results for disease diagnosis. However, with 'Confounding+' images, the classifier uses information from other symptoms to differentiate them from images of plants affected by FD.
Finally, training RandomForest on the images acquired in 2020 and testing on two datasets of images acquired in 2021, which contain images of different grape varieties as well as different acquisition conditions, highlights the robustness of our method to these differences: precision and recall were 0.88 and 0.75 for the WGV21 dataset, and 0.92 and 0.42 for the RGV21 dataset.