Big climate data assessment of viticultural conditions for wine quality determination in France

Grapevine is one of the most economically important crops, especially in Europe. Although its investment value is widely recognized, the complex interactions between climate and viticulture remain poorly understood and modeled, which limits the reliability of investment strategies that use observed climate conditions to estimate wine quality. Most previous studies employed only a few climate factors derived from sparsely located meteorological stations. To analyze the climate-viticulture relationship more comprehensively, the present study includes 22 climate factors, covering temperature, water balance, atmosphere, and radiation data provided by a global land assimilation system over a period of 40 years (1970 to 2010), as well as two large-scale atmospheric teleconnection indices, to establish a holistic climate-wine quality model. Moreover, instead of the conventionally used simple regression methods, we employ the Least Absolute Shrinkage and Selection Operator (LASSO) regression method to deal with this comprehensive but voluminous climate dataset, as it excels at ingesting a large number of variables with complex collinearities. In the pre-analysis of correlations between the climate factors, sunlight is found to have the strongest connections with other factors, as it correlates with the largest number of climate factors. In contrast, temperature, conventionally the most commonly employed factor, correlates with far fewer factors. Finally, validation against wine vintage scores derived from two authoritative rating systems shows that the proposed approach can accurately establish climate-wine quality models for four well-known wine-growing regions in France: Alsace, Bordeaux, Burgundy, and Champagne. Because the climate pattern of Bordeaux is more complex than that of the other regions, two bank-wise models, rather than a single bank-merged model, are needed for Bordeaux to achieve a similar modeling accuracy. With this setup, a satisfactory level of explained vintage deviance, with a one-standard-deviation score residual within ±6 points, can be achieved in all regions. Therefore, based on the established climate-wine quality models together with the observed climate conditions, the wine quality of each region can be reliably predicted, which provides a useful reference for wine investment.


INTRODUCTION
Cultivated grapevine, Vitis vinifera L., is one of the most economically important agricultural crops, especially in Europe (OIV, 2012). As great wine can be stored for many years and gains value each year, it has become a new investment target for portfolio diversification, supported by high auction prices and the existence of secondary markets (Robinson and Harding, 2015). Many studies have also shown that the potential return on wine is comparable to that of equities and bonds (Masset and Weisskopf, 2013; Dimson et al., 2015; Masset et al., 2018). Moreover, compared to other financial instruments such as bonds and stocks, whose values are greatly affected by interest rates and contracts, wine pricing is largely determined by its natural value, i.e., wine quality (Arias-Bolzmann et al., 2003; Ashton, 2016). Therefore, to accurately estimate wine quality for investment, a comprehensive understanding of the factors influencing the quality of wine is paramount.
Climate conditions such as precipitation, temperature, and wind have a direct and substantial impact on grapevine phenology and are a fundamental consideration, especially for European winemakers (Duchêne and Schneider, 2005; Ashenfelter and Storchmann, 2016). The uniqueness and typicity of wine differ from place to place, indicating that not only the climate but also additional local factors affect wine quality (Vaudour, 2002). Terroir, as the term is used nowadays, is defined as an identifiable physical and biological environment together with an applied vitivinicultural development. Its effective coverage generally spans hundreds of hectares but may vary considerably (Morlat and Bodin, 2006).
However, due to limited data accessibility, previous studies generally utilized only a single climate factor derived from sparsely distributed meteorological stations to analyze the influence of climate on wine quality. For instance, most studies relied on temperature records and temperature-based indices, such as the Growing Season Average Temperature (GST), Growing Degree-Days (GDD), Cool Night Index (CI), Huglin Heliothermal Index (HI), Biologically Effective Degree-Days (BEDD), Chilling Portions (CP), and Growing Degree Hours (GDH) (Menzel, 2005; Caffarra and Eccel, 2011; Anderson et al., 2012; Helder Fraga et al., 2019). The second most commonly employed factor is precipitation (Lorenzo et al., 2013; H Fraga et al., 2014; Koufos et al., 2014; Oczkowski, 2016). Other climate factors, such as the duration of sunlight (Ferrise et al., 2016), amount of evaporation (Laget et al., 2008), air humidity, and wind speed (Carey et al., 2008), were considered only to a limited degree. Moreover, these studies usually employed records from fewer than 10 local weather stations, and in some cases from only one (Tomasi et al., 2011; Lorenzo et al., 2013; Koufos et al., 2014; Oczkowski, 2016). With such a sparse spatial distribution of observation points, it remains difficult to develop a continuous representation of the regional climate conditions of wine-growing areas (Anderson et al., 2012), and the reliability of the analysis may therefore be challenged.
Furthermore, in order to quantitatively and systematically assess climate effects on wine quality, most previous studies modeled the climate-viticulture relationship through a simple regression method (Gladstones, 2011; H Fraga et al., 2014; van Leeuwen and Darriet, 2016). Although traditional regression approaches have been widely applied, we consider that they neither identify the appropriate climate factors nor clarify the direct connections between climate variables and wine quality, for the following reasons: (1) The role of climate factors changes with the physiology and phenology of the grapevine. The phenological stages of grapevine are generally divided into budburst, flowering, fruit set, berry maturation and color change, and eventually grape maturity and harvest. The time between stages differs widely with grapevine variety, climate conditions, and geographic location (Jones and Davis, 2000). Also, climate factors exert impacts of different magnitudes in each period. Climate processes overlap and are difficult to separate. Moreover, the same climate component can lead to opposite physiological effects. For example, wind circulation releases excessive heat and reduces disease infection (Skelton, 2014) but can also increase the risk of flowering disturbance (Jones and Davis, 2000); abundant water helps deposit grape skin tannin and anthocyanin but may also reduce photosynthesis (Duteau et al., 1981), whereas mild water stress leads to better fruit production (Kennedy et al., 2002); moderate sunlight enhances grape coloring, but excessive light may cause sunburn (Winkler et al., 1974).
(2) The quality of wine is controlled by many biochemical reactions and processes (Harbertson and Spayd, 2006), which are connected to various climate factors. This many-to-many relationship is difficult to assess and measure because of the interdependences involved. For example, the amount of grape skin anthocyanin correlates positively with the intensity of sunshine but negatively with air temperature (Berli et al., 2008); high temperature boosts the accumulation of sugar but limits the formation of anthocyanins and malic acid (Ashenfelter and Storchmann, 2016); some enzymes respond to temperature changes in an inverted U-shaped pattern, which influences grape flavor (Ashenfelter and Storchmann, 2016); enzyme activity and stomatal conductance in photosynthesis are affected by both temperature and air pressure (Sage and Kubien, 2007); a high CO2 level increases starch and ascorbic acid in vegetable crops but decreases malic acid, glycoalkaloids, and volatile compounds (Lalel et al., 2003).
(3) High correlations and complex interactions or causal links between climate factors may lead to incorrect interpretations. For example, temperature is correlated with the light exposure period, which may lead to the incorrect conclusion that wine quality is driven by temperature (Bergqvist et al., 2001). Additionally, climate factors are interlinked, e.g., a rising average temperature caused by climate change also alters the patterns of precipitation and evaporation (Dessens and Bücher, 1995). Furthermore, global warming behaves asymmetrically on regional and temporal scales, and the variety of unclear dependencies and impacts increases the difficulty of assimilating local climate-change patterns (Jones, 2004). For instance, an increase in CO2 level changes the distributions of temperature and precipitation, which in turn affect grape yield (Bindi et al., 2001); the advancement of phenological stages enables efficient water and radiation usage and thus compensates for high temperature and irregular precipitation (Giannakopoulos et al., 2009).
For the reasons listed above, assessing the relationship between climate and wine quality with a single climate factor and a traditional simple regression method, as in most previous studies, is impractical and prone to bias. Hence, in the present study, we utilized an advanced regression method that is capable of dealing with a large volume of data and its collinearities to address the complex many-to-many climate-viticulture interactions. Moreover, we employed a wide range of climate data covering temperature, water balance, atmosphere, and radiation derived from a global climate model, as well as atmospheric teleconnection indices, to compile a comprehensive climate dataset. By combining this big climate dataset with the more sophisticated regression approach, we can understand and model the complex relationship between climate factors and wine quality in different wine-growing regions. Eventually, based on the established climate-wine quality models together with the observed climate conditions, the wine quality of each region can be accurately predicted, which provides a robust and reliable reference for wine investment.

MATERIALS AND METHODS
In this study, we propose a big-data approach in order to model the complex interactions between climate conditions and wine quality. Wine quality scores recorded between 1970 and 2010, as well as monthly climate data collected over the same time period, were used to enlarge the data coverage. We then utilized the Least Absolute Shrinkage and Selection Operator (LASSO) method to examine the datasets to discover the optimal correlation between climate observations and wine quality scores. The proposed method was applied to analyze four well-known wine-production regions in France as the country is ranked as the top wine-producing and consuming country worldwide and one of the most world-renowned Old World winemaking countries (OIV, 2012). The regions include Alsace, Bordeaux, Burgundy, and Champagne. Additional wine-growing regions were excluded in this study to reduce the size (e.g., Loire Valley, Rhone Valley, Languedoc) and to work with non-fragmented and spatially continuous data (e.g., Sud-Ouest). Moreover, thanks to the reputation of our four targeted regions, their historical wine quality records are well-documented, which largely facilitates the completeness of the analysis.

Wine Quality Data
Wine vintage ratings are commonly used to reflect the quality of the wine produced each year (Jones et al., 2005; van Leeuwen and Darriet, 2016; Davis et al., 2019). Even though each rating system is graded by different experts using different value ranges (5-, 10-, 20-, or 100-point scales), a general agreement between them has been demonstrated (Van Jones, 1998).
We initially considered nine authoritative vintage rating systems maintained by individual experts and organizations: R. Parker (Wine Advocates magazine), J. Robinson, M. Broadbent, the wine magazines Wine Enthusiast, Wine Spectator, Decanter, and Wine Society, as well as the wine merchants Berry Bros. & Rudd and the "Société des alcools du Québec." For comparability, we selected the systems with the most detailed scale and discarded scales of 10 points or fewer as well as purely descriptive scales. From the remaining 100-point scales of Parker, Wine Enthusiast, and Wine Spectator, we excluded the Wine Enthusiast score because of its much shorter publication history (since 1990) compared to Parker (since 1970) and Wine Spectator (since 1948). We therefore used the scores by R. Parker (RP) and Wine Spectator (WS) as the vintage data in this study.
Considering the temporal overlap between the RP and WS score histories and the availability of climate data, we selected vintages from 1970 to 2010 as the study period. Additionally, in order to keep the phenological stages, climate-viticulture relationship, and wine quality of different grapevine varieties distinct, for Bordeaux and Burgundy we only selected red wine vintage ratings (for Alsace and Champagne, both RP and WS provide only a single vintage rating).
The time series of RP and WS vintage ratings for each region are plotted in Figure 1. In general, both ratings are highly similar, especially in Burgundy and Bordeaux. A slight difference between the two rating systems is observed in Alsace, where most WS scores are higher, especially after 2000; however, a comparable trend remains clear. For the Champagne region, the ratings are mainly derived from WS because RP has many missing values for vintages marked as "not yet sufficiently tasted to rate" (Parker, 2020).
To mitigate the personal preference bias in each system and to densify the usable vintage scores, we averaged the two rating time series of each region and used the result as the wine quality index for the following analysis. Other consensus ranking approaches, as discussed in Borges et al. (2012), were not utilized in the present study because qualitative ordinal ranking information is not suitable for the following regression analysis, which requires a continuous dependent variable.
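The averaging step can be illustrated with a minimal sketch. The fallback to a single rating when the other system has no score is an assumption made here for illustration (the text only states that averaging densifies the usable scores), and the ratings below are hypothetical placeholders, not values from RP or WS.

```python
def consensus_score(rp, ws):
    """Average two vintage ratings; use the available one if the other is missing.

    Assumption: a missing rating is represented by None.
    """
    scores = [s for s in (rp, ws) if s is not None]
    return sum(scores) / len(scores) if scores else None


# Hypothetical ratings for illustration only (not real RP or WS values).
rp_scores = {1988: 90, 1989: None, 1990: 96}
ws_scores = {1988: 92, 1989: 95, 1990: 97}

# One quality index value per vintage year.
quality_index = {
    year: consensus_score(rp_scores.get(year), ws_scores.get(year))
    for year in sorted(set(rp_scores) | set(ws_scores))
}
```

Because both systems use a 100-point scale, no rescaling is applied before averaging in this sketch.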

Climate Data
In contrast to previous studies, which employed only a few climate factors measured at sparsely located meteorological stations, the present study employed climate data derived from the Global Land Data Assimilation System Version 2 (GLDAS-2), developed jointly by the National Aeronautics and Space Administration (NASA) and the National Oceanic and Atmospheric Administration (NOAA, 2020) (Rodell et al., 2004a). GLDAS-2 provides a finer spatial resolution (0.25°) than other global climate datasets such as the Atmosphere-Ocean General Circulation Model (AOGCM) HadCM3 (2.5°) and the Global Historical Climatology Network (GHCN) (0.5°), and its temporal consistency is ensured as it covers the period from 1948 to the present. Moreover, it provides a wide range of highly reliable climate factors based on the assimilation of both satellite- and ground-based observations (Rodell et al., 2004b). Thus, the bias of each parameter can be largely mitigated. In practice, we employed the monthly Noah land surface model (LSM) product of GLDAS-2 (Rodell et al., 2004a) owing to its finer spatial resolution (0.25°) compared to other LSM-based GLDAS-2 products (1°), shorter diurnal variations, and fewer biases due to its steady progression of improvements. For a detailed description of the Noah LSM, refer to Ek et al. (2003). Its high accuracy and robustness have also been confirmed by previous studies that validated different climate variables against in-situ measurements and remotely sensed datasets (Spennemann et al., 2015; Wang et al., 2016; Singh et al., 2018). We selected 22 climate variables covering temperature, water balance, atmosphere, and radiation conditions for the following climate-wine quality relationship modeling, as listed in Table 1. For each study region, each climate variable is calculated by averaging the GLDAS-2 data within the region.
Temporally, because the phenological stages of the grapevine in the Northern Hemisphere generally start in March and end in September (H Fraga et al., 2012), we extracted the climate factors for this period for the following analysis.
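The spatial averaging and the March to September extraction can be sketched as follows. This is only an illustration under stated assumptions: the grid is a small nested list rather than a real GLDAS-2 file, the region is given as a hypothetical boolean mask, the mean is unweighted (no latitude-dependent area weights), and the function names and the `TMP_03` style keys are invented here.

```python
GROWING_MONTHS = range(3, 10)  # March (3) through September (9)


def regional_mean(grid, mask):
    """Unweighted mean of the grid cells whose mask entry is True.

    grid and mask are same-shaped 2D lists (rows of cells).
    """
    values = [v for row, mrow in zip(grid, mask) for v, m in zip(row, mrow) if m]
    return sum(values) / len(values)


def growing_season_features(monthly_grids, mask, name):
    """Return one regional value per growing-season month for one variable.

    monthly_grids maps a month number (1-12) to a 2D grid for a single year.
    """
    return {
        f"{name}_{m:02d}": regional_mean(monthly_grids[m], mask)
        for m in GROWING_MONTHS
    }


# Toy 2x2 grids for one year; the cell excluded by the mask holds 100.
toy_grids = {m: [[m, m + 1], [m + 2, 100]] for m in range(1, 13)}
toy_mask = [[True, True], [True, False]]

features = growing_season_features(toy_grids, toy_mask, "TMP")
```

Repeating this for every variable and every year would give one row of predictors per vintage, with one column per variable and month.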

Teleconnection Index
In addition to climate factors, large-scale atmospheric dynamics caused by teleconnections were also included in this study. Teleconnections describe large-scale atmospheric anomalies and climatic oscillation links between distant circulations (Dalla Marta et al., 2010). It has been widely recognized that the air pressure, temperature, and precipitation patterns in Europe are strongly associated with the North Atlantic Oscillation (NAO) (Avolio et al., 2008) and the Arctic Oscillation (AO) (Kodera and Kuroda, 2004). Meteorologically, AO is a critical and dominant hemispherical oscillation (Krichak et al., 2014); however, most previous climate-viticulture studies focused on its local subset, i.e., NAO, and neglected the influence of AO (Dalla Marta et al., 2010). Although a high correlation between AO and NAO is occasionally observed in meteorological data (Thompson and Wallace, 2000), the relationship between them and their impact on climate patterns remain unclear (Krichak et al., 2014). Thus, in this study, both teleconnection indices are included. Because both NAO and AO have long-term influences on climate conditions, we collected monthly AO and NAO indices covering the whole year to analyze the relationship between viticulture and atmospheric oscillation. Teleconnection data were downloaded from the NOAA website (2020).

Pre-analysis of Correlations Between Climate Factors
Before building the relationship between the 22 climate factors, the two teleconnection indices, and wine quality, we first analyzed the correlations among the climate factors. It is crucial to investigate their cross-month patterns and similarities because this pre-analysis enables a thorough understanding of the relationships between different climate factors. Moreover, it indicates which factor is the most influential within the employed climate dataset, i.e., which factor shares the largest number of strong correlations with other factors.
Thanks to the roughly 40 years of data we employed, we were able to examine each of the four wine-producing regions thoroughly. The pre-analysis included: (1) internal correlations of the same factor between different months and (2) cross-correlations between different factors. It must be noted that, as we included a total of 24 variables (22 climate factors and two teleconnection indices), it would be impractical to analyze all their cross-correlations simultaneously. Instead, based on previous climate-viticulture studies (Menzel, 2005; Hall and Jones, 2010; Caffarra and Eccel, 2011; Tomasi et al., 2011; Anderson et al., 2012; Lorenzo et al., 2013; H Fraga et al., 2014; Koufos et al., 2014; Oczkowski, 2016) and prior knowledge, we selected five dominant climate factors (sunlight, air temperature, precipitation, air pressure, and wind speed) and then investigated their correlation coefficients with the remaining climate factors.
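The idea of ranking a factor by the number of strong correlations it shares with other factors can be sketched as below. The threshold of 0.5 is an assumption chosen for illustration (no cutoff is stated in the text), the function names are hypothetical, and the series are toy values rather than GLDAS-2 data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strong_links(series, target, threshold=0.5):
    """Names of the other factors with |r| >= threshold relative to target.

    threshold is an illustrative assumption, not a value from the study.
    """
    return [
        name
        for name, values in series.items()
        if name != target and abs(pearson(series[target], values)) >= threshold
    ]


# Toy yearly series for one month (illustrative values only).
toy_series = {
    "NSR": [1, 2, 3, 4, 5],
    "TMP": [2, 4, 5, 8, 10],
    "PCT": [5, 4, 3, 2, 1],
    "WSP": [1, -1, 0, -1, 1],
}

links_of_sunlight = strong_links(toy_series, "NSR")
```

Counting the length of such a list for each of the five dominant factors gives a simple measure of how widely each factor is connected.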

Modeling Climate-wine Quality Relationship Using LASSO Regression
Simple linear regression is the most commonly used method for investigating the relationship between climate factors and wine quality (Menzel, 2005; Hall and Jones, 2010; Koufos et al., 2014). However, when numerous climate factors covering a long period are analyzed, simple linear regression is unsuitable because there are countless possible combinations of variables and the collinearity between variables biases the result. Consequently, for large datasets, most studies applied either "all-possible combination" regression (Jones and Davis, 2000) or the stepwise method (Tomasi et al., 2011; H Fraga et al., 2014). The former requires a two-step analysis to select suitable n-variable model suites, and the latter has been broadly criticized because it does not actually identify the best combinations of variables and suffers from highly biased models and incorrect hypothesis-testing results (Van Jones, 1998; Harrell, 2015). Therefore, an alternative regression method that is capable of handling a large number of collinear variables is desired. Hence, in the present study, the LASSO regression method (Tibshirani, 1996) was employed. Thanks to its capability of avoiding overfitting (Dahlgren, 2010) and handling collinearity between variables (Yanqi, 2016), LASSO regression can ensure stable model fitting and accurate variable selection when handling a large amount of data (Hesterberg et al., 2008). Consequently, it has been applied in big data analysis (Ludwig et al., 2015) and ecology studies (Holdo and Nippert, 2015), yet it has not been utilized in the climate-viticulture discipline.
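The LASSO regression formula can be expressed as (Tibshirani, 2011):

$$\hat{\beta} = \underset{\beta}{\arg\min} \left\{ \sum_{i=1}^{N} \Big( y_i - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\} \qquad (1)$$

where $x_{ij}$ are standardized variables and $y_i$ are centred response values for $i = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, p$, $\beta_j$ are the regression coefficients, and $\lambda \geq 0$ is the tuning parameter that controls the strength of the penalty.
The most valuable feature of LASSO regression is that it minimizes the residual sum of squares with a penalty of the form $\sum|\beta_j|$ rather than the form $\sum\beta_j^2$ used in ridge regression. Namely, the LASSO method limits the sum of the absolute values of all regression coefficients while maximizing model fit, which allows redundant variables to be excluded. Hence, it performs not only variable shrinkage but also variable selection (Tibshirani, 2011).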
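To make the selection property concrete, the following minimal sketch minimizes the objective sum_i (y_i - sum_j x_ij b_j)^2 + lam * sum_j |b_j| by cyclic coordinate descent with soft-thresholding. It is an illustration only and not the glmnet implementation used in the study; the function names and the toy data are hypothetical, and glmnet scales its objective differently, so values of lam are not comparable.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize sum_i (y_i - sum_j X[i][j]*b_j)^2 + lam * sum_j |b_j|.

    X: list of rows with standardized columns; y: centred response.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual that ignores column j.
            rho = sum(
                X[i][j]
                * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            )
            # Exact one-dimensional minimizer for this objective.
            beta[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    return beta


# Toy design: three orthogonal, standardized columns (illustrative only).
X_toy = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
# Response built as 2*x1 + 0.1*x2, so x3 is irrelevant and x2 is weak.
y_toy = [2.1, -1.9, 1.9, -2.1]

beta_ols = lasso_coordinate_descent(X_toy, y_toy, lam=0.0)
beta_lasso = lasso_coordinate_descent(X_toy, y_toy, lam=2.0)
```

With lam = 0 the weak coefficient survives at about 0.1, whereas with lam = 2 it is set exactly to zero and the strong coefficient shrinks from 2 to 1.75. A ridge penalty would shrink both coefficients but would leave neither exactly at zero.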
In practice, the "glmnet" package, version 4.0 (Friedman et al., 2010), running in R version 3.6.3 was used in the present study, with all variables standardized. We ran the LASSO regression on the full climate dataset for each of the four wine-growing regions individually. In order to assess the influence of teleconnections, the regression was performed in two scenarios: with and without the teleconnection indices. For validation, the wine quality data of the last five years (2006-2010 for Bordeaux, Burgundy, and Alsace; 2003-2007 for Champagne) were excluded from the regression. Based on the analysis results, we then compared (1) the percentage of explained deviance and (2) the number of nonzero coefficients for each region, which enables the examination of modeling accuracy.
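The two comparison quantities can be sketched as follows. For a Gaussian model, the fraction of explained deviance reduces to one minus the ratio of the residual sum of squares to the null (intercept-only) sum of squares. The helper names and the example path are hypothetical and serve only to illustrate how a target explaining level translates into a number of variables.

```python
def explained_deviance(y, y_hat):
    """Fraction of null deviance explained for a Gaussian model: 1 - RSS/TSS."""
    mean_y = sum(y) / len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    null_dev = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - rss / null_dev


def count_nonzero(beta):
    """Number of variables kept by the model (nonzero coefficients)."""
    return sum(1 for b in beta if b != 0.0)


def fewest_variables_for(path, target):
    """Smallest number of nonzero coefficients reaching the target level.

    path: list of (n_nonzero, explained_deviance) pairs, one per penalty value.
    Returns None if no model on the path reaches the target.
    """
    reached = [n for n, dev in path if dev >= target]
    return min(reached) if reached else None


# Hypothetical regularization path (illustrative numbers, not study results).
toy_path = [(0, 0.0), (3, 0.41), (9, 0.72), (16, 0.83)]

n_for_70 = fewest_variables_for(toy_path, 0.7)
n_for_80 = fewest_variables_for(toy_path, 0.8)
```

Reading both quantities together shows how many variables a region needs before a given share of the vintage score variation is explained.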

Internal Correlations of the Same Climate Factor
To implement the analysis, the coefficient matrix was plotted as a heat map, in which the value of each tile corresponds to the Pearson correlation between its column (April to September) and row (March to August). With this arrangement, the correlation between adjacent months can be observed by checking the tiles along the diagonal. Taking the heat maps of soil temperature in Burgundy illustrated in Figure 2 as an example, the correlation coefficient of the shallowest soil temperature layer (ST1) between March and April is around 0, while it rises to 0.5 between June and July.
Based on the heat maps of all 22 climate factors as well as the NAO and AO indices for all four wine-growing regions, we observe that, except for layered soil moisture (SM1-SM4) and soil temperature (ST1-ST4), the factors show low correlations even between adjacent months and seasons. Secondly, we find low correlations in teleconnections, surface skin temperature (SST), wind speed (WSP), pressure (PRE), and precipitation (PCT), while clearly high correlations are shown in the SM and ST factors. Additionally, the correlations of the SM and ST factors become significantly higher as the layers go deeper. For instance, the correlation of soil temperature between March and April is around 0 in the shallowest layer (ST1) but gradually increases to 0.5, 0.7, and 0.9 in ST2, ST3, and ST4. Similarly, the correlation between April and June is around 0.2 in SM1 but rises to 0.5, 0.6, and eventually 0.8 for SM2, SM3, and SM4.
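The month-by-month correlation matrix behind such a heat map can be sketched as follows. Only the pairs in which the row month precedes the column month are computed here, which is an assumption about the layout of the figure, and the series are toy values, not GLDAS-2 data.

```python
import math

MONTHS = ["Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep"]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def internal_correlations(by_month):
    """Correlate one factor with itself across months.

    by_month maps a month name to that month's values over the years.
    Returns {(row_month, col_month): r} with the row month before the column.
    """
    tiles = {}
    for i, row in enumerate(MONTHS[:-1]):      # rows: March to August
        for col in MONTHS[i + 1:]:             # columns: April to September
            tiles[(row, col)] = pearson(by_month[row], by_month[col])
    return tiles


# Toy series over six years for a single factor (illustrative values only).
toy_factor = {
    "Mar": [1, 2, 3, 4, 5, 6],
    "Apr": [2, 1, 4, 3, 6, 5],
    "May": [1, 3, 2, 5, 4, 6],
    "Jun": [6, 5, 4, 3, 2, 1],
    "Jul": [1, 2, 3, 4, 5, 6],
    "Aug": [3, 1, 2, 6, 4, 5],
    "Sep": [2, 4, 6, 8, 10, 12],
}

tiles = internal_correlations(toy_factor)
adjacent = [tiles[(a, b)] for a, b in zip(MONTHS[:-1], MONTHS[1:])]
```

The list `adjacent` holds the diagonal tiles, i.e., the correlations between consecutive months discussed in the text.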

Sunlight (NSR)
Although sunlight was rarely investigated in previous studies, we find that, among all 24 variables, it has the strongest connections with other variables. In all four regions, it is positively correlated with air temperature (TMP), surface skin temperature (SST), plant transpiration (TRP), and air pressure (PRE), while it is negatively correlated with canopy water evaporation (CWE), canopy surface water (CSW), and precipitation (PCT), as shown in Figure 3. The high correlations with temperature-type factors (TMP and SST) during summer (June-August) are evident. Sunlight also has high correlations with transpiration and precipitation in the early growing season (March-June). In contrast, the negative correlations with CSW, CWE, and PCT are stable and persist throughout the growing season.

Air temperature (TMP)
Temperature is the factor most commonly employed by previous studies to determine the influence of climate on viticulture; however, in our study, the temperature factor correlates with far fewer factors than sunlight does. It shows positive relations with surface skin temperature (SST), air humidity (HMD), and plant transpiration (TRP), as illustrated in Figure 3. The heat exchange between air and the surface results in strongly positive correlations between air temperature and surface skin temperature during all months. In addition, a strong and stable positive correlation with humidity is found throughout the growing season. Slight positive correlations with transpiration are again found in the early growing season (March-June), indicating that the complicated plant physiology is generally linked to more than one meteorological factor.

Precipitation (PCT)
Precipitation is the second most commonly employed climate factor in previous studies. Nevertheless, we find that it is linked to only a few climate factors: it has positive correlations with canopy surface water (CSW), canopy water evaporation (CWE), and soil moisture (RSM) and shows a negative correlation with plant transpiration (TRP), as illustrated in Figure 3. As precipitation naturally influences the other water-balance factors as well as soil moisture, positive correlations between these factors are observed in all growing-season months. Moreover, PCT shows a relatively weak but more frequently occurring positive correlation with root soil moisture. In contrast, a negative correlation with plant transpiration is shown in the early growing season.

Air pressure (PRE) and wind speed (WSP)
In addition to the critical climate factors mentioned in previous studies, we also find other clear correlations within the datasets. For example, air pressure is negatively linked to canopy surface water (CSW), canopy water evaporation (CWE), and precipitation (PCT), while wind speed correlates negatively with surface skin temperature (SST), as illustrated in Figure 3.
Since air pressure reflects the behavior of large-scale atmospheric cyclones, it interacts in complex ways with other climate factors and consequently alters the canopy water balance. Therefore, the negative correlations between PRE and CSW, CWE, and PCT are expected. Mild wind slightly lowers the surface temperature, which results in negative correlations between wind speed and surface skin temperature.

Teleconnections
In addition to climate factors, we also examined the correlations between the two teleconnection indices, as plotted in Figure 3. A high correlation of over 0.7 is noted between the two indices in the same month (months aligned along the diagonal), while adjacent months show weak correlations. On the basis of the correlation analysis above, we find that sunlight, air temperature, and precipitation are highly related to other climate factors. In addition to the Burgundy analysis illustrated above, we summarize the correlations of all four study regions in Appendix Table A1. Generally, the four wine-growing regions show very similar correlation relationships between climate factors, as almost the same positive and negative correlations appear between the same pairs of climate factors. However, Bordeaux and Champagne have more distinctive climate patterns: in these two regions, both TMP-CSW and PCT-SST show a negative correlation. Furthermore, Bordeaux also has an inverse correlation between SST and CSW. Moreover, although the four regions in general have very similar climate linkages, the months in which the correlations occur differ slightly. For instance, even though the negative correlations between NSR and PCT exist in all four regions, Champagne and Burgundy show negative correlations throughout the growing season, whereas the other two regions (Alsace and Bordeaux) do not show clear correlations in July.

Wine Quality Modeling and Validation
The internal and cross-correlation analysis among climate factors described above shows that the relationships among climate factors and teleconnection indices over time are highly complicated. This underlines the merit of LASSO regression, which is capable of handling collinearity between variables while guaranteeing satisfactory accuracy.
The percentage of explained deviance and the number of nonzero coefficients of the LASSO models for each region are plotted in Figure 4. Comparing the curves of the four regions, it is evident that, to achieve the same level of explained deviance, Burgundy needs the fewest variables while Bordeaux requires the most. In detail, if we set the target explaining level to 0.7, the numbers of required variables for Burgundy, Champagne, Alsace, and Bordeaux are 8, 14, 15, and 22; if 0.8 is set, 15, 17, 19, and 25 variables are chosen, respectively. To validate the model accuracy, the vintage scores of 2003-2007 for Champagne and of 2006-2010 for Bordeaux, Burgundy, and Alsace were estimated by the models and compared with the true vintage scores. The mean and standard deviation of the score differences for each region are plotted in Figure 5. For Alsace, Burgundy, and Champagne, the models built without teleconnection indices have one-standard-deviation score residuals of around -6 to 5, -6 to 5, and -5 to 6 points, while the models established with teleconnections show residuals of around -6 to 3, -6 to 1, and -3 to 5 points. The latter case yields more accurate vintage scores in all three regions, which suggests that teleconnections have a clear influence in these regions. This finding agrees with the conclusion of Gonsamo and Chen (2015). Nonetheless, for Bordeaux, the models built both with and without teleconnections give poor results, showing relatively large residuals of more than ±11 points. This result accords with the fact that Bordeaux needs more variables than the other three regions to reach the same level of explained deviance, as shown in Figure 4.
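The validation summary can be sketched as follows: residuals on the held-out vintages are reduced to their mean and to the interval of one standard deviation around it. The sign convention (modeled minus true score) and the use of the sample standard deviation are assumptions, and the scores are hypothetical values, not results from the study.

```python
import statistics


def residual_summary(observed, predicted):
    """Mean, standard deviation, and mean +/- 1 SD band of score residuals.

    Assumptions: residual = predicted - observed; sample SD (n - 1).
    """
    residuals = [p - o for o, p in zip(observed, predicted)]
    mean = statistics.mean(residuals)
    sd = statistics.stdev(residuals)
    return mean, sd, (mean - sd, mean + sd)


# Hypothetical held-out scores for five vintages (illustrative only).
true_scores = [90, 92, 88, 95, 85]
modeled_scores = [92, 90, 90, 93, 87]

mean_res, sd_res, band = residual_summary(true_scores, modeled_scores)
```

A band such as -6 to 3 points in the text corresponds to `band` here, so a narrower band centred closer to zero indicates a more accurate model.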

Analysis of the internal-and crosscorrelations between climate factors
Based on the internal correlation analysis of the same climate factor between different months as illustrated in Figure 2, it is found that all 22 climate factors as well as teleconnection indices show low correlations even in adjacent months, except layered soil moisture (SM) and soil temperature (ST). Moreover, the correlation values of SM and ST are even higher in the deeper layer. We consider it is because the environmental stability is maintained better in the deeper soil layer, as it is more isolated to surface/air climate conditions.
The cross-correlation analysis between different climate factors, illustrated in Figure 3, yields several interesting findings. Firstly, among all employed climate factors, sunlight shows the strongest connections with other climate factors. This may be because solar illumination is the principal driving force of most climate phenomena. For instance, its high correlations with TMP and SST are likely because sunlight directly increases the surface temperature. Secondly, air temperature and precipitation, which are commonly regarded as the most influential factors in previous studies, show far fewer linkages with other climate factors. Thirdly, a prevailing but weak positive correlation between precipitation and root soil moisture is found, which suggests good water retention in the soil layers. Fourthly, Champagne and Bordeaux, especially the latter, might have more distinctive climate patterns, as they show distinct correlation relationships between climate factors. We take this to imply that these regions have more complicated climate conditions, which might be one of the reasons for the lower modeling accuracy of LASSO regression in Bordeaux.
Last but not least, although these correlation analyses provide valuable insights into the interactions between climate factors, causation cannot be inferred from the correlations alone.
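The internal correlation between adjacent months can be sketched as follows. The series are synthetic and hypothetical: a persistent deep-layer soil moisture anomaly and an independent precipitation anomaly, each sampled over 40 years. The sketch illustrates the kind of contrast described above and does not reproduce the GLDAS-2 data or the exact correlation procedure of this study.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

random.seed(1)
n_years = 40

# Hypothetical deep-layer soil moisture: July carries over most of June's anomaly.
sm_june = [random.gauss(0, 1) for _ in range(n_years)]
sm_july = [0.9 * v + random.gauss(0, 0.3) for v in sm_june]

# Hypothetical precipitation: June and July anomalies are drawn independently.
pr_june = [random.gauss(0, 1) for _ in range(n_years)]
pr_july = [random.gauss(0, 1) for _ in range(n_years)]

r_sm = pearson(sm_june, sm_july)
r_pr = pearson(pr_june, pr_july)
print(f"soil moisture June-July r = {r_sm:.2f}")
print(f"precipitation June-July r = {r_pr:.2f}")
```

The same `pearson` function applies to cross-correlations by passing the yearly series of two different climate factors for a given month.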

Analysis of the Selected Climate Factors for Wine Quality Modeling
In addition to the modeling accuracy and the number of selected variables, we further examined the climate factors selected at different levels of explained deviance. Taking Champagne as an example, the variables selected at different levels of explained deviance are listed in Table 2. Two features are worth noting across all study areas: (1) As the explaining level increases, the total number of employed variables increases. For instance, only 14 variables are needed to reach an explaining level of 0.7, while 17 and 22 variables are needed to reach 0.8 and 0.9, respectively. This is reasonable because, to reach a higher level of explained deviance, the model requires more variables to capture finer fluctuations in the vintage ratings.
(2) The month(s) of each climate factor newly included at a higher explaining level are mostly adjacent to the month(s) selected at a lower explaining level, and their coefficients have the same sign. For example, when the explaining level increases from 0.7 to 0.8, the chosen months of the NAO index expand from July (positive) to June and July (positive); when it increases from 0.8 to 0.9, the selected months of TMP change from July (negative) to July and September (negative). Because adjacent months usually have similar climate conditions, their coefficients tend to have the same sign.

Reasons for the Poorer Modeling Accuracy in Bordeaux and the Improvement
As shown in Figure 5, the Bordeaux region showed less satisfactory modeling efficiency and accuracy, which we assume is caused by the different Terroir of Bordeaux's well-known left and right banks, divided by the Gironde estuary. Although their climate and topographical conditions are similar, the two banks still differ in many characteristics, which would result in divergent climate-wine quality relationships. For example, the surface of the right bank is mainly covered with an easily penetrable limestone or clay layer, whereas the surface of the left bank is usually gravel soil (Seguin, 1986), which forces the grapevine roots to grow deep to reach sufficient nutrients (Bell, 2016; Meisner, 2016; Vinfolio, 2018). Hence, the grapevine root is less sensitive to variations in air temperature and surface soil moisture. Furthermore, owing to different winegrowing decisions and traditions, the dominant variety in blended wines is another critical difference: the left bank generally blends more Cabernet Sauvignon than Merlot, and the right bank the reverse (Picard et al., 2016). Since Cabernet Sauvignon generally ripens later and contains more tannins and a higher alcohol level than Merlot (Webb et al., 2007; Muniz et al., 2015; Folly, 2019), the two varieties would have different climate-grapevine interactions in both phenological and physiological aspects.
In short, based on the characteristics mentioned above, the two banks differ in soil type (which affects the importance of root soil moisture and temperature) and in phenological stages (which are controlled by the variety and are hence influenced by the climate factors of different months). Therefore, we divided Bordeaux's vintage and climate data into two parts, one for each bank, and conducted LASSO regression modeling on each separately. This bank-wise analysis is possible because both the RP and WS rating systems provide vintage rating scores for each bank.
As illustrated in Figure 6, the resultant models of both banks explained the vintage score deviance better than the original bank-merged model. For instance, to achieve an explaining level of 0.7, the original bank-merged model covering the whole area requires 22 variables, whereas the models built individually for the left and right banks require only 13 and 15 variables, respectively. The bank-wise models are also more accurate in the out-of-sample validation: the original bank-merged model shows a one-standard-deviation score residual of more than ±11 points, whereas the models without teleconnections for the separate left and right banks both show stable residuals within ±6 points, which is comparable to the results for Alsace, Champagne, and Burgundy shown in Figure 5.
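The one-standard-deviation score residual used in the out-of-sample validation can be computed as in the following minimal sketch. The five observed and modeled scores are invented for illustration, and the use of the sample standard deviation (dividing by n - 1) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

def one_sigma_band(modeled, observed):
    """Return (mean - sd, mean + sd) of the modeled-minus-observed score differences."""
    diffs = [m - o for m, o in zip(modeled, observed)]
    centre = mean(diffs)
    spread = stdev(diffs)  # sample standard deviation (n - 1); an assumption
    return centre - spread, centre + spread

def within(band, limit):
    """True if the whole band lies inside [-limit, +limit] points."""
    low, high = band
    return -limit <= low and high <= limit

# Hypothetical vintage scores for five validation years (not data from this study).
observed = [90, 85, 95, 88, 92]
modeled = [87, 89, 93, 91, 88]

band = one_sigma_band(modeled, observed)
print(f"one-sigma residual band: {band[0]:.2f} to {band[1]:.2f} points")
print("within ±6 points:", within(band, 6))
```

A band that stays inside ±6 points corresponds to the accuracy reported for the bank-wise models, whereas a band that extends beyond ±11 points corresponds to the bank-merged result.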

Current Study's Advancements, Limitations, and Future Work
In the present study, we proposed a multi-region analysis with high-volume data in terms of both the number of variables and the length of the time series. Using a climate model-based source of climate factors provides global transferability and overcomes the limitation of previous studies, which utilized only a few sparsely located meteorological stations. Moreover, to analyze the high-volume data efficiently, a novel regression approach was employed instead of conventional methods. In detail, the improvements compared to previous studies are summarized as follows:

(1) More comprehensive climate factors selection as well as teleconnections indices
Compared to previous studies, which employed only a single climate factor, we employed 22 climate factors covering different climatic aspects such as temperature, water balance, atmosphere, and radiation. To consider the effect of wide-scale atmospheric oscillations, two teleconnection indices were also included. This large selection of variables provides a more holistic record of climate conditions, which largely enhances the reliability of the climate-viticulture analysis. Moreover, based on the cross-correlation analysis between climate factors, sunlight has the strongest connections with other climate factors, although it has rarely been investigated in previous studies. In contrast, temperature, conventionally the most commonly employed factor, correlates with far fewer factors.

FIGURE 6. Bordeaux left and right bank modeling results: regressions solved with GLDAS-2 data with and without teleconnection data are illustrated in black and gray curves, respectively.

(2) Utilization of a high accuracy global climate model instead of the sparsely located meteorological station
We utilized the GLDAS-2 global assimilation system, which provides a wide selection of climate factors at a satisfactory spatial resolution since 1948, instead of the records of sparsely located meteorological stations employed in previous studies. Furthermore, as GLDAS-2 ingests not only remote sensing-based measurements but also ground-based observations, the bias of the climate factors can be constrained. More importantly, as it is a global dataset, the global transferability of our approach can be ensured. By employing the GLDAS-2 data, it is feasible to apply the climate-viticulture modeling approach proposed in the present study to other wine-growing regions and countries around the globe.

(3) Long-term time-series data analysis as well as multi-region investigation
In the present study, we selected four well-known wine-growing regions in France with around 40 years of climate factors as well as wine quality data. These panel data (multi-dimensional data measured over time) allow a thorough investigation of the interactions between climate and viticulture, which increases the accuracy of the resultant wine quality predictions. Furthermore, the multi-region analysis also demonstrates the robustness of our proposed approach.

(4) Applying the advanced LASSO regression method for modeling climate-wine quality relationship
Considering the large number of climate variables we included and the complicated causal influences between them, we applied the state-of-the-art LASSO regression method, which is capable of handling collinearity between predictors and reducing the number of selected variables while maintaining the stability of model fitting. Most previous studies utilized conventional simple regression methods, which cannot perform correctly when a high-volume dataset is included. Moreover, in validation against the vintage scores derived from two authoritative rating systems, our models show satisfactory accuracy in all four regions.
However, it must be noted that although LASSO regression can handle the collinearity existing in a large set of variables, the finally selected variables may not be the most important and representative ones. Previous studies found that LASSO regression tends to select only one variable among a group of highly correlated variables in order to reduce the sum of the absolute values of all regression coefficients (Tibshirani, 1996). This can be seen by comparing the selected climate factors for each region (Tables A1 and A2) with the cross-correlation analysis in the Results section: the months of highly correlated climate factor pairs were not simultaneously chosen in the final variable list.
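This selection behavior can be read from the standard penalized form of the LASSO and its coordinate-wise update, written below as an illustrative sketch. The notation is an assumption introduced here and is not taken from this study: $y_i$ is the vintage score in year $i$, $x_{ij}$ is the standardized value of the $j$-th monthly climate variable, $\lambda$ is the penalty weight, and $r_i^{(j)} = y_i - \beta_0 - \sum_{k \neq j} x_{ik}\beta_k$ is the partial residual that excludes variable $j$. The $1/(2n)$ scaling of the squared-error term is one common convention among several.

```latex
\[
\hat{\beta}(\lambda) = \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2n}\sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^{2}
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\]

% Coordinate-wise update, assuming (1/n) \sum_i x_{ij}^2 = 1 for every j:
\[
\beta_j \leftarrow S\Bigl(\frac{1}{n}\sum_{i=1}^{n} x_{ij}\, r_i^{(j)},\; \lambda\Bigr),
\qquad
S(z,\lambda) = \operatorname{sign}(z)\,\max\bigl(\lvert z\rvert - \lambda,\; 0\bigr)
\]
```

Once one member of a highly correlated pair has entered the model, the partial residual retains little covariance with the other member, so the argument of $S$ tends to fall within $[-\lambda, \lambda]$ and that coefficient is set exactly to zero.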
Nevertheless, (1) the goal of the present study is to model wine quality with climate factors efficiently and accurately, and (2) as mentioned in the Introduction and investigated in the Results section, the interactions among climate factors and teleconnection indices, as well as their influences on viticulture, are considerably complex. Therefore, explaining the climate-viticulture relationship using the retained climate factors is not practical. Instead, we investigated the efficiency, accuracy, and precision of the model by tracking the number of employed variables, the level of explained deviance, and the out-of-sample validation.
Beyond the improvements achieved in the present study, previous studies suggest that other vital components also affect the Terroir, such as soil type, geomorphology, and biodiversity (Gladstones, 2011); these factors are also worth exploring in future studies. Moreover, as the dataset we employed provides global coverage, we will also apply our approach to other well-known wine-growing countries and regions on different continents with different climate characteristics.

CONCLUSION
The investment value of wine has been widely recognized; however, the interactions between climate and wine quality remain poorly understood and modeled. This limitation largely constrains a reliable investing strategy. Therefore, a comprehensive investigation and modeling of the climate-viticulture relationship, i.e., the influences of climate conditions on wine quality, is vital to predict wine quality for investment. Compared to previous studies, which utilized only a few climate factors derived from sparsely located meteorological stations, we employed 22 multitype climate factors, including temperature, water balance, atmosphere, and radiation data, derived from a reliable global land assimilation system with a satisfactory spatial resolution. Moreover, to account for the importance of large-scale atmospheric oscillations to the European region, the NAO and AO teleconnection indices were also included. The comprehensive dataset ensures a robust foundation for establishing the climate-viticulture relationship.
Based on this large set of climate factors, we first analyzed the internal and cross-correlation coefficients of these factors to explore the type-related correlations as well as the existence of high collinearity. We find that, among all employed climate factors, sunlight has the strongest connections with other factors, as it correlates with the largest number of climate factors. In contrast, temperature and precipitation, which are commonly treated as critical factors, show much weaker linkages with other climate factors. Additionally, through exploring the internal and cross-correlations between climate factors as well as teleconnection indices, we find overall very low correlations even between adjacent months and seasons. This indicates that the dataset is appropriate for the subsequent modeling, since it captures diverse patterns of fluctuation rather than monotonous variations.
To analyze the large-volume dataset with high collinearity, the LASSO regression algorithm was applied to establish the relationship between climate factors and vintage scores individually for the four well-known wine-growing regions in France from 1970 to 2010. The models that include the teleconnection indices explain the vintage fluctuations better. In Alsace, Burgundy, and Champagne, our models all demonstrate a satisfactory capacity to explain annual vintage deviance, with an out-of-sample forecasting ability of one-standard-deviation score residuals within ±6 points. For Bordeaux, although the initial results are less satisfactory, the performance improves significantly and reaches comparable accuracy after we divided the dataset and repeated the regression for the two banks separately. Thus, with the big climate data and the LASSO regression method employed in the present study, the complex climate-wine quality relationship can be accurately established in all four wine-growing regions, which enables the further prediction of wine quality. Therefore, a robust reference for wine investment based on the observed climate conditions can be provided.
In the future, in addition to the climate and atmospheric factors employed in the present study, other Terroir factors, such as topographic slope and aspect, soil type, and land cover data, will be considered in wine quality modeling.