A climatic classification of the world’s wine regions

Using a dataset with 16 climate variables for locations representing 813 wine regions that cover 99 % of the world’s winegrape area, we employ principal component analysis (PCA) for data reduction and cluster analysis for grouping similar regions. The PCA resulted in three components explaining 89 % of the variation in the data, with loadings that differentiate between locations that are warm/dry from cool/wet, low from high diurnal temperature ranges, low from high nighttime temperatures during ripening, and low from high vapour pressure deficits. The cluster analysis, based on these three principal components, resulted in three clusters defining wine regions globally, with the results showing that premium wine regions can be found across each of the climate types. This is, to our knowledge, the first such classification of virtually all of the world’s wine regions. However, with both climate change and an increasing preference for premium relative to non-premium wines, many of the world’s winegrowers may need to change their mixes of varieties, or source more of their grapes from more appropriate climates.

This article is published under the Creative Commons licence (CC BY 4.0).
Use of all or part of the content of this article must mention the authors, the year of publication, the title, the name of the journal, the volume, the pages and the DOI in compliance with the information given above.

INTRODUCTION
Climatic classifications of wine regions are important because they allow one to describe and compare wine regions that share similar characteristics. A well-known example is the classification developed by Tonietto and Carbonneau (2004), which uses three climatic indexes to create a multi-criteria climatic classification system. More recently, various studies have used multivariate statistical methods to group wine regions based on climatic indexes or climate variables.
Examples of these studies are Herrera Nunez et al. (2011) in Italy, Montes et al. (2012) in Chile, Shaw (2012) in 25 Pinot Noir regions around the world, Fraga et al. (2016) and Fraga et al. (2017) in Portugal, Moral et al. (2016) in Spain, Karlík et al. (2018) in Austria, Cardoso et al. (2019) in Northwest Iberia, and Vianna et al. (2019) in Brazil. With the exception of Shaw (2012), who focused on selected Pinot Noir regions from eight countries, these studies focused on just one or two countries.
To our knowledge, there is no study describing and analysing the climate characteristics of virtually all of the world's wine regions using multivariate statistical methods. This research gap may be due to data availability issues. However, we have an opportunity to address this gap by obtaining location information on 16 climate variables for 813 wine regions that account for over 99 % of the world's winegrape area (Anderson and Nelgen, 2020a; Anderson and Nelgen, 2020b). This winegrape area database is an updated and expanded version of an earlier variety x region vineyard area database (Anderson, 2013).
The aim of this research is to classify virtually all of the world's wine regions into groups that share similar climate characteristics. Using a multivariate statistical approach allows for the grouping of similar characteristics into a smaller set of components, which is easier than examining all 813 regions with 16 climate variables. Because the dataset used for this classification includes information on the mix of varieties in each of these regions, it also allows us to infer the potential of the world's wine regions for high-quality wine production in the wake of climate change and a shifting demand towards premium wines.

Data
The source of the data for the 813 wine regions is Anderson and Nelgen (2020a). These regions are sometimes legally defined geographical indications, but they are mostly delimited by political boundaries. A concordance between these regions and the ones in the World Atlas of Wine (Johnson and Robinson, 2019) is provided in Anderson and Nelgen (2020b). We use the locations reported in Anderson and Nelgen (2020b), which represent municipalities within or close to each wine region, to extract climate data representing each region, for the 16 climate variables described in Table 1. The source of the climate data for the wine regions is TerraClimate (Abatzoglou et al., 2018).
TerraClimate is built from multiple databases and uses climatically aided interpolation, combining high-spatial resolution (1/24°, ~4-km) climatological normals from the WorldClim dataset, with time-varying data from CRU Ts4.0 and the Japanese 55-year Reanalysis (JRA55). TerraClimate is updated annually, but at the time of this analysis it included the period of record 1958-2018. For our analysis we focused on the 30-year period from 1989 to 2018, but we also used data for the period 1959-1988 for comparisons of the evolution of climate between the two periods.
The climate data extracted from TerraClimate for this study is based on one geographical location per region, usually a town or city within or adjacent to the region. The ideal climate data would be an average for the area devoted to vines within the qualified geographic boundaries of each region (spatial data). However, since such data are not available for all regions worldwide, we believe that a location extraction provides a general estimation of the area's climate and helps link these aspects to the varieties grown in each region. Other studies have encountered the same data availability issue, and they have also relied on one location for each region as a proxy of the spatial mean of each climate variable in each region. Examples are Tonietto and Carbonneau (2004), who examined 97 locations near or within wine regions, and Shaw (2012), who examined locations near or within 25 Pinot Noir wine regions.

Methods
Principal Component Analysis (PCA) is a dimensionality-reduction method that transforms a large set of variables into a smaller set that still contains most of the information in the larger set. PCA starts with the eigendecomposition of a correlation matrix. The eigenvectors from this decomposition are uncorrelated and normalised (orthonormal). We subjected the 16 climate variables for the 813 locations to PCA.
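The eigendecomposition step described above can be illustrated with a minimal sketch. It assumes NumPy and uses a randomly generated 813 x 16 matrix as a stand-in for the climate data; the function name `pca_correlation` and the synthetic data are illustrative only and are not the software or data used in this study.

```python
import numpy as np

def pca_correlation(X):
    """PCA via eigendecomposition of the correlation matrix of X (n_obs x n_vars)."""
    # Standardise each variable to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix of the original variables.
    R = np.corrcoef(X, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(R)
    # Reorder so that the component with the largest eigenvalue comes first.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Component scores: standardised data projected onto the eigenvectors.
    scores = Z @ eigvecs
    return eigvals, eigvecs, scores

# Synthetic stand-in for the 813 x 16 climate matrix (NOT the study's data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(813, 3))
X = latent @ rng.normal(size=(3, 16)) + 0.5 * rng.normal(size=(813, 16))

eigvals, eigvecs, scores = pca_correlation(X)
explained = eigvals / eigvals.sum()  # share of total variance per component
```

Because the decomposition is applied to a correlation matrix, the eigenvalues sum to the number of variables (16 here), and the variance of each column of `scores` equals the corresponding eigenvalue.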
We used the principal components with Eigenvalues greater than 1.0 resulting from the PCA as the input for a k-means cluster analysis. With too many variables (16 in our case), the efficiency of the k-means algorithm can be affected. This is because seeking neighbours (as the k-means algorithm does) in high dimensions is difficult: data points may seem too far away from each other even though they are close in all other dimensions. For this reason, we performed PCA before the k-means cluster analysis.
K-means clustering allows observations to be classified in a predetermined number of (k) groups. This is a partition method and, unlike hierarchical cluster analysis methods, each observation is assigned to only one group. The process starts with all observations randomly assigned to the k groups.
The mean for each group is calculated and each observation is re-assigned to the group with the closest mean. This process repeats until no observation changes group. K-means allows more than one variable to be employed by using a similarity or dissimilarity measure. For this study, we use the Euclidean distance, arguably the most used measure (Wu, 2012).
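The iterative procedure described above (random initial assignment, group means, re-assignment by Euclidean distance until no observation changes group) can be sketched as follows. This is a minimal illustration assuming NumPy; the function `k_means`, its handling of empty groups, and the synthetic three-dimensional points are assumptions made for the example, not the implementation used in this study.

```python
import numpy as np

def k_means(points, k, seed=0, max_iter=100):
    """Partition points (n_obs x n_dims) into k groups using Euclidean distance."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    # Random initial assignment that leaves no group empty (requires n >= k).
    labels = rng.permutation(np.arange(n) % k)
    for _ in range(max_iter):
        # Mean of each group; an empty group is re-seeded with a random observation.
        centres = np.vstack([
            points[labels == g].mean(axis=0) if np.any(labels == g)
            else points[rng.integers(n)]
            for g in range(k)
        ])
        # Euclidean distance from every observation to every group mean.
        dist = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        # Stop when no observation changes group.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, centres

# Synthetic three-dimensional points standing in for three component scores.
rng = np.random.default_rng(1)
blobs = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 3))
                   for c in (-5.0, 0.0, 5.0)])
labels, centres = k_means(blobs, k=3)
```

Because the starting assignment is random, different seeds can lead to different local solutions, which is one reason statistical packages usually offer several starting options.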
German Puga et al. Source: Authors' computation. Notes: The climate variables are described in Table 1.
Stopping rules are helpful for choosing the optimal (k) number of groups. Milligan and Cooper (1985) evaluated a wide variety of stopping rules and concluded that the Calinski-Harabasz index is the best rule for non-hierarchical cluster analysis. Therefore, we used the Calinski and Harabasz (1974) pseudo-F index stopping rule to assist us in determining the optimal number of groups. A larger value of the Calinski-Harabasz index is preferred, as it signals a more distinct solution.
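For reference, the Calinski-Harabasz pseudo-F index is the ratio of the between-group sum of squares (divided by k - 1) to the within-group sum of squares (divided by n - k). The sketch below, assuming NumPy, computes it for a toy one-dimensional example; the function name and the four data points are illustrative only.

```python
import numpy as np

def calinski_harabasz(points, labels):
    """Pseudo-F index: (between SS / (k - 1)) / (within SS / (n - k))."""
    n = points.shape[0]
    groups = np.unique(labels)
    k = groups.size
    overall_mean = points.mean(axis=0)
    between = 0.0
    within = 0.0
    for g in groups:
        members = points[labels == g]
        centre = members.mean(axis=0)
        # Between-group part: group size times squared distance to the overall mean.
        between += members.shape[0] * np.sum((centre - overall_mean) ** 2)
        # Within-group part: squared distances of members to their own group mean.
        within += np.sum((members - centre) ** 2)
    return (between / (k - 1)) / (within / (n - k))

# Toy data: two obvious groups on a line.
toy = np.array([[0.0], [2.0], [10.0], [12.0]])
good = calinski_harabasz(toy, np.array([0, 0, 1, 1]))  # compact, well-separated groups
poor = calinski_harabasz(toy, np.array([0, 1, 0, 1]))  # groups that mix the two ends
```

The partition that keeps neighbouring points together yields the larger index, which is the sense in which a larger value signals a more distinct solution.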

RESULTS
The data for the 813 wine regions provide evidence of the diverse climates that exist in the world's wine regions.
Table 2 shows the summary statistics for all the regions combined. This climatic variability is explained by latitudes that range from less than 10 degrees to almost 60 degrees from the equator, and elevations as low as sea level to as high as almost 3,000 meters above sea level.
For example, annual precipitation (AnnP) ranges from basically zero in one of the driest regions of the world in northern Chile to 2,996 mm in Taiwan. In addition, annual temperatures (AnnT) range from quite cold (less than 8 °C) at higher latitude locations in Canada and Norway to above 26 °C in regions such as India and Southeast Asia.
Table 3 shows the results of the PCA. This table provides the Eigenvalues and the explained variance of the components. The Eigenvalue of the first component is 8.52, and this component explains 53 % of the variation in the data. Choosing the components with Eigenvalues greater than 1.0, which is the mean Eigenvalue, is one of the most used objective criteria for selecting the number of components for data reduction (Jolliffe, 2002). Therefore, we chose the first three components (i.e., Comp1-3). These three components explain 89 % of the variance in the data, demonstrating that PCA is a useful data-reduction technique in this case.
Table 3 notes: Sum is the sum of the three principal components (Comp1-3). Unexplained is the proportion of the variance for each climate variable that is unexplained by the three principal components.
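The link between an Eigenvalue and its share of explained variance follows from the use of a correlation matrix: each of the 16 standardised variables contributes one unit of variance, so the Eigenvalues sum to 16 and their mean is 1.0. As a worked check of the figures reported for the first component (the symbol \lambda_j for the Eigenvalue of component j is notation introduced here for illustration):

```latex
\text{share}_j \;=\; \frac{\lambda_j}{\sum_{i=1}^{16} \lambda_i} \;=\; \frac{\lambda_j}{16},
\qquad
\text{share}_1 \;=\; \frac{8.52}{16} \;\approx\; 0.53 .
```

By the same relation, a cumulative share of 89 % for the first three components corresponds to Eigenvalues summing to roughly 0.89 x 16, or about 14.2.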
Table 3 also provides the principal component loadings. PC1 accounts for 53 % of the variation in the data and distinguishes regions that are warmer and drier from regions that are cooler and wetter. The regions that are warmer and drier also have medium to high DTRs, and higher VPDs and SRADs. The regions that are cooler and wetter also have medium to low DTRs, and lower VPDs and SRADs.
PC2 explains an additional 27 % of the variation in the data with the loadings highlighting locations that have high GS and RP precipitation with lower DTR and warmer nights (+CNI) versus those that have low GS and RP precipitation, high DTR and cooler nights (-CNI). The wetter locations also tend to have warmer temperatures and relatively low VPD, while the drier locations have cooler temperatures and higher VPD. The first two PCs account for most of the variation in the data (80 %), with PC3 accounting for an additional 9 % with loadings appearing to distinguish between locations that are wet and have high DTRs and those that are dry and have low DTRs (Table 3).
The eigenvectors in Table 3 are small and never greater than 0.5. For testing the significance of the eigenvectors, we estimated the PCA with the standard errors and related statistics (see Supplementary Material). This estimation relies on the assumption that the data have a multivariate normal distribution. This assumption can be justified by the relatively large sample size, thus the central limit theorem applies, and because PCA itself uses the central limit theorem implicitly by transforming the variables to a zero mean and unit variance. The results of this estimation show that, while the eigenvectors are small, all but two are statistically significant, which justifies the inclusion of all the climate variables in the analysis.
The last column in Table 3 shows the proportion of the variance for each climate variable that is unexplained by the three principal components. The variance of each of the 16 variables is well explained, with only 11 % unexplained on average. The least explained variables are SRAD_SU and SRAD_GS, followed by AnnP and AnnT, which extend beyond the growing season, meaning they are arguably less relevant for this analysis. Even so, a large proportion of these variables is explained by the first three components.
We used the three principal components from the PCA for the k-means cluster analysis. To choose the k number of groups, we calculated the Calinski-Harabasz index for k-means cluster solutions with two to 14 groups based on the three principal components. The results suggest that a solution with three groups indicates the most distinct clustering.
Figure 1 is a score plot based on the first and second principal components, where each of the 813 points represents a region and each of the three colours represents a group of regions. A similar interpretation can be inferred from graphs for the first and third and for the second and third principal components (not shown).
Figure 2A shows the regions plotted against their GST and GSP. Groups 1 and 3 are warmer than Group 2. Group 3 is, on average, wetter than Group 1, while a wide range of GSP is observed for Group 2. A large degree of overlap between Groups 1 and 3 is evident in Figure 2A. These two groups would appear more distinct in a three-dimensional graph with GSDTR on the third axis. That is because part of the difference between the regions that overlap is given by their difference in GSDTR. The regions in Group 1 have a higher GSDTR (Figure 2B). A wide range of GSDTR is observed for Group 2.
Figure 3A shows the regions plotted against their VPD_GS and GSP. Group 1 has higher VPD_GS than Groups 2 and 3. This also explains part of the overlap between Groups 1 and 3 in Figure 2A. Figure 3B shows the regions plotted against their SRAD_GS and GSP. A wide range of SRAD_GS is observed in the three groups, although the average SRAD_GS is highest for Group 1 and lowest for Group 2.
The first map in Figure 4 shows that there are regions in the three groups across the globe. Regions from Group 1 account for most of the surface in the New World, which is evident from the second map in Figure 4, where the size of each region is proportional to its area. Group 1 includes most of the winegrape area in Argentina, central Chile and South Africa, and a big proportion of the area in the United States, Australia and Chile. Group 2 is mainly represented by New Zealand, some regions in Chile and most of southern Australia, and by New York and coastal and northern regions in western North America. Last, Group 3 comprises most of Brazil and Uruguay.
The winegrape area outside of the Old World has a larger share of its area in Group 1, whereas the Old World winegrape area is distributed more evenly across the three groups. Source: Authors' computation. Notes: The climate variables are described in Table 1.

TABLE 5.
Mean values and differences in mean values for the two periods (P1: 1959-1988; P2: 1989-2018) for each group and for all regions. Source: Authors' computation. Notes: The climate variables are described in Table 1.
Besides looking at the differences in the PCA and cluster memberships between the two periods, we explored climatic differences between these periods. Table 5 provides the mean values and differences in mean values for the two periods for each of the three groups of regions and for all observations. Annual precipitation has decreased slightly in all groups, while the precipitation in the growing season has decreased slightly in the driest group (Group 1) and increased in the wetter groups (Groups 2 and 3). In all groups, temperatures have increased, especially in the warmest months, and daily temperature ranges have decreased. These changes in temperatures explain part of the changes in the vapour pressure deficits, which have increased across the three groups. As expected, average day/night downward surface shortwave radiation has not changed much over the two 30-year periods. The changes in medians rather than changes in means (see Supplementary Material) suggest some slight differences in the interpretation of these changes, but they reinforce the observation that the three groups of regions are warmer and have higher vapour pressure deficits.
We conducted paired t-tests on the equality of the means between the first and second period, for each climate variable, and for each group of regions and all the regions combined.
The results show that these differences are all statistically significant at a 1 % level with the exception of the differences in GSDTR and RPDTR for Group 2, and SRAD_GS for Group 1. The last column in Table 5 shows the differences in climates between the two periods for all observations combined, all of which are statistically significant. Overall, both GSP and GSDTR increased (and decreased) in about half of the regions. GST, instead, increased in all but 2 regions. The increase in GST was higher than 0.5 °C in 76 % of the regions, and higher than 1 °C in 46 % of the regions. VPD_GS increased in 93 % of the regions, while SRAD_GS increased in 72 % of the regions.
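A paired t-test of the kind described above compares, region by region, the value of a climate variable in the second period with its value in the first. The sketch below, assuming NumPy, computes the test statistic and degrees of freedom for four hypothetical regions; the GST values are invented for illustration and are not taken from Table 5. A p-value could then be obtained from a t distribution, for example with `scipy.stats.ttest_rel`.

```python
import math
import numpy as np

def paired_t(first, second):
    """Paired t statistic and degrees of freedom for second minus first."""
    diff = np.asarray(second, dtype=float) - np.asarray(first, dtype=float)
    n = diff.size
    # Mean difference divided by its standard error.
    t_stat = diff.mean() / (diff.std(ddof=1) / math.sqrt(n))
    return t_stat, n - 1

# Hypothetical GST values (°C) for the same four regions in each period.
gst_p1 = [15.0, 16.0, 17.0, 18.0]
gst_p2 = [15.5, 17.0, 17.5, 19.0]
t_stat, dof = paired_t(gst_p1, gst_p2)
```

Pairing by region removes the large between-region differences in climate from the comparison, so the test focuses on the within-region change between periods.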

DISCUSSION
This classification provides a description of the climates of the world's wine regions across a wide range of variables, including precipitation, average temperature, diurnal temperature range, vapour pressure deficit and surface shortwave radiation. Compared to prior research classifying climates in wine regions, this classification utilises site locations across a wider range of regions that together encompass virtually all the world's winegrape area.
Despite its advantages, this classification has at least two limitations. First, the climate variables are based on extracting location data from one point in or near wine regions. A better representation would come from using approved wine region boundaries (e.g., GI, PDO, AVA, etc.), summarising spatial climate data across the wine regions, but these boundaries are not available for the majority of the regions studied.
Second, there may be other climate variables that are also relevant, but which were not available in the spatial data used to extract the location data. Furthermore, the spatial climate data is aggregated to the time periods, so models that use daily data inputs could not be used. In addition, having phenological data for the main varieties in the region would allow for the application of novel models, such as Grapevine Flowering Véraison and Grapevine Sugar Ripeness (Parker et al., 2020). The impact of temporal variability in grapevine phenology (Hall and Blackman, 2019) is therefore not accounted for in this analysis. Moreover, considering that terroir is important for winegrape production and quality (van Leeuwen et al., 2020; van Leeuwen et al., 2018), the interactions between soils and climates are not reflected in this climatic classification.
This classification reveals that premium regions can be found in each of the three groups of regions. Group 1 includes Sonoma and Napa Valley (California), Uco Valley (Argentina) and Barossa Valley (Australia). Group 2 includes Bordeaux and Burgundy (France), Mosel Valley (Germany) and Marlborough (New Zealand). Group 3 includes Piemonte and Toscana (Italy) and Rioja (Spain). These are just some examples of premium regions that can be found across the climate types identified in this research, depending on style criteria and other factors (see Supplementary Material).
The comparison between the two periods in our analysis reveals evidence of a changing climate in the wine regions. The increase in average temperature during the growing season (GST increased by 0.8 °C) and the decrease in temperature range are perhaps the most concerning changes in relation to winegrape quality. The influence of temperature on berry composition makes it the key climatic factor affecting winegrape quality (Davis et al., 2019; Hall and Jones, 2009; Pons et al., 2017). Temperature range variables (e.g., GSDTR) also are often related to winegrape quality, as cooler nights can be positive for aroma and colour development due to a decrease in carbon use by respiration (Schultz, 2016).
Figure 5 shows the estimated GST ranges for producing high-quality winegrapes in the Northern Hemisphere, according to Jones et al. (2012). In parentheses on the vertical axis is the share of the global area of each variety (Anderson and Nelgen, 2020b) that is planted within that temperature range. The 21 varieties in this graph account for 45 % of the global winegrape area and a much higher share of premium regions.
In aggregate, 44 % of that area is cultivated outside those temperature ranges identified for high-quality winegrape production in Figure 5. Most of that share which is not within those temperature ranges comprises regions that are too hot, rather than too cold. The vertical lines in Figure 5 show the mean GST for each group. Groups 1 and 3 have mean GSTs that are higher than the ideal temperatures for producing high-quality wine from the varieties represented in the figure.
Combined, these regions accounted for 60 % of the world's winegrape area in 2016. van Leeuwen et al. (2013), however, argue that the upper limits from Figure 5 are underestimated, and our research here indicates that as well.
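The share of a variety's area that lies outside a GST range, split into regions that are too cold and regions that are too hot, can be computed as in the following minimal sketch. The function name, the 16-20 °C range and the four regions are hypothetical values chosen for illustration; they are not the ranges of Jones et al. (2012) or the areas in the database used in this study.

```python
def share_outside_range(regions, gst_min, gst_max):
    """Return (too_cold_share, too_hot_share) of a variety's planted area.

    regions: list of (area_ha, gst_celsius) pairs, one per region.
    """
    total = sum(area for area, _ in regions)
    too_cold = sum(area for area, gst in regions if gst < gst_min)
    too_hot = sum(area for area, gst in regions if gst > gst_max)
    return too_cold / total, too_hot / total

# Hypothetical variety with an assumed suitable GST range of 16-20 °C.
regions = [(100.0, 15.0), (300.0, 18.0), (400.0, 19.5), (200.0, 21.0)]
cold_share, hot_share = share_outside_range(regions, gst_min=16.0, gst_max=20.0)
within_share = 1.0 - cold_share - hot_share
```

In this toy case 10 % of the area is too cold and 20 % is too hot, leaving 70 % within the assumed range.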
It is also likely that some form of adaptation in grapevines to changes in climate has already occurred (van Leeuwen et al., 2013). However, with additional warming in the future, further adaptation, either in the plant system or in vine management, will likely be necessary as the share of the global winegrape area within the GST ranges for high-quality winegrape production will continue to decline. Most regions will need to adapt to further changes in climate, including some of the premium regions that may be subject to deteriorating quality (Santos et al., 2020). While warmer growing seasons are sometimes beneficial in some of the coolest regions, such as the Mosel Valley in Germany (Ashenfelter and Storchmann, 2010), years with significantly higher temperatures are associated with a decrease in quality in most of the world's current wine regions. Decreases in quality that may be induced by climate change are happening at a time when the preference for premium wine is increasing (Anderson et al., 2018). Should this trend continue, the need to adapt to climate change will only intensify (see Santos et al. (2020) for a review).
Much of the adaptation to climate change can take place in wineries. For example, oenological advances can help lower alcohol concentrations and increase acidity in wines, two issues that will intensify in some wine regions due to future warming (Dequin et al., 2017). However, part of the adaptation process will need to take place in the vineyards. Modifications in plant material include using different rootstocks or clones. New breeding technologies that rely on genome editing techniques have a promising potential to produce plant material that can mitigate the effects of climate changes, but that potential is currently limited by the state of advancements and the perception that winegrowers and consumers have about these technologies (Dalla Costa et al., 2019). Therefore, winegrowers may need to diversify their production towards varieties that can produce high-quality wines in warmer growing seasons.
There is little evidence, however, that the latter is happening at a global scale; between 2000 and 2016, the share of global area for the 21 varieties in Figure 5 that are cultivated within the temperature ranges shown there decreased from 60 to 56 %. The Supplementary Material provides a table with 1,565 winegrape varieties ranked from highest to lowest area-weighted average GSTs in the world, which may be useful for identifying varieties that might be better adapted to warmer climates. Another option for winegrowers who wish to retain their varietal mix is to source more winegrapes from regions with more-appropriate climates.
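One plausible reading of an area-weighted average GST for a variety is the mean of the GSTs of the regions where it is planted, weighted by the area planted in each region. The sketch below follows that reading; the function name, the variety labels and all numbers are hypothetical, and the exact computation behind the table in the Supplementary Material may differ.

```python
from collections import defaultdict

def area_weighted_gst(records):
    """Rank varieties from highest to lowest area-weighted average GST.

    records: iterable of (variety, area_ha, region_gst) tuples with positive areas.
    """
    weighted_sum = defaultdict(float)
    total_area = defaultdict(float)
    for variety, area, gst in records:
        weighted_sum[variety] += area * gst
        total_area[variety] += area
    averages = {v: weighted_sum[v] / total_area[v] for v in total_area}
    # Highest average GST first, mirroring a warm-to-cool ranking.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)

# Hypothetical plantings: (variety, area in ha, GST of the region in °C).
records = [
    ("Variety A", 100.0, 20.0),
    ("Variety A", 300.0, 16.0),
    ("Variety B", 200.0, 21.0),
    ("Variety B", 200.0, 19.0),
]
ranking = area_weighted_gst(records)
```

Here Variety B ranks first with an average of 20 °C, ahead of Variety A at 17 °C, because most of Variety A's area sits in the cooler region.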

CONCLUSIONS
We used information on 16 climate variables to classify 813 wine regions that account for over 99 % of the world's winegrape area using multivariate statistics, namely PCA and k-means clustering. The 813 regions were clustered into three groups of regions that are characterised by precipitation, average temperature, diurnal temperature range, vapour pressure deficit and surface shortwave radiation variables. This is, to our knowledge, the first classification of wine regions that covers virtually all the world's winegrape area. By grouping the regions into clusters that share similar climates, we provide an easy-to-interpret description of the climates of the world's wine regions. This classification reveals that premium regions can be found across all three climate types.
The comparison between two time periods (1959-1988 and 1989-2018) suggests that the climate of each of the three groups has already changed. Current and further increases in temperature, detailed by the AR6 (IPCC, 2021) and others, may be the most concerning changes in terms of winegrape quality when the global demand for wine is likely to continue shifting towards more premium products. Therefore, winegrowers in some regions may need to use varieties that are more appropriate for warmer climates and/or to purchase or plant vineyards in cooler regions to maintain typicity of wine styles.
The present analysis could be enhanced by using spatial climate data as opposed to location data, and by including additional climate variables that may prove useful in better understanding vine growth, productivity and fruit quality.
To do so would require a global database of governmentally approved wine region boundaries, allowing for a spatial assessment of all regions, and a robust global climate dataset with spatial resolutions and a wide range of variables suitable for assessing viticulture and wine production. In addition, having spatial climate data that reflects temporal variability (i.e., monthly or daily data), as well as variables that are not climatic but relate to the terroir, the vine and winegrape quality (i.e., soils, phenology and fruit composition) would enhance this type of analysis. Further research could also incorporate climate change projections across all wine regions globally and consider the implications of the future climate scenarios on the wine production sector. This would allow an analysis of the potential for some winegrape growing to shift to potentially more appropriate climates and regions. Future studies could also identify winegrape varieties growing successfully in regions with a similar climate to what any particular region is expecting its climate to become in the decades ahead. The database analysed for this research can also be used for that purpose, because it includes the area by variety for more than 1,700 prime varieties for all the 813 regions we have classified (Anderson and Nelgen, 2020a).
Furthermore, the results of this study indicate that more research needs to be done on climate thresholds for winegrape varieties worldwide. While Jones et al. (2012) provide a framework for a small subset of the varieties planted worldwide, further work is needed to examine the temperature thresholds for a wider range of economically important varieties. Enhanced models using phenological observations (Parker et al., 2020) are clearly useful in this regard, yet data availability across both regions worldwide and a larger set of varieties (Anderson and Nelgen, 2020b) would be needed to refine our understanding of climate limits to vine growth, productivity and quality.

FIGURE 5. Optimal GST ranges for high-quality winegrape production (shares of world winegrape area under the grey ranges are shown in parentheses on the vertical axis). Source: Jones et al. (2012) and authors' computation. Notes: The vertical lines show the mean GST for each group.

Table 4
includes 246 regions that cover 33.9 % of the total winegrape area. The Supplementary Material provides a table with the climate data and cluster classification for each region. Table 4 also provides the summary statistics for elevation. While there are wide ranges of elevation across the three groups (see Supplementary Material), on average, Group 1 has the highest elevations and Group 3 the lowest.