Vineyard zonation based on natural terroir factors using multivariate statistics – Case study Burgenland (Austria)

1 Comenius University in Bratislava, Faculty of Natural Sciences, Department of Regional Geography, Planning and Environment, Mlynská dolina, Ilkovičova 6, 842 15, Bratislava, Slovakia
2 Comenius University in Bratislava, Faculty of Natural Sciences, Department of Physical Geography and Geoecology, Mlynská dolina, Ilkovičova 6, 842 15, Bratislava, Slovakia
3 Silva Tarouca Research Institute for Landscape and Ornamental Gardening, v. v. i., Lidická 25/27, 602 00 Brno, Czechia


Introduction
Discussion between the European Union (EU) and the United States on food product labeling has become a central topic in ongoing trade negotiations. The EU asserts the concept of protection based on geographical indication (GI), which contrasts sharply with the business strategy the USA uses in its geographical-origin labeling (Gervais, 2015; Van Caenegem, 2015).
The main GI concept was developed in France at the beginning of the 20th century, aiming to promote local wine products amid escalating competition in the international wine market. This system also protected the producer's reputation by applying the appellation system. Other European countries, including Spain, Italy and Portugal, rapidly followed France in adopting the appellation system (Barham, 2003). In 2008, the EU initiated registration of GIs under "Protected Designation of Origin" (PDO) for certain wine sector products (Council Regulation (EC) No 510/2006; European Commission, 2006). Although member states differ somewhat in how they apply GI protection rules, the main GI protection concept is always based on the terroir concept.
The resolution of the International Organization of Vine and Wine (OIV) 333/2010 defines the terroir concept as "applying to the region where the interaction between the organic and inorganic parts of the land and the applied agro-practices constitute the combined characteristics recognizing the uniqueness of the product from these areas" (OIV, 2010). France initiated the terroir concept in connection with wine. These attributes can have a direct or indirect effect on the typicality and quality of the region's wine.

Additional interpretations of terroir include:
(A) In addition to environmental factors, human geographical factors such as tradition connected with the transfer of information and the cultural environment can also influence wine character (Vaudour, 2002).
(B) Carey (2001) defines terroir as a complex of natural factors not easily changeable or modifiable by anthropogenic activity.
(C) Gladstones and Smart (1997) consider terroir to be more complex, so that influences on the wine cannot be measured simply as a combination of the landscape's natural components.
GIs are protected not only by the terroir but also by the implementation of specific standards and procedures. In practice, this means the wine must not only come from a specific region, but must also be made by approved procedures and meet specific attributes. The EU's third major approach to GI is that the GI product has total protection from unauthorized use, so that new products cannot be labeled in any way that can be mistaken for a GI product (Gangjee, 2012).
United States protection of all agricultural products, including wine, is based on trademark applications; the producers gain exclusive rights and protection of their product, and buyers gain a greater guarantee of high-quality products. The U.S. Patent and Trademark Office generally excludes registration of products with trademark names based on geographical locations (United States Patent and Trademark Office, 2016). Further, geographical features can be protected only where these features refer to properties other than the product's geographical origin. The USA opposes the European GI approach mainly because it considers the European model a form of protectionism, protecting traditional EU producers and ensuring them advantages over other global market producers. In rebuttal, the EU argues that its approach mainly protects consumers. Under the EU system, consumers are ensured explicit information on the products they purchase, and it is harder to confuse similar products with each other (Farrand, 2016).
Discussions on food labeling remain an ongoing topic between the EU and the USA at the World Trade Organization (WTO) and in the Trans-Pacific Partnership (TPP), and also in finding an acceptable formulation of the Transatlantic Trade and Investment Partnership (TTIP). An effective argument against implementation of the European GI model is to challenge "region" as the foundation of the PDO. Barham (2003) and Vaudour et al. (2015) argue that PDO borders frequently do not represent the terroir borders. For this reason, countries which have adopted the European GI model should concentrate on eliminating this conflict.
This study proposes zonation for the Austrian Burgenland vineyard areas. Austria, an EU member state, introduced the European GI model and created PDO regions, the so-called Districtus Austriae Controllatus (DAC), for the most expensive and best quality wines. A hypothesis was formed to test whether the present PDO borders correspond to the geographical attributes of the natural terroir units. Partial goals were set to validate the hypothesis:
- Creation of the Burgenland wine-growing zonation map based on the physical geographical characteristics which form the natural substance of the terroir, determined by multivariate statistics.
- Validation of the actual extent of the Burgenland DAC region borders against the newly formed zonation.
- Suggestions for possible changes to the DAC borders to initiate discussion on this topic and instigate negotiation with authorities and involved organizations.

Study area
In 2009 Austria divided wine into two categories based on viticultural law: wine with and without geographical labels (Weingesetz, 2009). This study concentrates on wines with geographical labels and specifically on wines which are produced within DAC protected regions. The Austrian wine-producing area is subdivided into 16 wine-growing regions with wine geographical indication: seven with "profiling variety" and nine so-called DAC regions. The former regions include and protect only wines made from selected, regionally typical varieties, while in the DAC regions other factors, such as adherence to prescribed processing techniques defined by a regional commission, are also considered. Four of these DAC regions are situated in Burgenland (Figure 1), whose territory covers 3,961.8 km² with a total vineyard area of 138.4 km². Most Burgenland territory is in one of the four DAC regions: Neusiedlersee, Leithaberg, Mittelburgenland and Eisenberg. There are only two areas not in a DAC region, and these surround Leithaberg DAC in the northern part of the federal state. The most common Burgenland wine varieties are Zweigeltrebe, Welschriesling, White Burgundy, Chardonnay and Lemberger.
OENO One, 2018, 52, 2, 105-117 ©Université de Bordeaux (Bordeaux, France)

Data
The homogenous zones were to be created from as many relevant physical geographical factors as possible. The input data were collected for all 66,673 officially registered vineyard areas in the Burgenland federal state (Amt der Burgenländischen Landesregierung, 2016). Climate indicator data come from the publicly accessible WorldCLIM database (Hijmans et al., 2005) and the WIND ATLAS of Austria (Krenn et al., 2012); the only soil information source is the E-BOD database, which holds data gathered from soil-probing sites spaced one kilometer apart (BFW, 2016). A digital terrain model enabled calculation of topographic characteristics. Table 1 shows the input data used for zonation.

Methodology
Research methodology is divided into two main steps: (1) dimensionality reduction and identification of common relationships in the multidimensional dataset by factor analysis and (2) creation of zones by cluster analysis. Validation of the accuracy of the proposed vineyard zonation is presented as a separate step.

Factor analysis
Factor analysis was applied to reduce the multidimensional dataset with as little information loss as possible. It rests on the assumption that the relationships between the parameters result from a smaller number of unmeasurable dimensions, which are labeled as factors. The first important step is to standardize the data by z-score and evaluate the usability of the dataset for factor analysis with the Kaiser-Meyer-Olkin (KMO) test (Kaiser and Dickman, 1959). Data with a KMO value larger than 0.6 were considered appropriate for factor analysis.
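The standardization and suitability check described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the software used in the study; the function names `zscore` and `kmo` and the synthetic demo data are assumptions made for the example. The KMO measure is computed from the correlation matrix and the partial correlations derived from its inverse.

```python
import numpy as np

def zscore(X):
    """Standardize each column to zero mean and unit sample standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def kmo(X):
    """Return (overall KMO, per-variable KMO) for an (n_samples, n_variables) array."""
    R = np.corrcoef(X, rowvar=False)        # correlation matrix
    R_inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(R_inv))
    P = -R_inv / np.outer(d, d)             # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)   # mask selecting off-diagonal entries
    r2 = np.where(off, R ** 2, 0.0)
    p2 = np.where(off, P ** 2, 0.0)
    per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    overall = r2.sum() / (r2.sum() + p2.sum())
    return overall, per_variable

# Synthetic demo data (hypothetical, not the Burgenland dataset):
# five variables driven by two latent dimensions plus one pure-noise variable.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
X = np.column_stack([
    latent[:, 0] + 0.3 * rng.normal(size=500),
    latent[:, 0] + 0.3 * rng.normal(size=500),
    latent[:, 0] + 0.3 * rng.normal(size=500),
    latent[:, 1] + 0.3 * rng.normal(size=500),
    latent[:, 1] + 0.3 * rng.normal(size=500),
    rng.normal(size=500),
])
Z = zscore(X)
overall, per_variable = kmo(Z)
keep = per_variable > 0.6   # threshold quoted in the text
print(round(overall, 3), np.round(per_variable, 3), keep)
```

Variables whose individual KMO falls below the threshold would be candidates for exclusion before the factor analysis is repeated.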
In the next step we performed factor analysis on the remaining data. As one of the goals of factor analysis is to reduce the input data, we reduced the number of factors based on the Kaiser criterion, which recommends keeping factors with an eigenvalue larger than 1 (Kaiser and Dickman, 1959).
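As a rough illustration of the Kaiser criterion, the sketch below counts the eigenvalues of the correlation matrix that exceed 1. It assumes the eigenvalues are taken from the correlation matrix of the standardized variables; the function name `kaiser_retain` and the synthetic data are hypothetical.

```python
import numpy as np

def kaiser_retain(X):
    """Return the eigenvalues of the correlation matrix (descending) and the
    number of factors the Kaiser criterion would keep (eigenvalue > 1)."""
    R = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(R)[::-1]   # eigvalsh returns ascending order
    n_keep = int(np.sum(eigvals > 1.0))
    return eigvals, n_keep

# Synthetic demo (hypothetical data): six variables, three per latent dimension.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
X = np.column_stack(
    [latent[:, 0] + 0.3 * rng.normal(size=500) for _ in range(3)]
    + [latent[:, 1] + 0.3 * rng.normal(size=500) for _ in range(3)]
)
eigvals, n_keep = kaiser_retain(X)
print(np.round(eigvals, 3), n_keep)
```

Because the eigenvalues of a correlation matrix sum to the number of variables, an eigenvalue above 1 marks a factor that explains more variance than a single standardized variable.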
The last part of the factor analysis was the rotation of the factors. Most rotation methods are based on a simplicity function, which is a function of all factor loadings and is constructed so that factor loadings reach maximal or minimal values. Rotation method selection is researcher-specific, and here we chose the orthogonal Varimax rotation.
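A Varimax rotation can be sketched with the widely used SVD-based iteration shown below. This is an illustrative implementation, not the routine of the statistical package used in the study; the loading matrix, the 30-degree mixing rotation and the helper `varimax_criterion` are assumptions chosen so that the effect of the rotation is visible.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal Varimax rotation of a (variables x factors) loading matrix.
    Returns the rotated loadings and the rotation matrix."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        LR = L @ R
        # Gradient of the Varimax simplicity function with respect to the rotation
        grad = L.T @ (LR ** 3 - LR @ np.diag(np.sum(LR ** 2, axis=0)) / p)
        U, s, Vt = np.linalg.svd(grad)
        R = U @ Vt                          # closest orthogonal matrix
        crit = s.sum()
        if crit_old != 0.0 and crit / crit_old < 1.0 + tol:
            break
        crit_old = crit
    return L @ R, R

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    return float(np.sum(np.var(np.asarray(loadings) ** 2, axis=0)))

# Hypothetical loadings with a simple structure, deliberately mixed by 30 degrees
simple = np.array([[0.90, 0.00], [0.80, 0.10], [0.85, 0.05],
                   [0.00, 0.90], [0.10, 0.80], [0.05, 0.85]])
theta = np.deg2rad(30)
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
unrotated = simple @ T
rotated, R = varimax(unrotated)
print(np.round(rotated, 2))
```

The rotation leaves each variable's communality (row sum of squared loadings) unchanged and only redistributes the loadings among the factors so that each variable loads mainly on one of them.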

Cluster analysis
Regions were created by non-hierarchical k-means cluster analysis, because statistical literature advises its use for larger datasets and it is simple to apply (Kurasova et al., 2014). The input dataset was divided into k clusters so that intra-cluster similarity was as high as possible and inter-cluster similarity as low as possible. Cluster similarity is calculated from the average value of selected characteristics in each cluster. K-means clustering involves first selecting k objects, where k is the number of clusters, and each of these objects provides an initial cluster center of gravity. The remaining objects are assigned to the most similar clusters. Similarity is then calculated by the distance of the object from the cluster center of gravity. A new center of gravity of the newly formed clusters is then calculated, and this process is repeated until there are no further changes that improve the relationship of inter- and intra-cluster distances. The sum of squared errors is the criterion used for distance.
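The iteration described above can be sketched as a plain k-means loop. The code is a simplified illustration under stated assumptions: initial centers are drawn at random from the data (the study used a different initialization), and the function name `kmeans` and the synthetic points are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Basic k-means: returns labels, cluster centers and the sum of squared errors."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: random objects serve as the initial centers of gravity
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance of every object to every center
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its members (keep it if empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    sse = float(dists[np.arange(len(X)), labels].sum())
    return labels, centers, sse

# Synthetic demo (hypothetical data): three groups of points in two dimensions
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
               for c in [(0, 0), (10, 0), (0, 10)]])
labels, centers, sse = kmeans(X, k=3)
total_ss = float(((X - X.mean(axis=0)) ** 2).sum())
print(centers.round(2), round(sse, 2), round(total_ss, 2))
```

In the workflow of the paper the rows of `X` would be the factor scores of the vineyards rather than raw coordinates.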
This criterion was set to create the most compact and separated clusters which suit the conditions of the created vineyard production zones. The k parameter cannot be estimated from the actual status, but must be set by other methods. Internal validity indices are commonly used to specify the k parameter. No one has yet been able to determine which index is the most precise, and therefore we tested the estimation of the parameter k using five different internal validity indices. We calculated: Ball/Hall (Ball and Hall, 1965), Hartigan (Hartigan, 1975), WBI (Zhao et al., 2009), Calinski-Harabasz (Calinski and Harabasz, 1974) and the Xu index (Xu, 1997). We tested clustering from k=2 to k=25, with initial points of gravity in each process set to maximize initial intra-cluster distances. Internal indices were visualized on line graphs and the optimal parameter value was determined at local maxima or minima (Zhao, 2012).
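To illustrate how an internal validity index scores a partition, the sketch below computes the Calinski-Harabasz index, one of the five indices named above, as the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom. The function name, the synthetic points and the two labelings being compared are assumptions for the example; in practice the labels would come from k-means runs for each tested k.

```python
import numpy as np

def calinski_harabasz(X, labels):
    """Calinski-Harabasz index: (B / (k - 1)) / (W / (n - k))."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = len(X)
    clusters = np.unique(labels)
    k = len(clusters)
    overall_mean = X.mean(axis=0)
    between, within = 0.0, 0.0
    for c in clusters:
        members = X[labels == c]
        center = members.mean(axis=0)
        between += len(members) * np.sum((center - overall_mean) ** 2)
        within += np.sum((members - center) ** 2)
    return float((between / (k - 1)) / (within / (n - k)))

# Synthetic demo (hypothetical data): three well-separated groups of 50 points
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
               for c in [(0, 0), (10, 0), (0, 10)]])
labels_k3 = np.repeat([0, 1, 2], 50)   # partition matching the three groups
labels_k2 = np.repeat([0, 1, 1], 50)   # coarser partition merging two groups
ch_k3 = calinski_harabasz(X, labels_k3)
ch_k2 = calinski_harabasz(X, labels_k2)
print(round(ch_k3, 1), round(ch_k2, 1))
```

Plotting such an index against k and looking for a pronounced maximum is one way to read off a candidate number of clusters.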
In addition to applying and comparing the validity indices, we used the input data to estimate the number of clusters with the Generalized Cluster Analysis tool in Statistica 10. Automated estimation of the k parameter by this method was chosen to test the validity of k parameter selection by indices. Cluster analysis was then performed on the factor analysis results. Each of the 66,673 analyzed vineyards was subsequently matched to the identified zones.
The final step assessed the proximity of links between the members of each cluster. The strength of these links, the quality of the cluster analysis and the statistical confidence of each object in the clusters were evaluated by multidimensional discriminant analysis (Browne and McNicholas, 2012). This analysis gives the percentage of correctly included members of each cluster using the classification matrix. The inclusion accuracy is calculated by comparing the predefined assignment of each object to a cluster with the assignment predicted by the discriminant analysis. Distances are expressed as Mahalanobis distances, which take parameter correlation into account. The analysis is independent of parameter value range, and each member is assigned to the cluster with the shortest distance to the center of gravity. This creates a classification matrix with percentage representation of accuracy both in each created cluster and in the entire cluster analysis.
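The classification matrix can be illustrated with a simplified stand-in for discriminant analysis: each object is reassigned to the cluster whose center of gravity is nearest in Mahalanobis distance, and the reassignment is cross-tabulated against the original cluster membership. The sketch assumes a pooled within-cluster covariance matrix and equal prior probabilities; the function name and the demo data are hypothetical, and the procedure in the statistical package used in the study may differ.

```python
import numpy as np

def classification_matrix(X, labels):
    """Cross-tabulate original cluster labels against the cluster with the
    nearest center in Mahalanobis distance (pooled within-cluster covariance)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    n, p = X.shape
    centers = np.array([X[labels == c].mean(axis=0) for c in clusters])
    pooled = np.zeros((p, p))
    for c, center in zip(clusters, centers):
        diff = X[labels == c] - center
        pooled += diff.T @ diff
    pooled /= (n - len(clusters))
    inv = np.linalg.inv(pooled)
    # Squared Mahalanobis distance of every object to every center, shape (n, k)
    diff = X[:, None, :] - centers[None, :, :]
    d2 = np.einsum('nkp,pq,nkq->nk', diff, inv, diff)
    predicted = clusters[d2.argmin(axis=1)]
    matrix = np.array([[np.sum((labels == a) & (predicted == b)) for b in clusters]
                       for a in clusters])
    per_cluster = 100.0 * np.diag(matrix) / matrix.sum(axis=1)
    overall = 100.0 * np.trace(matrix) / n
    return matrix, per_cluster, overall

# Synthetic demo (hypothetical data): three well-separated clusters
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
               for c in [(0, 0), (10, 0), (0, 10)]])
labels = np.repeat([0, 1, 2], 50)
matrix, per_cluster, overall = classification_matrix(X, labels)
print(matrix, per_cluster, overall)
```

Rows of the matrix are the original clusters and columns the reassigned ones, so the diagonal holds the correctly included members.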
Finally, DAC regions and their borders were visually compared to the created homogenous zones and proposals were drafted for discussion with local/regional authorities.

Results
The delineation of Burgenland vineyard zones based on physical geographical characteristics describing the natural terroir was achieved by detailed analysis of the relevant scientific sources in Table 1. The topographic position index (TPI) was regarded as unsuitable for factor analysis and was not used after evaluation of parameter suitability (Table 2). Factor analysis was then repeated without TPI. The overall KMO test value for all parameters was 0.89, thus providing appropriate initial conditions for factor analysis without TPI.
Factor analysis results enabled extraction of five factors with eigenvalue over 1 (Table 3), and factors were also extracted outside the Kaiser criterion based on scientific studies delimiting natural terroir. These studies predetermined the conditions of occurrence of the common factors influencing wine production. We then applied Varimax rotation to the results of the initial factor analysis (Table 4).
The first extracted factor is characterized according to the high factor load of climate parameters with elevation as TOPOCLIMATE. The correlation coefficient for the analyzed region is above 0.96, so it is obvious that climate conditions strongly correlate with elevation. The second factor is connected with soil reaction and the silt and calcium content.
Although silt is a physical characteristic of soil, we refer to this factor as SOIL CHEMISTRY because soil reaction is allied with soil calcium content and this is strongly linked to deep silty soils. The third factor is called SOIL WATERLOGGING because it reflects the change in soil water absorption and the clay particle content. The fourth factor depicts sand particle changes. While other parameters influencing the fourth factor were not above 0.5, most affected the soil's physical properties; therefore PHYSICAL PROPERTIES OF SOIL was used for this factor. The fifth factor was named TOPOGRAPHY because it was strongly linked to both the topographic wetness index and slope.

Table 3. Results of factor analysis based on the KMO test
According to Kaiser and Dickman (1959), factors with eigenvalues greater than 1 were selected for cluster analysis.
Parameter k reflects the optimal number of clusters and was set based on the internal validity index values in Figure 2. The result was confirmed by V-fold cross-validation in the Statistica 10 Generalized Cluster Analysis tool.
The homogenous zones were created so that the PDOs could approach the actual terroir borders.
We propose changes to the DAC regional borders in the analyzed zones, such as division of Neusiedlersee DAC into two equal zones, where zone 4 equals Neusiedlersee DAC and the second part is joined to Leithaberg DAC because it is also part of zone 1. We also propose creation of a DAC in the northern part of Burgenland. This can be considered by the regional and national committee which reviews inclusion of DAC wines, and a certificate for DAC trademark use can be issued if plans include areas currently not part of any DAC (Figure 3). In addition, while Neusiedler Lake comprises the remainder of the Leithaberg DAC which should become a separate DAC region, a different situation exists for Mittelburgenland and Eisenberg DACs, where zones 2 and 3 do not cross the borders of the DAC regions currently defined in Austrian law. Therefore, we recommend that these regions' borders retain their present form.

Discussion
This research presents a specific step towards delimiting homogenous vineyard production zones whose borders correspond with those of natural terroir. This should be implemented in each DAC region that labels products based on GIs. The cluster analysis method of identifying vineyard zones has been used in research (Herrera Nuñez et al., 2011; Hugues et al., 2012; Priori et al., 2014), and these authors delimit homogenous zones of natural terroir units by data reduction in principal component analysis (PCA). They then selected the variables for cluster analysis from their input dataset, based on resultant factor scores.
We used factor analysis for data reduction, because factor analysis takes the hidden structure of multidimensional data into account. This does not occur in PCA. Factor analysis is a better tool than PCA when there is an assumption of knowledge of the factors which describe the input data and it is essential to capture their structure (Beavers et al., 2013). When extracting our factors, we used the Kaiser criterion (Kaiser and Dickman, 1959) and also the assumption that the physical and chemical factors of climate, topography and soil property all influence natural terroir. Moreover, PCA does not search for inter-data relationships in datasets; it only explains the maximum amount of variance with the fewest number of principal components and leads to less accurate, if sometimes similar, results. We therefore consider factor analysis more efficient than PCA.
Our k-means cluster analysis, based on the extracted and rotated factors, employs two methods of estimating parameter k. This is a strong aspect of our research when compared with Priori et al. (2014).
The number of our clusters was set by both the Generalized Cluster Analysis using V-fold cross-validation and five different validity indices. The resultant estimation of parameter k for k-means clustering was stable, with the number of clusters close to the actual number of DAC regions in Burgenland. This justifies the assumption outlined in our proposed changes to these regions' borders.
It is important to note that our classified natural terroir zones cross not only intra-regional borders but also administrative borders; therefore DAC zone changes should also be made in neighboring Burgenland regions. We did not incorporate this [...]
[...] geologic and soil input data classified according to their production potential. These authors thus achieved a detailed description of the natural terroir units which described qualitative characteristics, but this can lead to problems in the field because qualitative research alone is less effective than when combined with appropriate research based on quantitatively interpreted data. The final synthesis excludes solar radiation, which has a strong influence on wine production in all global vineyard regions (Weiss et al., 2003). The benefits of cluster analysis include the availability of descriptive characteristics of specific vineyard areas in the created homogenous zones. This method enables zone comparison and determination of quantitative differences which are difficult to achieve in overlain maps. The unit borders set by this method are also harder to apply when DAC regional borders change as required in our hypothesis and also to achieve the goals of our research.
In contrast, Hugues et al. (2012) employed hierarchical cluster analysis, where vineyard zones are created on a larger scale with similar data capturing only soil conditions. This creates problems related to the size of the clustered data when applying hierarchical cluster methodology. Calculation problems also arise when there is a large number of records and parameters, because hierarchical clustering is computationally more demanding than k-means clustering. We therefore found that applying this method to large areas creates unwanted problems.
Widely popular research concentrates on evaluating the potential of terroir vineyard sites where vineyard zonation is based on combined physical geographical attributes (Boyer and Wolf, 2000; Jones and Duff, 2011). These authors do not evaluate zone homogeneity, although this actually results from vineyard site zonation and potential areas. Much of this research cannot extract information to use in GI protection; it can only evaluate the total suitability of a specific site. This type of research is more characteristic of agro-economics than market value, even though it separates the most valuable zones from those less valuable or unsuitable for quality wine production.

Conclusion
One of the major disputes concerning the GIs of products based on the terroir concept is the discrepancy between the regional borders where products are made and natural terroir. The borders of the physical geographical environment forming the vineyard region's terroir are often ignored when delimiting the borders of the regions from the "table", or mainly on the basis of cultural and socio-economic characteristics. The methodological approach based on combining factor and cluster analyses proved suitable in identifying homogenous vineyard production zones. The borders of these zones can be used to delimit PDO regions, and our research results suggest creation of five production zones instead of four and inclusion of areas omitted from zonation. The process of identification of PDO regions which considers terroir factors can be applied throughout Austria. It can also be used, with some modifications caused by input data, in other EU member states which have accepted the European model of protection of agricultural products, and especially in those where the PDO regional system is not yet developed. Austria is a perfect example because, similar to other EU member states, it has only formally adopted the European rules of protected labeling for food products. The process of identifying homogenous vineyard zones is certainly applicable to other EU member states which produce wine and have available datasets of soil characteristics. Based on the results and proposed changes, we emphasize that the strongest asset of our research is the applicability of this approach to other regions.

Figure 1. Location of Burgenland DAC regions in Austria.
Topographic data: ELEV - height above sea level; TWI - topographic wetness index; TPI - topographic position index.
Climate data (analysis conducted on surfaces derived from the 1950-2000 WorldCLIM database, 1 km resolution; data source: Hijmans et al., 2005): EVAP - potential evapotranspiration; GST - average temperature of atmosphere during the vegetation season; HI - Huglin index; GDD - growing degree days; DTR - diurnal temperature range (difference of temperatures between day and night); CI - cool night index; PERCIP - total rainfall during the vegetation season; GLOBIR - potential global irradiance; WIND - average speed of wind (data obtained from the 2009-2011 WIND ATLAS of Austria; data source: Krenn et al., 2012).
Soil and substrate data obtained from BFW (2016): SOM - percentage of organic matter; PH - soil reaction; CLAY - percentage content of clay; SILT - percentage content of silt; CA - percentage content of calcium; SAND - percentage content of sand; DEPTH* - categorized data of depth of soils (1 most shallow - 5 most deep); WATER** - categorized data of soil water content (1 most dry - 14 most wet).

Figure 2. Internal validity index values used to estimate the optimal value of the parameter k for k=2 to k=25. The red circle highlights the optimal number of clusters (parameter k).

Figure 3. Vineyard homogenous zones based on the natural terroir concept in the federal state of Burgenland. Short description of created zones: Zone 1 - deep silty soils with neutral to slightly alkaline pH; Zone 2 - the highest situated with shallow soils on slopes; Zone 3 - shallow, relatively acidic and dry soils; Zone 4 - the lowest elevation with high sand content; Zone 5 - the highest clay content. Mean values of indicators used in cluster analysis are given in Table 5.

Table 4. Factor loadings of retained variables after Varimax rotation
Relationships of each variable to the retained factors expressed by the factor loadings. Bold values indicate significant association of the variable with the extracted factors. Abbreviations are explained in Table 1.