Comparison between standardized sensory methods used to evaluate the mousy off-flavor in red wine

The variation in descriptors for mousy off-flavor may be related to the different compounds involved in this spoilage, their concentrations, the matrix effect, individual detection abilities, the composition of a subject’s saliva, and the pH of the tongue’s surface. These different sources of variability partly explain the lack of consensus concerning the perception of this defect in wine. Several different sensory methods have been developed by wine professionals and scientists, based on the pH-dependency affecting the perception of some key mousy compounds. The objective of this study was to compare different sensory methods for detecting mousy character in red wine under standardized conditions, using alkaline paper strips and pH adjustment. Among the methods tested, adjusting pH to around 5 increased the consensus among tasters, and the detection and discrimination capacities of panelists.


INTRODUCTION
The mousy off-flavor in wine is, in part, caused by microbial spoilage. This wine defect has reappeared in recent years and is sometimes associated with the significant decrease in the use of sulfur dioxide, the increase in pH, and the use of native microbiota (Massini and Vuchot, 2015). To date, three N-heterocycle bases, namely 2-acetyltetrahydropyridine (ATHP) (Strauss and Heresztyn, 1984), 2-ethyltetrahydropyridine (ETHP) (Craig and Heresztyn, 1984), and 2-acetyl-1-pyrroline (APY) (Herderich et al., 1995), have been identified as being responsible for the mousy off-flavor in wine. The simultaneous presence of at least two of these compounds is necessary for the off-flavor to be perceptible (Costello et al., 2001). Considering its concentrations in wine and detection threshold values (Table 1), the contribution of ETHP to the mousy character may be less significant than that of ATHP and APY.
Production of mousy N-heterocycles in wines is attributed to yeasts in the Brettanomyces genus and some wine lactic acid bacteria (LAB) in the Lactobacillus genus or Oenococcus oeni species. Several strains of B. anomala and B. bruxellensis known to be associated with the spoilage of wine or other fermented beverages have been shown to produce a mousy taint when fermenting grape juice or when re-inoculated into sound wines (Heresztyn, 1986; Grbin and Henschke, 2000; Romano et al., 2008). Their ability to produce ATHP, ETHP, and, to a smaller extent, APY, has been confirmed using media supplemented with various chemicals, including ethanol, lysine, and ornithine (Heresztyn, 1986; Grbin et al., 1996; Grbin et al., 2007; Romano et al., 2008). The first LAB species linked to mousy off-flavor were strains of L. hilgardii and L. brevis (Tucknott, 1977; Heresztyn, 1986). These bacteria have been shown to produce large amounts of ATHP and smaller quantities of APY and ETHP when incubated in a synthetic medium (Costello and Henschke, 2002). Several O. oeni strains have also been found to produce a strong mousy off-flavor in a grape juice medium supplemented with ethanol. These strains produced all three N-heterocycles, including ETHP, at concentrations higher than those of other LAB strains (Costello et al., 2001).
Descriptors of the mousy off-flavor are reminiscent of rodent urine (dirty mouse cage) and grilled foods, such as popcorn, rice, crackers, and bread crust (Tucknott, 1974; Buttery et al., 1983; Strauss and Heresztyn, 1984; Herderich et al., 1995; Bartowsky, 2009). Some tasters also mention dried sausage skin, vomit, or dirty mops. The variation in descriptors for mousy off-flavor may be related to the different compounds involved in this spoilage, their concentrations, the matrix effect, and individual detection abilities (Snowdon et al., 2006).
From a sensory point of view, a lack of consensus has been observed in the detection, identification, and characterization of mousy off-flavor in wine. Interindividual differences in sensitivity to the various key compounds directly affect the quantitative and qualitative evaluation of this mousy off-flavor. For APY, individual detection threshold measurement (olfactory sensitivity evaluation) revealed that the dilution factor between the lowest and highest concentrations exceeded one thousand (personal data obtained with 23 subjects). Moreover, one taster may be sensitive to APY and less sensitive or specifically anosmic to the other chemical markers.
Other sources of differences between subjects have been observed. This defect is perceived during retronasal evaluation, when wine comes into contact with saliva (Bartowsky and Henschke, 1995), and may persist in the mouth for more than 10 minutes after swallowing or spitting (Grbin et al., 1996). Indeed, APY, ETHP, and ATHP are not sufficiently volatile to be perceived on the nose at wine pH (Bartowsky and Henschke, 1995). In this case, the more polar amino form of the ATHP tautomeric pair is favored (Grbin et al., 1996). Oral pH is higher (near 7) than that of wine (from 2.8 to 3.8) (Obreque-Slier et al., 2016), thus explaining enhanced perception of this defect during retronasal evaluation. Neutral pH favors the imino form, which is less polar than the protonated one. Grbin et al. (1996) reported a correlation between the composition of an individual's saliva, the pH of the tongue's surface, and the ability to detect the mousy character. However, wide interindividual variations in oral pH have been observed, between 5.76 and 7.96 (Larsen et al., 1999). An average intraindividual variation of 0.91 pH units was also observed, depending on the food consumed, the time of day, and the physiological state of the subject. Moreover, a recent study demonstrated that the buffering capacity of wine prevailed over that of saliva (Obreque-Slier et al., 2016). A small volume of wine (less than 0.3 mL) mixed with saliva in the mouth for 15 seconds was sufficient to decrease the pH of saliva by 1 unit.
These different sources of variability partly explain the lack of consensus concerning the perception of this defect in wine. Several different sensory methods have been developed by wine professionals and scientists, based on the pH-dependency affecting the perception of some key mousy compounds.
The first method is the "Palm & Sniff" technique, where one drop of wine is placed on the back of the hand and then the skin is sniffed (Grbin et al., 1996). This method was already mentioned by Peynaud and Domercq (1956). Skin has a higher pH than wine, so it is speculated that this method increases the volatility of the mousy off-flavor marker compounds. Several authors also propose using alkaline paper strips dipped into culture media or wine to assess the presence of mousy character (Heresztyn, 1986; Costello et al., 1993; Grbin and Henschke, 2000). The sodium hydroxide on the strip promotes the formation of the volatile tautomer of the mousy compound. Oxidation also seems to play an important role in the stability of this compound and, consequently, the perception of this defect (Weerawatanakorn et al., 2015).
Surprisingly, from an enological point of view, some wines express mousy taint after oxidation (Grbin et al., 1996). Considering the instability of APY and ATHP in the presence of oxygen (Weerawatanakorn et al., 2015) and contrasting results showing that wine oxidation has a positive influence on mousy taint, the mechanisms behind the expression of mousy off-flavor are clearly complex.
These techniques provide an orthonasal evaluation of wines, with the aim of improving detection of this defect, but it is unclear whether these techniques also improve tasters' assessments and consensus among them.
The objective of this study was to compare different sensory methods for detecting mousy character under standardized conditions, using alkaline paper strips and pH adjustment. The "Palm & Sniff" technique was not tested, as it did not minimize interindividual variations, potentially due to wide variations in skin pH (Lambers et al., 2006).
Three different stages were considered in this research. In the first experiment, consensus between the subjects was evaluated using intensity-scoring tasks. In the second experiment, ranking tests were used to assess the discrimination ability of the panel. Finally, detection threshold measurements in wine were used to assess changes in individual detection ability after pH adjustment.

General conditions
Sensory analyses were performed according to the relevant ISO standards. Samples were evaluated in individual booths in a ventilated tasting room at controlled room temperature (ISO 8589, 2010). All samples were evaluated only orthonasally (all data were collected via olfaction alone).

Judges
Judges were selected on the basis of availability and interest. Participants in the sensory panels were volunteers. All panelists were research laboratory staff from ISVV, Bordeaux University, with equivalent sensory expertise, who performed discrimination and descriptive tests regularly (at least twice a week). They were particularly familiar with the detection of off-flavors in wines. Moreover, they shared a common representation of the mousy character.
Some of the assessors were unable to participate in the whole testing program due to timetable conflicts. The whole test was divided into three sessions, consisting of different sensory tasks: the mousy intensity-scoring task, the ranking test, and detection threshold measurements. The intensity-scoring and detection ability experiments were carried out by 18 assessors and the ranking task by 24 participants. On average, 76% of the participants in each session were women. The age of the participants ranged from 23 to 53 years (mean ± SD, 34 ± 9).
Two pH adjustments similar to buccal values were tested. The wine was supplemented with 5 or 33 g/L sodium bicarbonate to adjust the pH to 5 < pH < 5.5 (modality 2) or 7 < pH < 7.5 (modality 3), respectively. Then, a 25 mL sample of each wine was presented for orthonasal evaluation in a coded black INAO glass covered with a Petri dish. Detection thresholds were measured in brown bottles (30 mL, opening diameter = 2 cm, with phenolic resin caps and PTFE joints; VWR, Fontenay-Sous-Bois, France), as described in the protocol section.
Sample preparation time was standardized to avoid sampling bias. All samples were prepared 2 hours before the sensory session.

Intensity-scoring task
Seven wines were evaluated by the panel with the aim of measuring perceived mousy character intensity in a scoring task, using the four sample modalities: control without pH adjustment and modalities 1, 2 and 3 (Table 2).
The wine samples included potentially spoiled wines (MO, CH, GA), and a standard wine (PO) supplemented with APY [99583-29-6] (Ark Pharm Inc., 10% in triacetin). Moreover, clarified supernatants (CS) of Costello culture media inoculated with supposed positive strains for mousy character production (one strain of O. oeni CRBO0501, coded CS1, and one of B. bruxellensis CRBOL0509, coded CS2) were also used to contaminate wine. These stimuli were used to counteract the commercial unavailability of some key mousy compounds (ATHP and ETHP) and the possibility that other compounds may contribute to the mousy off-flavor.

Ranking test
A standard red wine (Gamay - Pays d'Oc 2015) was supplemented with APY [99583-29-6] (Ark Pharm Inc., 10% in triacetin). The concentrations added to the standard wine were in accordance with those found in wines (Grbin et al., 1996). According to the results of the intensity-scoring tasks, three sample preparation modalities were tested: control without pH adjustment and modalities 2 and 3. The pH values of the control, modality 2, and modality 3 samples were 3, 5.01, and 7.02, respectively.

Detection threshold measurements
The detection threshold was measured in the standard wine (pH = 3.2) and in the same wine after pH adjustment to 5.05. The modality order was randomized among the participants. The perithreshold concentration ranges of APY (from 0.4 to 400 µg/L) to be used were determined on the basis of preliminary experiments and were consistent with those measured in wines (Grbin et al., 1996). The concentration ranges followed a geometric dilution series with a factor of 2. Four milliliters of stimulus were presented in randomly coded brown bottles.
Only APY was tested in the ranking task and in the threshold measurement, because ATHP and ETHP were not commercially available. Moreover, APY is one of the compounds contributing the most to the mousy character.
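The geometric dilution series described above can be sketched in a few lines of Python. This is only an illustration: the function name, the choice of 400 µg/L as the starting point, and the number of levels (11) are assumptions, since the text gives only the range (0.4 to 400 µg/L) and the dilution factor (2). Under these assumptions, the lowest level comes out at about 0.39 µg/L, which is consistent with the reported lower bound.

```python
def dilution_series(top, factor, n_levels):
    """Return n_levels concentrations in ascending order, ending at `top`.

    Each level is `factor` times the level below it (geometric series).
    """
    return [top / factor ** k for k in range(n_levels - 1, -1, -1)]


# Assumed values: 400 µg/L top concentration, factor 2, 11 levels.
series = dilution_series(top=400.0, factor=2.0, n_levels=11)
for concentration in series:
    print(f"{concentration:.2f} µg/L")
```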

Intensity-scoring task
For each modality, the perceived intensity of the mousy character in seven samples was evaluated by marking a cross on a 10 cm continuous scale (from "no mousy character" to "intense mousy character") (ISO 4121, 2004). It was specified that this defect is associated with different descriptors such as rodent urine, popcorn, rice, crackers, bread crust, dried sausage skin, vomit, or a dirty mop. A sample of the standard red wine was presented to the panelists at the beginning of each test per modality. This sample was representative of a wine without mousy character.
The sample presentation order was randomized among the participants. Subjects carried out the four modalities in the same session, but were advised to take breaks between each series of samples. The order of testing the four sample preparation modalities was also randomized.
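As an illustration of the randomized presentation order, the sketch below draws an independent random order of the seven samples for each of the 18 assessors. The sample codes are taken from the wine names used in this paper; the function name and the seed are hypothetical, and the actual randomization scheme used in the study (for example, a balanced design) is not described, so simple shuffling is an assumption.

```python
import random

# Sample codes as they appear in the text; the list itself is illustrative.
WINE_CODES = ["MO", "CH", "GA", "POlowAPY", "POhighAPY", "POCS1", "POCS2"]


def presentation_orders(samples, n_participants, seed=None):
    """Return one independently shuffled copy of `samples` per participant."""
    rng = random.Random(seed)
    return [rng.sample(samples, k=len(samples)) for _ in range(n_participants)]


orders = presentation_orders(WINE_CODES, n_participants=18, seed=1)
for assessor, order in enumerate(orders, start=1):
    print(assessor, order)
```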
Principal component analysis was carried out on the sample × assessor matrices for each modality. In each case, consensus was tested by computing the Kendall concordance coefficient (W).
Because the statistical assumptions of ANOVA were violated (Levene's and Shapiro-Wilk tests were used to check the equality of variances and the normality of residuals, respectively), non-parametric Friedman tests with Nemenyi pairwise comparisons were used to evaluate wine discrimination. All statistics were calculated using XLSTAT software (2018.3, Addinsoft).
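For readers unfamiliar with the Kendall concordance coefficient, the following sketch shows how W can be computed from an assessor × sample score matrix. It uses the textbook definition with average ranks and the usual tie correction; the study itself used XLSTAT, so this is not the authors' implementation, and the function names and example scores are hypothetical. W ranges from 0 (no agreement) to 1 (all assessors rank the samples identically).

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kendall_w(scores):
    """Kendall's W for scores[a][s] = score given by assessor a to sample s."""
    m = len(scores)       # number of assessors
    n = len(scores[0])    # number of samples
    rank_rows = [average_ranks(row) for row in scores]
    rank_sums = [sum(row[s] for row in rank_rows) for s in range(n)]
    mean_sum = m * (n + 1) / 2
    s_stat = sum((r - mean_sum) ** 2 for r in rank_sums)
    # Tie correction: sum of (t^3 - t) over each group of t tied ranks.
    ties = 0
    for row in rank_rows:
        counts = {}
        for r in row:
            counts[r] = counts.get(r, 0) + 1
        ties += sum(t ** 3 - t for t in counts.values())
    return 12 * s_stat / (m ** 2 * (n ** 3 - n) - m * ties)


# Hypothetical scores (cm on the 10 cm scale) for 3 assessors and 3 wines.
example = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.1, 0.5, 0.9]]
print(kendall_w(example))
```

The function is undefined when every assessor gives the same score to all samples, because the denominator is then zero.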

Ranking test
For each modality, the panel performed a ranking test (ISO 8587, 2007), sorting four samples according to their mousy character, from least to most intense. Ties were not allowed. Sample order was randomized among panelists.
The results of these ranking tests with a known order of intensity were statistically interpreted using the Page test (NF ISO 8587, 2006). The following statistical treatment was applied for each modality. For each judge, a value between 1 and 4 was attributed to each sample, depending on the assessor's response (1 for the least intense, 4 for the most intense). The sums of the rankings were calculated for each sample, then the parameters L and L' were calculated, where J = number of panelists and P = number of products.
L' was compared with tabulated values of the standard normal distribution to determine whether the test results were significant for the factor concerned (for a significant ranking, L' ≥ 1.645 at α = 0.05). The Cabilio-Peng procedure was used for pairwise multiple comparisons (XLSTAT, 2018.3, Addinsoft).
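The Page statistics can be sketched as follows. The formulas used are the textbook form of the Page test with its normal approximation, which is assumed here to correspond to the L and L' of ISO 8587: L is the sum over products of (expected position × rank sum), and L' = (12L − 3JP(P+1)²) / (P(P+1)√(J(P−1))). The function name and the example data are hypothetical.

```python
import math


def page_test(rankings):
    """Return (L, L') for a Page test.

    rankings[j][i] is the rank (1..P) given by panelist j to the product
    expected at position i, products being listed in ascending order of
    expected intensity.
    """
    J = len(rankings)       # number of panelists
    P = len(rankings[0])    # number of products
    rank_sums = [sum(row[i] for row in rankings) for i in range(P)]
    L = sum((i + 1) * rank_sums[i] for i in range(P))
    L_prime = (12 * L - 3 * J * P * (P + 1) ** 2) / (
        P * (P + 1) * math.sqrt(J * (P - 1))
    )
    return L, L_prime


# Hypothetical case: 24 panelists who all rank 4 products in the expected order.
L, L_prime = page_test([[1, 2, 3, 4]] * 24)
print(L, L_prime)
```

With rankings that carry no trend (for example, one panelist ranking in the expected order and another in the reverse order), L equals its expected value under the null hypothesis and L' is 0.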

Detection threshold measurements
Detection thresholds were estimated by the three-alternative forced-choice presentation method (3-AFC; ISO 13301, 2002), using ascending concentrations. At each level, participants were given a series of three bottles. One bottle contained the odorized wine and two blanks contained unaltered wine. Subjects were instructed to decide which of the three samples was different. As the purpose of the study was to compare the subjects' olfactory abilities at various pH values, the same series was presented to each subject.
The individual detection threshold was estimated as the geometric mean between the last concentration missed and the first concentration detected, when a participant made three consecutive correct choices (Wise et al., 2008; Tempere et al., 2011).
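The individual best-estimate rule can be illustrated with a short sketch. It looks for the first run of three consecutive correct choices in an ascending series and returns the geometric mean of the first concentration of that run and the concentration just below it (the last miss). The function name and the example responses are hypothetical, and returning `None` when the criterion is met at the lowest level or never met is an assumption, since the text does not say how such cases were handled.

```python
import math


def best_estimate_threshold(concentrations, correct, run_length=3):
    """Best-estimate threshold from an ascending forced-choice series.

    concentrations: ascending list of tested concentrations.
    correct: list of booleans, True when the odd sample was identified.
    """
    for start in range(len(correct) - run_length + 1):
        if all(correct[start:start + run_length]):
            if start == 0:
                # No miss below the run: threshold lies below the tested range.
                return None
            # correct[start - 1] is necessarily a miss (otherwise the run
            # would have started one level earlier).
            return math.sqrt(concentrations[start - 1] * concentrations[start])
    return None  # criterion of consecutive correct choices never met


# Hypothetical responses over six levels (µg/L).
levels = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
answers = [False, True, False, True, True, True]
print(best_estimate_threshold(levels, answers))
```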
The concentration/response function, fitted by a sigmoid curve ($y = 1/(1 + e^{-\lambda x})$), was designated a psychometric function and used to determine the group threshold. The probability was corrected with the chance factor for detection: 1/3. The software used for graphic resolution and non-linear regression by ANOVA transform was SigmaPlot 13 (2014, Systat Software, Inc.).
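The two ingredients of the group threshold estimate can be sketched as follows. The chance correction shown is the usual one for forced-choice tests, p_corrected = (p_observed − 1/3) / (1 − 1/3); whether the authors applied exactly this form is an assumption. Likewise, defining x as the difference between the log10 concentration and the log10 threshold is an assumption made so that the sigmoid equals 0.5 at the threshold; the function names and the value of λ are hypothetical.

```python
import math

CHANCE = 1 / 3  # probability of a correct guess in a 3-AFC test


def correct_for_chance(p_observed, chance=CHANCE):
    """Proportion of true detections after removing correct guesses."""
    return max(0.0, (p_observed - chance) / (1 - chance))


def psychometric(concentration, threshold, lam):
    """Sigmoid y = 1 / (1 + exp(-lam * x)), with x in log10 units
    relative to the threshold (assumed parameterization)."""
    x = math.log10(concentration) - math.log10(threshold)
    return 1 / (1 + math.exp(-lam * x))


# 12 correct answers out of 18 panelists (hypothetical count).
print(correct_for_chance(12 / 18))
# At the threshold concentration, the modeled detection probability is 0.5.
print(psychometric(8.6, threshold=8.6, lam=4.0))
```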

Intensity-scoring task
This experiment revealed different patterns of interindividual consistency for the four preparation modalities. Figure 1 shows the loadings of the panelists on the first two principal components of the PCA performed on the mousy intensity scores for each modality. An interindividual consensus was only observed for modality 2 (Figure 1C). All panelists were on the positive side of the first axis (49% of total variance), indicating that they tended to score the wines in a similar way. The second principal component (23% of total variance) revealed interindividual differences in the mousy character assessment. This is consistent with Figure 2A, showing the projection of wines on the first two components: the whole panel scored two wines (GA and POlowAPY) as not contaminated. Analysis of the second principal component identified two groups of wines: POhighAPY, POCS1, and POCS2 on the positive side of the second principal component and wine CH on the negative side, revealing some interindividual diversity in the panelists' perception of this defect. This variation may be due to the chemical complexity of this defect and interindividual variations in sensitivity among panelists.
Correlation was poorer between panelists' scores for control, modality 1, and modality 3, but better for modality 2. This visual interpretation was validated by computing the Kendall concordance coefficient (W) for control, modality 1, modality 2, and modality 3, which were 0.17, 0.11, 0.42, and 0.26, respectively. The results obtained for the control modality were consistent with the effect of pH on the perception of mousy character. However, although the paper strip method (modality 1) is commonly used, it was apparently not suited to wine comparisons. It is probably better suited to evaluating microorganism culture media.

Ranking test
L and L' values for the different ranking tests are reported in Table 3. The L' value must exceed the critical value of 2.326 (standard normal distribution, applicable under the central limit theorem) to confirm significant discrimination (p < 0.01) among samples by the panelists.
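The critical values quoted for L' are one-sided quantiles of the standard normal distribution. As a quick check, they can be recomputed with the Python standard library; the function name below is hypothetical.

```python
from statistics import NormalDist


def one_sided_critical_value(alpha):
    """Upper quantile of the standard normal distribution for level alpha."""
    return NormalDist().inv_cdf(1 - alpha)


print(round(one_sided_critical_value(0.05), 3))  # threshold used at alpha = 0.05
print(round(one_sided_critical_value(0.01), 3))  # threshold used at p < 0.01
```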
Table 3 shows that the judges were generally able to distinguish among samples with different APY levels.However, pairwise multiple comparisons revealed that the four samples were clearly discriminated only for modality 2.
Control modality and modality 3 were less efficient for discriminating less contaminated samples.

Detection threshold measurements
This experiment was designed to explore the impact of pH adjustment on detection abilities. Individual best estimate thresholds for control modality and modality 2 are shown in Figure 3. [...] of APY at concentrations found in contaminated wines increased from 22 to 100%.
Stimulus-response functions and sigmoid modeling were used to determine absolute thresholds (Figure 4). The absolute detection thresholds (50% of the panelists) for APY were 54.9 µg/L (R² = 0.91) in wine at pH 3.20 and 8.6 µg/L (R² = 0.91) in wine after pH adjustment to 5.05.
The absolute threshold (group threshold) for APY was reduced by a factor of 6.4.
Adjusting pH to an average of 5 facilitated detection of the mousy off-flavor without requiring a retronasal evaluation, a potential source of additional interindividual variations.

CONCLUSION
These sensory data complement our knowledge about the standardized sensory methods used to evaluate the mousy off-flavor in wine. This work compares orthonasal evaluation methods, to reduce the interindividual variations due to retronasal evaluation. Among the three methods tested, adjusting pH to around 5 increased the segmentation capacities of panelists. This change in wine pH also ensured a good consensus among the panelists and clear discrimination among the samples, according to the contamination level. The paper strip method resulted in high interindividual diversity, and adjusting pH to around 7 rather than 5 was not as effective. It is important to note that these adjustments do not fully model normal wine tasting conditions. Indeed, although oral pH is similar to the test values (average 7), pronounced decreases in pH are only observed around 15 seconds after tasting (Obreque-Slier et al., 2016). Perhaps adjustment to pH 7 strongly distorts the wine matrix.
Further work is required to compare these sample preparation methods for rosé and white wines, and also to corroborate these results or validate this method with the other key mousy compounds. Moreover, sensory results may be compared with analytical data obtained on naturally spoiled wines. Finally, the validation of a standardized method that avoids the pH effect may make it possible to compare data obtained from experts with consumer acceptance.

FIGURE 2. Evaluation of the mousy off-flavor in modality 2 (n = 18). A. Projection of wines on the first two components of the principal component analysis. B. Comparison of sums of ranks. Values marked with different letters are significantly different (Friedman test and Nemenyi pairwise comparison test; p < 0.05).

TABLE 1. Detection threshold values and average concentrations of the key mousy compounds in spoiled wines.

TABLE 2. Description of red wines used in the scoring test and their pH characteristics according to the test modalities. Modality 1 corresponded to the paper strip method, for which no precise pH measurement can be performed.

TABLE 3. Results of the different ranking tests.