Abstract
Introduction
A psychometric function describes the relationship between an observer's performance on a psychophysical task and some physical aspects of the stimuli. In particular, the psychometric function for speech intelligibility in noise describes a listener's ability to identify speech as a function of its intensity. Often, the psychometric function is summarized by two key parameters: the
The slope is crucial, as it—not the threshold—determines the increase in perceptual benefit a listener is likely to gain from small changes in the signal-to-noise ratio (SNR), such as may be offered by a directional microphone on a hearing aid. A steep psychometric function indicates that a small increase in SNR would lead to a large increase in intelligibility; conversely, if the slope is relatively shallow, the same SNR improvement would lead to a smaller perceptual improvement. We demonstrate here how much the slope of the psychometric function varies across experiments.
There is a wealth of psychometric function data available in the literature on speech identification, as many studies have looked at the factors that can affect the intelligibility of speech. Most of the published analyses of these data, however, have focused on changes in threshold, with slope changes far less commonly calculated and reported. No systematic corpus of these data is available, despite its obvious importance for isolating and identifying the factors associated with changes in slopes. We therefore carried out a systematic survey of the literature on psychometric functions for speech intelligibility, reanalyzing the data using a standard method to enable a direct comparison of slope data across different studies.
Our aims were to (a) quantify how much the slope of the psychometric function varies across experimental designs and listening conditions, (b) identify listening conditions that affect the slope of the psychometric function, and (c) discuss how these trends in slope conform with previously proposed explanations for variations in the slope of the psychometric function for speech intelligibility.
Methods
A computerized literature search was undertaken to find studies that had measured the intelligibility of speech as a function of SNR. The first reports of common speech tests and the studies citing these speech tests were reviewed, as many of these studies include psychometric functions in different noise conditions. A search was also carried out for articles citing either Egan, Carterette, and Thwing (1954) or Brungart (2001a); these two studies were singled out because they reported unusually shaped psychometric functions of masked speech. The reference list of Brungart's article was also reviewed for possible studies to include in the survey. Other miscellaneous studies containing psychometric functions that were found over the course of approximately three years, up to a cutoff date of February 2012, were also included.
The inclusion criteria were that studies needed to report at least one psychometric function for speech identification that was (a) measured as a function of SNR or some other unit of relative presentation level from which SNR could be calculated, (b) measured over at least three points, (c) presented clearly in graphical or tabular form, and (d) averaged over several listeners. Individual data were excluded because they tended to be harder to measure accurately (e.g., when multiple psychometric functions overlapped in a single plot). Although interlistener variability in slope would undoubtedly provide additional insight into the factors affecting slope, such an analysis was outside the scope of the current study, which aims to identify broad trends in slope across different listening conditions. Micheyl, Xiao, and Oxenham (2012) provide an example of a detailed reanalysis of psychometric data that does explicitly take into account individual variability.
A total of 146 relevant studies were found, giving 1,133 individual psychometric functions for further analysis. The individual data points for each psychometric function were recorded. These values were either taken directly from the article if the psychometric functions were reported in tabular form or extracted using a custom-written MATLAB program if the psychometric functions were displayed graphically. These data points were then fitted with a logistic function:
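A common two-parameter form of such a logistic function is sketched below. It is an assumed parameterization given for illustration, not necessarily the one fitted in the survey; the symbols $p$ (percent correct), $x$ (SNR in dB), $x_{50}$ (the SNR giving 50% correct), and $s$ (slope at the midpoint, in % per dB) are introduced here.

```latex
p(x) = \frac{100}{1 + \exp\!\left[-\dfrac{4s}{100}\,(x - x_{50})\right]},
\qquad
\left.\frac{dp}{dx}\right|_{x = x_{50}} = s
```

The factor of 4 arises because a logistic function with rate $k$ has a maximum gradient of $k/4$ times its range, so writing $k = 4s/100$ makes $s$ the steepest slope of the curve in % per dB.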
For consistency across all studies, none of the logistic fits was corrected for either chance or maximum performance. The information required for these corrections was not always available, and it was considered preferable to follow a standard procedure for all cases rather than correcting only a subset of the data. It is possible that this lack of correction for chance and ceiling effects could have affected slope estimates (Dai & Micheyl, 2011). Cases for which the standardized psychometric function was an extremely poor fit were excluded, however, to limit the effects of such errors in slope estimates (see Overview section).
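For reference, a correction of this kind is usually written by rescaling the fitted function between a chance level and a ceiling. The form below is a generic sketch, not a procedure taken from the survey; $\gamma$ (chance performance), $\lambda$ (maximum performance), and $F$ (any function rising from 0 to 1, such as a logistic) are symbols assumed here, with performance expressed as a proportion.

```latex
p(x) = \gamma + (\lambda - \gamma)\,F(x),
\qquad 0 \le \gamma < \lambda \le 1
```

Setting $\gamma = 0$ and $\lambda = 1$ corresponds to the uncorrected case.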
The values of
Each psychometric function in the survey was subjected to detailed coding of the experimental design for (1) target speech corpus (see later), (2) masker type (subcategories of
The target and masker speech were coded by the type of speech corpora used (e.g., BKB, IEEE, CRM, and SPIN).
If this information was not available, or if the speech corpus was uncommon, the speech corpus was coded under the categories of
Results
Overview
To measure how well the logistic equation fitted the data, the root mean square (RMS) error between the fitted curve and the data points was calculated. On the whole, the fits were good, as the RMS error was small (mean RMS = 3.2%). However, 29 psychometric functions had RMS values of 10% or greater and so were excluded from the survey at this stage (they are further discussed in the Nonmonotonic Psychometric Functions section). Figure 1 shows example data from the survey and illustrates some good, as well as some poor, fits of the logistic functions to the data.
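The fit-and-screen step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the survey's own fitting code: the logistic parameterization (midpoint `x50`, slope in % per dB at the midpoint), the coarse-to-fine grid search, and the data points are all hypothetical choices made here. Only the 10% RMS exclusion criterion comes from the text.

```python
import math

def logistic(x, x50, slope):
    """Percent correct at SNR x (dB); slope is in % per dB at the midpoint x50."""
    return 100.0 / (1.0 + math.exp(-4.0 * slope / 100.0 * (x - x50)))

def rms_error(points, x50, slope):
    """RMS deviation (in percentage points) of the curve from the data points."""
    squared = [(p - logistic(x, x50, slope)) ** 2 for x, p in points]
    return math.sqrt(sum(squared) / len(squared))

def fit_logistic(points):
    """Coarse-to-fine grid search for the (x50, slope) pair minimizing RMS error."""
    xs = [x for x, _ in points]
    best = (rms_error(points, xs[0], 1.0), xs[0], 1.0)
    x_lo, x_hi = min(xs) - 10.0, max(xs) + 10.0
    s_lo, s_hi = 0.1, 50.0
    for _ in range(6):  # each pass narrows the search window around the best point
        for i in range(41):
            x50 = x_lo + (x_hi - x_lo) * i / 40
            for j in range(41):
                slope = s_lo + (s_hi - s_lo) * j / 40
                err = rms_error(points, x50, slope)
                if err < best[0]:
                    best = (err, x50, slope)
        dx, ds = (x_hi - x_lo) / 8, (s_hi - s_lo) / 8
        x_lo, x_hi = best[1] - dx, best[1] + dx
        s_lo, s_hi = max(0.01, best[2] - ds), best[2] + ds
    return best  # (rms, x50, slope)

# Hypothetical data points (SNR in dB, percent correct), not taken from any study.
points = [(-15, 4.0), (-10, 17.0), (-5, 50.0), (0, 83.0), (5, 96.0)]
rms, x50, slope = fit_logistic(points)
excluded = rms >= 10.0  # exclusion criterion stated in the text
```

A grid search is used here only because it needs no external libraries; any least-squares optimizer would serve the same purpose.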
Psychometric functions from the survey illustrating good, average, below-average, and poor fits of the standard logistic function (solid line) to the data (open circles). The RMS value gives an indication of the fit, with cases where the RMS value was above 10% being excluded from the survey. Cases that gave good fits include those for SSI sentences in a one-talker masker (Dirks & Wilson, 1969a), SPIN sentences in a six-talker babble (Elliott, 1979), and digits in a speech spectrum static noise (HearCom, 2009). Cases that had average fits (i.e., RMS values close to the mean for the survey) include those for SPIN sentences in a six-talker babble (Dirks, Bell, & Rossman, 1986), CRM sentences in an amplitude-modulated noise (Arbogast, Mason, & Kidd, 2002), and IEEE sentences in a Gaussian noise (Bernstein & Grant, 2009). Cases that had below-average fits include those for CRM sentences in a two-talker masker (Wightman & Kistler, 2005), digits in a six-talker babble (Wilson et al., 2006), and invalid short tokens in a one-talker masker (Danhauer, Doyle, & Lucks, 1986). Poor fits include those for valid sentences presented in a one-talker masker (Dirks & Bower, 1969) and CRM sentences in a one-talker masker (Brungart, 2001a).
Key Details of All the Studies Included in the Systematic Survey.
It was found that a log-normal distribution (Buzsáki & Mizuseki, 2014; Johnson & Kotz, 1970) gave an excellent fit to the overall frequency distribution of slope values:
The overall distribution of slope values measured in the systematic slope survey, across all 885 cases (see Equation 2). The solid line is a log-normal distribution fitted to the data. The median for the distribution is indicated by an arrow.
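The log-normal description can be sketched as follows. This is a hedged illustration only: the fitting method (mean and standard deviation of the log-transformed values) is an assumption made here, the slope values are hypothetical, and the survey's own fitted parameters are not reproduced.

```python
import math
import statistics

def fit_lognormal(slopes):
    """Estimate log-normal parameters from positive slope values (% per dB)."""
    logs = [math.log(s) for s in slopes]
    mu = statistics.mean(logs)       # mean of the log-transformed slopes
    sigma = statistics.pstdev(logs)  # standard deviation of the log-transformed slopes
    return mu, sigma

def lognormal_pdf(x, mu, sigma):
    """Density of a log-normal distribution at x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

# Hypothetical slope values (% per dB), for illustration only.
slopes = [2.1, 3.4, 4.0, 5.2, 6.6, 7.1, 8.3, 10.5, 14.0, 22.0]
mu, sigma = fit_lognormal(slopes)
fitted_median = math.exp(mu)                   # median of a log-normal
fitted_mean = math.exp(mu + 0.5 * sigma ** 2)  # mean exceeds the median when sigma > 0
```

A right-skewed distribution of this kind has its mean above its median, which matches the pattern in the survey's summary values (mean of 7.5% per dB against a median of 6.6% per dB).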

Major Trends
With 885 cases, it is not too surprising to find substantial variations across details of stimuli, maskers, and other aspects of experimental design. The analysis here therefore concentrates on broad categories rather than on specific individual combinations. The full data set is available in the supplementary material.
Type of masking noise
Median Slope Values for Each of the Primary Target/Masker Combinations Identified in the Survey.
Number of Studies Reporting Data for Each of the Target/Masker Combinations in Table 2.
Figure 3 shows the overall distributions of slope values found for three different masker types: speech, modulated noise, and static noise.
In an attempt to disentangle the effect of the type of masker used from the slope effect seen when the number of maskers was increased (see Number of Masking Noises section), only cases where a single masker was used were included in this figure. There is a substantial difference between the three distributions: the measures of central tendency (i.e., median and mean slope values) decreased in value from static noise maskers (median = 7.7% per dB) through modulated noise (median = 6.1% per dB) to speech maskers (median = 3.7% per dB). This last median was considerably shallower than the overall median slope (6.6% per dB), suggesting that the shallowest end of the distribution was more densely populated by cases that used speech maskers.
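To make these medians concrete, the sketch below converts each one into an approximate intelligibility gain for a given SNR improvement. It is a rough illustration, not an analysis from the survey: it assumes a linear approximation that holds only for small SNR changes near the midpoint of the psychometric function, and the 3 dB improvement is a hypothetical value chosen here.

```python
def benefit_near_midpoint(slope_pct_per_db, snr_gain_db):
    """Linear estimate of the intelligibility gain, in percentage points."""
    return slope_pct_per_db * snr_gain_db

# Median slopes for single maskers, as reported in the text (% per dB).
median_slopes = {"static noise": 7.7, "modulated noise": 6.1, "speech": 3.7}
snr_gain_db = 3.0  # hypothetical SNR improvement, chosen only for illustration

benefits = {
    masker: benefit_near_midpoint(slope, snr_gain_db)
    for masker, slope in median_slopes.items()
}
for masker, gain in benefits.items():
    print(f"{masker}: about {gain:.1f} percentage points")
```

Under these assumptions, the same SNR improvement yields roughly half the intelligibility gain in a single speech masker that it does in static noise.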
The distributions of slope values for three different categories of masker: speech, amplitude-modulated noise, and static noise. The dotted lines indicate the overall median slope value for the survey, while the arrows indicate the median slope value for each specific distribution. Only cases where one masker was used are included.
Number of masking noises
The second major trend is that the slope of the psychometric function tends to increase as the number of maskers increases, at least up to approximately three or four maskers. Table 2 shows that increasing the number of speech maskers from one to two increases the slope by, on average, 4% per dB, which begins to approach the values produced by either a modulated noise or static noise masker.
Figure 4 shows the distribution of slope values as a function of the number of maskers used. To avoid a confound of the effect of masker type on slope, only psychometric functions measured using speech maskers were included.
The distributions shifted to the right, toward larger slope values, as the number of maskers was increased from one to two and then to three or more. Only in the one-masker condition was the median slope value (median = 3.7% per dB) below the overall median slope value shown in Figure 2. The distribution in the bottom panel is for cases with 5–20 speech maskers. The distribution, mean, and median slope values for this condition were very similar to those found when three or four maskers were used. This suggests that once the number of maskers reached three or four, any additional maskers had a negligible effect on the slope.
The distributions of slopes found when one, two, three or four, or five or more maskers were used. The dotted line indicates the overall median slope value for the survey, while the arrow indicates the median slope value for each specific distribution. Only cases where speech maskers were used are included.
Minor Trends
Although the type and number of maskers used had a large effect on slope, these factors cannot solely account for all the slope variation seen in the survey. For example, there was a range of 16% per dB between the lowest and highest slope values for cases with one speech masker (see Figure 4, top panel). Several more minor trends in slope will now be briefly described.
Predictability of target speech
Figure 5 compares the slopes of psychometric functions for highly predictable speech targets with those for less predictable speech targets. The data came principally from experiments where the SPIN sentences (Kalikow, Stevens, & Elliott, 1977) were used as targets, as this is the main corpus in which the degree of target predictability is manipulated. The left column includes slope values for speech maskers, whereas the right column includes slope values for noise maskers.
For the speech maskers, a clear effect was found, with less predictable targets producing markedly shallower slopes (median = 7.1% per dB) than highly predictable targets (median = 13.8% per dB). This slope difference was smaller when the masker was noise: here, the low-predictability median slope was 5.4% per dB and the high-predictability median slope was 8.6% per dB. In addition to a difference in median slope values, there was also a difference in the width of the slope distributions between the high- and low-predictability targets: When either speech or static noise maskers were used, broader slope distributions were seen for the highly predictable targets than for the less predictable targets.
The different distributions of slope values found when there was either a high or low probability of target speech being predicted from previous context. The left panels plot these distributions for speech maskers, while the right panels plot these distributions for static noise maskers. The dotted lines indicate the overall median slope value for the survey, while the arrows indicate the median slope value for each specific distribution. Only cases where one masker was used are included.
Target corpus
Figure 6 shows the distributions of slope values for targets taken from various corpora. The slopes measured using four standard speech tests (CRM, HINT, IEEE, and SSI) are displayed separately for speech maskers (left column) and static noise maskers (right column).
The data show that when a speech masker was used, the choice of target corpus had little effect on slope (median slopes = 3.7%, 3.4%, 4.5%, and 4.6% per dB for CRM, HINT, IEEE, and SSI, respectively), but slope varied widely when the masker was a static noise (median slopes = 10.1%, 9.1%, 4.8%, and 17.1% per dB; IEEE gave the lowest while SSI gave the highest).
The distribution of slope values found for four different speech corpora (CRM, HINT, IEEE, and SSI), when they were presented in speech maskers (left panels) and when they were presented in static noise maskers (right panels). Again the dotted lines in each panel indicate the overall median, while the arrows indicate the median for each category of target, and only cases where one masker was used are included.
Similarity of target and masker voices
Figure 7 shows the distributions of slope values for varying degrees of target/masker voice similarity. The subcategories of similarity include (unprocessed) target and masker spoken by the same talker, by a different person of the same gender, or by a person of a different gender. Only cases where one speech masker was used are included. The slopes for the same-talker category were shallower than those for talkers of different genders (medians of 3.4% compared with 5.0% per dB). The distribution of slopes obtained when the target and masker were of the same gender but spoken by different people, however, overlaps with each of the other distributions. This wider distribution may reflect the greater variation in similarity for this subcategory; that is, some same-gender voices were likely to be more similar than others.
The distributions of slope values found for speech maskers with three different levels of talker similarity to the target speech: same talker, same gender talker, and different gender talker. The dotted lines indicate overall median slope, while the arrows indicate individual medians for each distribution. Only cases where one masker was used are included.
Other minor effects
Prior exposure to, or priming, some aspects of either the target or masker before a trial also affects the slope of the psychometric function. Slope values tended to be slightly steeper when either the target or masker sentence was primed compared with when no prime was presented (medians of 7.8% per dB,
The content of the masking speech also has a small effect on slope. When the content of the masker was very similar to that of the target, for example, when they were taken from the same speech corpus, slopes tended to be shallower (median = 4.6% per dB,
There was also an indication that listener age had an effect on the slope of the psychometric function. There was a trend of increasing slope with age when a speech masker was used (
The hearing ability of the listeners (normal hearing, hearing impaired, or cochlear implant user) was coded for in the survey. In cases using a speech masker, there was a trend of increasing slope with hearing impairment (medians = 6% per dB and 7.5% per dB,
The results for the effect of age and hearing impairment on slope are somewhat tentative, however, as the sample sizes of the groups were particularly unequal in both types of comparison. Further, the two effects are difficult to disentangle as in 98% of cases including young listeners, the listeners were also normal hearing, and in 70% of cases including older listeners, the listeners were also hearing impaired, thus partially confounding the effects of age and hearing impairment.
Nonmonotonic Psychometric Functions
As previously noted, any cases where the data had to be extrapolated to fit a logistic function, or cases where the logistic functions were a poor fit to the data, were excluded from the slope survey. The latter was mostly due to extremely shallow or unusual psychometric functions. These generally took two forms: functions where performance plateaued over a specific SNR range (usually −12 to 0 dB) before increasing at higher SNRs, and functions with
The majority of functions in this subset were from speech maskers where only one masker was used (19 of 23). While these nonmonotonic psychometric functions were measured using several different speech stimuli, the two largest contributors were from using CRM stimuli (10/23) and valid sentences (5/23). Most occurred when the same talker was used in the target and the masker (18/23), whether the target was unprocessed (9/23), processed (e.g., vocoded, 7/23), or mixed with other maskers (2/23). The listening conditions giving the shallowest slopes fit with the trends reported earlier for shallow slopes identified in the main slope survey.
Discussion
We systematically surveyed the published data on psychometric functions for speech intelligibility to identify the main factors that affect their slope. Large variations in slope were found, with slopes ranging from as shallow as 1% per dB to as steep as 44% per dB. The median value across 139 studies (885 cases) was 6.6% per dB. The type and number of maskers used were the major determinants of the slope of the psychometric function. Smaller effects of target predictability, target corpus, and target/masker similarity were also found. There was also an indication that age and hearing impairment might affect slope, although it was not possible for the current survey to completely disentangle these two effects.
Slope Changes as a Consequence of Fluctuating Maskers
Our analyses have clearly demonstrated that masker type affects the slope of the psychometric function, with speech maskers found to give shallower slopes than noise maskers, be they amplitude modulated or static noise. The number of speech maskers used also affected the slope of the psychometric function, with the slope of the function increasing as the number of maskers was increased from one to about three or four. Given that speech can be thought of as the sum of multiple amplitude-modulated frequency bands (Drullman, Festen, & Plomp, 1994) and that increasing the number of maskers will alter the quality of the amplitude variations (Cooke, 2006; Miller, 1947), both of these effects indicate the importance of masker amplitude modulations on slope.
The effects of amplitude modulation on slope can be understood by considering
A schematic illustration of the nonlinear increase in speech intelligibility that arises with amplitude-modulated maskers. Panels (a) to (c) represent a speech signal presented in a static noise. As SNR is decreased (i.e., the masker is increased), the proportion of the signal that is audible decreases, as does speech identification. Panels (e) to (g) illustrate the same speech signal presented in an amplitude-modulated noise. This time, as SNR decreases, glimpses of the target are still available, which can be used to aid in speech identification. Even at the lowest SNR in Panel (g), a large proportion of these glimpses still remain. Panel (d) shows an example psychometric function for speech (CRM sentences) in a static noise, and Panel (h) shows an example psychometric function for the same speech stimuli in an amplitude-modulated masker.
When a single competing talker is used as the masker, the temporal fluctuations are relatively slow, and there are likely to be many opportunities where the target speech will coincide with a dip in the amplitude of the masker; that is, there will be many opportunities for glimpsing the target speech (Miller & Licklider, 1950). As more maskers are added, the spectral and temporal dips begin to fill (Cooke, 2006; Miller, 1947). The chance that the target will temporally overlap with at least one of the maskers becomes greater, and overall amplitude modulations in the masking mixture effectively become shallower and briefer. The opportunities for glimpsing the target, therefore, become fewer. The reduced opportunity for glimpsing leads to an increase in slope. In the extreme case, if enough voices are added to the masking signal, then it would approach that of a speech-shaped static noise (e.g., Cooke noted that when six or more masking voices were present, intelligibility was not significantly different from that of a speech-shaped static noise masker). Our analyses demonstrate that only three or four masking voices are needed before the slopes of psychometric functions become equivalent to those given by a static noise.
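The way dips fill in as maskers are added can be illustrated with a deliberately simple toy model. It is not drawn from the survey or from the cited studies: it assumes that each masker is independently either in a dip or not at any moment, and the dip fraction of 0.4 is a hypothetical value chosen here.

```python
def all_in_dip_probability(dip_fraction, n_maskers):
    """Chance that n independent maskers are all in a dip at the same moment."""
    return dip_fraction ** n_maskers

dip_fraction = 0.4  # hypothetical: each masker is in a dip 40% of the time
glimpse_chance = {
    n: all_in_dip_probability(dip_fraction, n) for n in range(1, 7)
}
for n, chance in glimpse_chance.items():
    print(f"{n} masker(s): {chance:.4f}")
```

Under these assumptions, the proportion of time in which every masker is simultaneously in a dip falls geometrically with the number of maskers, so glimpsing opportunities become scarce after only a few voices are added.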
Curiously, we found that amplitude-modulated noises did not give substantially shallower slopes than the static noise maskers, as might be expected by this glimpsing argument (see Figure 3). This could possibly be explained by the wide range of maskers that fell into the category of modulated noise, that is,
There was an indication from the survey that older, hearing-impaired listeners tended to give steeper psychometric functions than young normal hearing listeners when speech was presented in a competing speech masker. This finding accords with the slope pattern that would be expected if this listener group were less able to make use of brief dips in the power of background noise to help identify target speech, as has previously been suggested (e.g., Festen & Plomp, 1990). This reduced glimpsing ability for older, hearing-impaired listeners has been attributed to a reduced temporal resolution (Lutman, 1991; Schneider, 1997) and, in the case of listeners with normal hearing thresholds but with deficits listening in noisy environments, to reduced fidelity when encoding suprathreshold sounds (Bharadwaj, Verhulst, Shaheen, Liberman, & Shinn-Cunningham, 2014). Reduced glimpsing would, in general terms, result in an amplitude-modulated masker acting more like a static noise masker, which would lead to a steeper psychometric function.
Slope Changes as a Consequence of Target/Masker Confusion
The slope survey identified 23 cases where the psychometric function was nonmonotonic. Most of these functions were produced when a speech masker was used, and nearly all of those functions were given when at least one of the speech maskers was spoken by the same voice as the target. These results suggest that a high degree of similarity between the target and the masker is required to give nonmonotonic psychometric functions. The survey also demonstrated that even when psychometric functions were monotonic, manipulating the acoustic similarity of the target to the speech masker affected the slope, as shallower slopes were found when the target and masker voices were spoken by the same person than when they were spoken by people of different genders. Linguistic similarity between the target and masker, that is, if they were both taken from the same speech corpus, also tended to result in shallower psychometric functions. Conversely, there was the suggestion that providing a cue that could aid in the differentiation of a target from a masker when both were speech, such as providing a prime of the target voice or content, could steepen the slope of the psychometric function. These effects combined indicate that the degree of confusion that exists between a target and a masker can be a factor in the resultant slope of the psychometric function.
The role of confusion on slope can be explained by increased reliance on a level difference between target and masker signals (Brungart, 2001a; Dirks & Bower, 1969; Egan et al., 1954). Such reliance is thought to occur when difficulties arise disentangling elements of a target signal from a similar sounding masker signal. In such cases, if the target is either less intense or more intense than a masker, then the level difference can be used as a cue to distinguish which sound is which. The greater the reliance that is placed on this cue, the more dissociated intelligibility is likely to become from overall SNR. Intelligibility can in principle be better at negative SNRs, where a clear level difference exists between the two signals, than at SNRs near zero, where the level difference is smaller. Extreme confusion between a target and a masker (i.e., where both signals are spoken by the same person) can, therefore, have the effect of flattening the slope of the psychometric function or even giving a dip in the function near 0 dB (i.e., where there is no level difference cue available).
Slope Changes as a Consequence of the Availability of Top-Down Information
The survey demonstrated that target stimuli that contained keywords that were predictable from their content gave steeper slopes than those whose keywords were unpredictable. It was also demonstrated that targets taken from some speech corpora gave shallower slopes than others. The speech corpora whose targets tended to give shallow slopes were commonly
Pichora-Fuller et al. (1995) suggested that congruent previous context constrains possible word options, shifting the influence of word identification from perceptual (bottom-up) to cognitive (top-down) information. The mechanism is essentially positive feedback; with a greater dependence on top-down information, word identification can increase more rapidly with changes in level as small increases in acoustic information may be sufficient to further constrain possible speech elements. The probability of other speech elements then being guessed correctly increases, resulting in a steepening of the psychometric function (Bronkhorst et al., 1993). If, however, there is little top-down information available to constrain word options or if this information is incongruent with the rest of the utterance (as is the case when keywords are unpredictable), intelligibility will be based on bottom-up information alone and will thus increase more slowly as level is increased, giving a relatively shallow psychometric function. Several individual studies have clearly demonstrated this effect (Dirks et al., 1986; Dubno, Ahlstrom, & Horwitz, 2000; Elliott, 1979; Kalikow et al., 1977; Lewis et al., 1988; Pichora-Fuller et al., 1995).
Aside from slopes being generally steeper when target speech could be predicted from its context, it was also noted in the current survey that distributions of slope values tended to be broader for such targets than for those whose content was unpredictable. It is possible that this difference in slope distributions reflects a variation in the reliance on context and top-down information by different listeners across studies. It has been suggested, for example, that older listeners can benefit more from supportive context than younger listeners can (Pichora-Fuller et al., 1995). A greater reliance on context would, as mentioned earlier, have a tendency to steepen the slope of the psychometric function while a greater reliance on perceptual information would have a tendency to flatten the slope of the psychometric function. A shift in the balance of these two strategies may, in part, be the reason that steeper slopes were seen in the current survey for older, hearing-impaired listeners than for younger, normal hearing listeners, and the greater variation in the use of context by listeners across studies may explain the broader slope distribution for predictable, compared with unpredictable, target utterances.
The number of possible responses available in a speech test can also alter the relative contributions of perceptual and cognitive factors in speech identification. The SSI, for example, is usually presented as a closed-set corpus (Speaks & Jerger, 1965) in which listeners are asked to match presented sentences to a list of 10 possible sentences. Top-down information in this case can very effectively constrain identification; only part of the sentence needs to be audible for identification to be successful. Small changes in audibility, therefore, can have large effects on intelligibility, resulting in a steep slope. The IEEE corpus, on the other hand, is open set (Rothauser et al., 1969), as it consists of 720 sentences on different topics. Top-down information is far less constraining in this case. Although the context of the sentence may allow some top-down influence, speech identification will be much more heavily dependent on bottom-up information for these speech stimuli compared with the SSI, thus giving less improvement in intelligibility as SNR is increased and so a shallower psychometric function.
The CRM corpus is also a closed set, offering 32 response options. The survey demonstrated that despite this, the CRM corpus tended to give relatively shallow slopes (e.g., 3.7% per dB with a single speech masker). This may be partially explained by the fact that there are no contextual or semantic cues available in CRM sentences to aid in the identification of the keywords. The CRM keywords are therefore likely to be less constrained by top-down information than SSI sentences are. In addition, studies that used CRM sentences as targets commonly used CRM sentences as maskers as well. This increased similarity between the target and masker, as described in the Slope Changes as a Consequence of Target/Masker Confusion section, may also explain the shallower than expected psychometric functions for this particular speech corpus when presented in a speech masker.
Conclusions
The slope of the psychometric function for masked speech varies greatly (mean, 7.5% per dB; range, 0–44% per dB). Understanding the factors affecting the slope of the psychometric function and the mechanisms that underlie these slope changes is important, as it gives a means of gauging the amount of perceptual benefit that can be expected given a specific change in SNR in a specific listening condition.
The survey of 885 psychometric functions has demonstrated that the type and number of speech maskers both had an effect on slope, as did the choice of target corpus, its predictability, and its similarity to the masker. Three broad underlying mechanisms were outlined to explain why there is such a large variation across listening conditions: amplitude modulations in the masker, confusion between the target and the masker, and the availability of top-down information. In particular, single speech maskers are likely to give particularly shallow slopes, as they contain amplitude modulations that offer extensive opportunities for glimpsing while still sharing acoustic and linguistic features that may become confused with the target speech.
The current survey has highlighted that the slope of the psychometric function, and therefore the amount of perceptual benefit that can be gained from an increase in SNR, is not fixed but instead varies greatly depending on both target and masker selection. These findings suggest that care needs to be taken in selecting both target and masker stimuli for speech research, with consideration given to the likely shape of the psychometric function as well as the likely threshold. That the slope of the psychometric function can vary so much is particularly pertinent for listeners who struggle with speech-in-noise understanding and who rely on a hearing aid to provide improvement in speech audibility. The slope for these listeners will relate directly to the amount of benefit they might expect to receive from their hearing aid. The current study was unable to ascertain the direct effects that hearing impairment and age had on the slope of the psychometric function. These effects are an important direction for future research, as an understanding of them is crucial if we wish to quantify the amount of perceptual benefit a listener is likely to gain from any change in SNR offered by a hearing aid.
