Auditory filter (AF) shape has traditionally been estimated with a combination of a notched-noise (NN) masking experiment and a power spectrum model (PSM) of masking. However, there are several challenges that remain in both the simultaneous and forward masking paradigms. We hypothesized that AF shape estimation would be improved if absolute threshold (AT) and a level-dependent internal noise were explicitly represented in the PSM. To document the interaction between NN threshold and AT in normal hearing (NH) listeners, a large set of NN thresholds was measured at four center frequencies (500, 1000, 2000, and 4000 Hz) with the emphasis on low-level maskers. The proposed PSM, consisting of the compressive gammachirp (cGC) filter and three nonfilter parameters, allowed AF estimation over a wide range of frequencies and levels with fewer coefficients and less error than previous models. The results also provided new insights into the nonfilter parameters. The detector signal-to-noise ratio was found to be constant across signal frequencies, suggesting that no frequency dependence hypothesis is required in the postfiltering process. The ANSI standard “Hearing Level-0dB” function, i.e., AT of NH listeners, could be applied to the frequency distribution of the noise floor for the best AF estimation. The introduction of a level-dependent internal noise could mitigate the nonlinear effects that occur in the simultaneous NN masking paradigm. The new PSM improves the applicability of the model, particularly when the sound pressure level of the NN threshold is close to AT.
When developing models of human hearing for communication devices, it is important to have an accurate representation of auditory filter (AF) shape. Psychophysical estimation of human frequency selectivity has a long history (See reviews in, for example, Baker & Rosen, 2006; Eustaquio-Martín and Lopez-Poveda, 2011; Leschke et al., 2022; Moore, 2012; Oxenham & Shera, 2003; Patterson & Moore, 1986; Unoki et al., 2006). Fletcher (1940) developed a noise-band widening experiment to define the “critical band.” However, it proved difficult to measure AF shape over a useful dynamic range. Psychophysical tuning curves (PTC) have been used to estimate human frequency selectivity (Houtgast, 1973; Kidd & Feth, 1981). However, this method does not prevent off-frequency listening and tends to overestimate the bandwidth (Johnson-Davies & Patterson, 1979). A notched-noise (NN) experiment (Moore, 2012; Patterson, 1976) was developed to overcome previous problems, and it is currently a popular method for AF shape estimation.
In NN experiments, threshold for a sinusoidal signal is measured in the presence of a broad band of noise (Moore, 2012; Patterson, 1976; Patterson et al., 2003). A notch is created in the noise around the signal frequency, , and threshold is repeatedly measured as the width of the notch is increased. The resultant threshold function is assumed to provide an estimate of the shape of the integral of the AF at that signal frequency. The shape of the filter itself is derived from the threshold function using a relatively simple power spectrum model (PSM) (Fletcher, 1940) of tone-in-noise masking.
Although AF estimation using the NN method has been widely applied, a number of challenges remain: (i) the relationship between the psychoacoustic AF shape and the physiological cochlear response, (ii) the relationship between the AF and absolute threshold (AT), (iii) the number of parameters and hypotheses needed to estimate the AF, and (iv) nonlinear effects in the AF estimation. These are described below.
(i) Psychoacoustic and physiological AF
The rounded exponential (roex) filter (Patterson, 1976) has been used in most papers to specify the AF shape for both simultaneous and forward masking paradigms. Because the roex filter is a simple frequency-domain weighting function, there is no impulse response that could correspond to the cochlear traveling wave response (Pickles, 2013). Even in the frequency response, the upper and lower slopes of the roex filters were estimated independently although there must be some constraints in the traveling wave. Some of the constraints can be derived from the AF impulse response measured with the reverse correlation technique (Carney & Yin, 1988; Carney et al., 1999; de Boer & de Jongh, 1978). Irino & Patterson (1997) developed a gammachirp AF to introduce more realistic constraints on AF shape. Irino & Patterson (2001) demonstrated that the cGC (Appendix B) can account for both physiological response data (Carney et al., 1999) and psychophysical NN threshold data.
(ii) AF and AT
Moore (2012) pointed out that AFs differed considerably across hearing impaired (HI) listeners. Some of them had extremely broad filters; others had filters with the opposite asymmetry to those typically observed (Glasberg & Moore, 1986; Nitschmann et al., 2010). But there was no clear relationship between broad AF shapes and the frequency response of the cochlear traveling wave observed by von Békésy (1960) and Rhode (1971) (See also Pickles, 2013). One of the major problems encountered with HI listeners is that of small differences between the sound pressure levels (SPLs) of the NN masker and the AT value. The dynamic range of measurement is limited by the noise floor at the AT. The conventional PSM does not account for the effect of AT and tends to overestimate the bandwidth of the AF. In this paper, we hypothesize that the incorporation of AT into the PSM would resolve this problem. To investigate the interaction between AT and the NN-masked signal threshold (hereafter referred to as “NN threshold” for simplicity), it is essential to measure thresholds over a range of NN levels.
(iii) The number of parameters and hypotheses
It is desirable to reduce unnecessary hypotheses and the number of parameters needed to estimate AF shape, as in common scientific modeling studies (McFadden, 2021). Patterson et al. (2003) estimated the cGC filters from a large database of NN threshold data covering center frequencies from 0.25 to 6.0 kHz and SPLs from 30 to 80 dB. They performed a “global fit” in which a set of filter parameters was estimated by minimizing the error over all frequencies and levels simultaneously. The resultant cGC filter required only six coefficients to account for all of the data, where six was the smallest number up to that point. The number of coefficients was obviously smaller than when fitting at individual signal frequencies. Furthermore, the global fit is essential for the construction of the auditory filterbank for signal processing, which requires many filters located between signal frequencies for measurement. Although a similar global fit was also performed with the roex filter (Baker & Rosen, 2006), Unoki et al. (2006) showed that the number of the AF filter coefficients in the cGC filter is smaller than in the roex filter.
However, there was no constraint concerning AT as in previous studies. Instead, they used an arbitrary parameter which is just a low-level limit on threshold that absorbs the effect of the floor in NN thresholds (Glasberg & Moore, 2000; Rosen et al., 1998). Thus, the cGC filter was estimated by using the PSM of masking including the detector signal-to-noise ratio (SNR) and , which do not specify the AF shape and will be referred to as the “non-filter” parameters in what follows. The best fit was obtained when both and were quadratic functions of frequency, i.e., 3 coefficients for each. Thus, there were 12 coefficients in total, i.e., 6 for the cGC filter, 6 for the nonfilter coefficients. An attempt was made to reduce the number of the AF coefficients (Lyon, 2011), but not nonfilter coefficients.
This result raises two difficult questions: Firstly, how to model the frequency dependence of , if it represents processing beyond the cochlea in the auditory pathway. For example, it would be possible to make two models with different frequency dependencies: One with multiple decision devices, having different sensitivities, distributed along the tonotopic axis; and the other having a single decision device, but the noise level in the system beyond the cochlea is frequency dependent. These models require additional hypotheses that are difficult to examine experimentally. The second question is how to interpret as a part of auditory processing. Although provides a convenient way to absorb fitting error, there is no physiological or psychophysical model of the process at this point in time. Moreover, if and are closely correlated or colinear within the previous PSM fitting process, the fit might well get trapped in a local minimum, affecting the estimation of the AF parameters. Therefore, it is desirable to reduce unnecessary hypotheses and parameters in the AF estimation.
(iv) Nonlinear effects
There are cochlear nonlinearities, and one of the most important in AF estimation is two-tone suppression (Delgutte, 1990; Sachs & Kiang, 1968). Since the effect was also measured psychoacoustically (Houtgast, 1972), signal thresholds measured with simultaneous masking may be affected by suppression. To avoid the suppression effect, the forward masking paradigm has sometimes been used for AF estimation (e.g., Leschke et al., 2022; Moore & Glasberg, 1981; Oxenham & Shera, 2003). The bandwidth of the AF was estimated to be narrower with forward masking than with simultaneous masking. One of the problems with the simultaneous masking paradigm is that the conventional AF estimation method ignores such nonlinear effects. As described above, the arbitrary parameter is a function of frequency but not of level. If we could incorporate a level-dependent nonlinear parameter into the PSM of masking, the nonlinear effects could be mitigated to some extent.
This study tried to resolve these challenges by incorporating AT and a level-dependent internal noise into the PSM of masking. The first section presents a NN experiment performed with a large group of normal hearing (NH) listeners and a high proportion of wide notch conditions and low NN levels, to produce a detailed record of how AT interacts with NN threshold. The next section shows how the PSM of masking can be extended to include the noise floor associated with AT and the level-dependent internal noise associated with NN. There are three nonfilter parameters, each with a single constant coefficient, which reduces the number of coefficients compared to the conventional PSM. Then, a quantitative comparison is made between the conventional and extended PSMs for AF shape estimation. In the final section, we confirm that the frequency distribution of the noise floor in the extended PSM is associated with the average hearing level (HL) of NH listeners, and demonstrate that the detector SNR can be frequency independent. The results suggest that the proposed PSM of masking is one of the solutions to these challenges.
Experiment
The NN experiment in this study is similar to those in the previous AF studies (e.g., Baker & Rosen, 2002; Glasberg & Moore, 2000; see Moore, 2012, for details). NN threshold for a sinusoidal signal (0.5, 1.0, 2.0 or 4.0 kHz) was measured in the presence of a NN masker using an adaptive, two-alternative, forced-choice procedure (Levitt, 1971). The main difference in the design of the current experiment was the inclusion of masker conditions with low spectrum levels, where AT is observed to limit the descent of NN threshold in wide notches.
Notched-noise Conditions
The signal frequencies () were 0.5, 1.0, 2.0 and 4.0 kHz. The normalized frequency distances from the signal to the nearer edges of the lower and upper noise bands were . The same notches were used at each spectrum level. The widths of both the lower and upper noise bands were 0.4 of the normalized signal frequency. The spectrum levels () were dB when kHz and dB when was 0.5, 1.0, or 4.0 kHz. At each signal frequency, threshold was measured for the six noise spectrum levels in a random order. Note that the experiments at 2 kHz were performed first and the conditions with the lowest masker levels (10 or 12 dB) were measured separately.
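The band layout described above can be sketched as follows. This is a minimal illustration, assuming that the notch distances and the 0.4 band width are all expressed as fractions of the signal frequency; the function name and the example notch values (0.2 on each side) are hypothetical and are not taken from the experiment.

```python
def notched_noise_band_edges(signal_hz, lower_gap, upper_gap, band_width=0.4):
    """Return (lower_lo, lower_hi, upper_lo, upper_hi) in Hz.

    lower_gap and upper_gap are the distances from the signal to the nearer
    edge of each noise band, as fractions of the signal frequency.
    band_width is the width of each band, also as a fraction.
    """
    lower_hi = signal_hz * (1.0 - lower_gap)
    lower_lo = signal_hz * (1.0 - lower_gap - band_width)
    upper_lo = signal_hz * (1.0 + upper_gap)
    upper_hi = signal_hz * (1.0 + upper_gap + band_width)
    # A very wide lower notch could push the band below 0 Hz; clip it there.
    return max(lower_lo, 0.0), max(lower_hi, 0.0), upper_lo, upper_hi


# Illustrative symmetric notch around a 1-kHz signal.
edges = notched_noise_band_edges(1000.0, 0.2, 0.2)
print([round(e, 1) for e in edges])
```

An asymmetric condition would be obtained by adding 0.2 to only one of the two gap arguments.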
Listeners
The experiment was performed with NH listeners. In total, 26 NH listeners participated in the experiment; they ranged from 19 to 28 years old; there were 14 males and 12 females. They all had HLs less than 20 dB between 125 and 8000 Hz. For each signal frequency, we recruited 8 participants to ensure a reliable average for NH listeners. Each participant completed all of the NN conditions at one signal frequency. No participant completed the experiment for all four signal frequencies. However, one man and one woman participated at two signal frequencies, and two different men participated at three signal frequencies. The experiment was approved by the local ethics committee of Wakayama University and all of the participants provided informed consent before participating in the experiment.
Signal Generation and Measurement Procedure
The sinusoidal signals and the NN maskers were generated digitally at a sampling rate of 48 kHz with 24-bit resolution using MATLAB 2017a on a Mac mini with macOS 10.12. The signal and the masking noise had the same 200-ms duration. The onset and offset were rounded with the rise and fall of a 10-ms Hann window. The stimuli were presented to the listener’s left ear over headphones (OPPO, PM-1) via a headphone amplifier with a USB interface (OPPO, HA-1) at a 48-kHz sampling rate and 24-bit resolution. The listeners were seated in a sound-attenuated room (RION, AT62W). The headphone levels were calibrated with an artificial ear (Brüel & Kjær, Type 4153), a microphone (Brüel & Kjær, Type 4192), and a sound level meter (Brüel & Kjær, Type 2250-L).
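The signal generation can be sketched in a few lines. The study used MATLAB; the Python below is only an illustration, and it assumes that each ramp lasts 10 ms (a raised-cosine rise and fall), which is one possible reading of the window description. The function name and defaults are hypothetical.

```python
import math


def gated_tone(freq_hz, duration_s=0.2, ramp_s=0.01, rate_hz=48000):
    """Return a unit-amplitude sinusoid with raised-cosine onset and offset."""
    n_total = round(duration_s * rate_hz)
    n_ramp = round(ramp_s * rate_hz)
    samples = []
    for i in range(n_total):
        # Distance, in samples, to the nearer end of the stimulus.
        edge = min(i, n_total - 1 - i)
        if edge < n_ramp:
            gain = 0.5 * (1.0 - math.cos(math.pi * edge / n_ramp))
        else:
            gain = 1.0
        samples.append(gain * math.sin(2.0 * math.pi * freq_hz * i / rate_hz))
    return samples


tone = gated_tone(1000.0)
print(len(tone), "samples")
```

Scaling to a calibrated sound pressure level is deliberately left out, because it depends on the headphone calibration described above.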
Signal threshold was measured using a two-interval, two-alternative, forced-choice procedure and the transformed 3-down and 1-up method of Levitt (1971). In one interval, the masker was presented on its own; in the other, the signal and masker were presented simultaneously. Listeners were asked to select the interval containing the signal using a graphical user interface. Feedback regarding the correct answer was indicated visually after the listener’s response. There was a brief training session lasting about 20 minutes to familiarize the listener with the threshold procedure.
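The tracking rule can be illustrated with a short simulation. In a 3-down, 1-up track the level is lowered after three consecutive correct responses and raised after any error, which converges on about 79.4% correct (Levitt, 1971). The step size, starting level, number of reversals, and the idealized listener below are all assumptions made for the sketch; the text does not report these details.

```python
def three_down_one_up(respond, start_db, step_db, n_reversals=8):
    """Run a 3-down, 1-up track and return the mean level at the reversals.

    respond(level_db) is a hypothetical callable that returns True when the
    simulated listener picks the correct interval.
    """
    level = start_db
    correct_run = 0
    last_direction = 0  # -1 while descending, +1 while ascending
    reversals = []
    while len(reversals) < n_reversals:
        if respond(level):
            correct_run += 1
            if correct_run < 3:
                continue  # level is unchanged until three in a row
            correct_run = 0
            direction = -1
        else:
            correct_run = 0
            direction = 1
        if last_direction and direction != last_direction:
            reversals.append(level)
        last_direction = direction
        level += direction * step_db
    return sum(reversals) / len(reversals)


# Idealized listener: always correct at or above 30 dB, always wrong below.
estimate = three_down_one_up(lambda level: level >= 30.0, 50.0, 2.0)
print(estimate)
```

With this deterministic listener the track oscillates between 28 and 30 dB, so the estimate sits between the two; a real listener would add response variability.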
Results
For the four signal frequencies, Figure 1 shows average NN threshold for the eight listeners at the six masker levels (solid lines), along with their average AT (dashed line). The thresholds associated with the two highest noise levels, 30 and 40 dB, remain well above AT out to the widest notches. At lower noise levels (20, 10, and 0 dB), however, threshold is limited by the proximity of AT, and NN threshold at the 10 dB noise level converges onto AT at the wider notch widths. Note that the noise levels at 2 kHz are 2 dB less than those at the other frequencies. The set of curves shows that NN threshold does eventually converge onto AT at all signal frequencies. This in turn suggests that NN threshold should be assumed to converge onto AT in the PSM of masking.
Average NN threshold (solid lines) for eight listeners, and their average AT (dashed line). The signal frequencies were 0.5 kHz (a), 1.0 kHz (b), 2.0 kHz (c), and 4.0 kHz (d). The abscissa is normalized notch width. The circles show symmetric notch conditions; the right-pointing triangles show conditions with additional shifting of the upper noise band by 0.2; the left-pointing triangles show conditions with additional shifting of the lower noise band by 0.2. The parameter beneath each curve is the noise spectrum level, which was the same for the lower and upper bands throughout the experiment. The noise levels for the triangles are the same as for the threshold curves just above them. NN = notched-noise; AT = absolute threshold.
In earlier studies, although AT was routinely measured, it was not included in the data set used to derive the shape and gain of the AF, nor was AT directly represented in the power-spectrum model used to derive filter shape and filter gain. This is the motivation for the new procedure described in the next section.
Extension of the Power Spectrum Model of Masking
In AF shape research with NN, the PSM of masking is used to estimate signal threshold (on a dB scale) with Equations 1 to 3. These equations involve the SNR at the output of the AF and, on a linear scale, an estimate of the external noise that passes through the AF. Note that, to avoid confusion, the parameters with a prime represent level on a dB scale hereafter. The external noise is calculated from the spectrum level of the noise, i.e., a power density function, and the power weighting function of the AF, over the cochlear frequency range. In the current simulation, this range was restricted because the gains of the cGC filters between 0.5 kHz and 4 kHz are small beyond it. Another reason is to avoid ill-defined noise levels at low and high frequencies affecting estimation of the noise floor at the reference frequency, 1 kHz. Four band-edge frequencies specify the lower and upper noise bands of NN. Equation 3 includes the transfer function of sound from the acoustic environment to the input of the cochlea, which combines the transfer function from the audio device to the ear drum, as explained in Appendix A, with the transfer function of the middle ear as shown in Figure 2b (Aibara et al., 2001; Glasberg & Moore, 2006; Puria et al., 1997).
(a) Relationship between self-generated noise (Buss et al., 2016) and the HL-0dB function (ANSI_S3.6-2010, 2010) at the ear drum (dashed lines), and at the input to the cochlea (solid lines). (b) The middle ear transfer function (Glasberg & Moore, 2006) for compensating between them.
When the AF is modeled with the cGC, , as described in Appendix B (see also Irino & Patterson, 2001; Patterson et al., 2003), the filter weighting function, becomes . The cGC filter was used because it provides a better representation of the level-dependence and compression of the AF than the conventional roex filter (Patterson & Nimmo-Smith, 1980). Moreover, the cGC filter requires fewer parameters than the roex filter (Unoki et al., 2006).
Incorporating Absolute Threshold into the Estimation of NN Threshold
In conventional NN experiments, the NN level is well above AT, so AT can be ignored in the derivation of AF shape. The PSM was extended to include AT by assuming that there is a noise floor that limits NN threshold, and the power at the output of the corresponding AF is given by Equation 4, which is based on the spectrum level of the noise floor in the cochlea. NN threshold depends on both the noise floor power in Equation 4 and the external noise power in Equation 2, so NN threshold is given by Equation 5. AT can be estimated when the spectrum level of the noise floor is equal to the floor level in quiet. AT can then be estimated with the PSM as in Equation 6. We assumed that any noise from sources beyond the cochlea that affect AT can be associated with this noise floor.
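The way a noise floor makes predicted NN threshold converge onto AT can be sketched numerically. The code below assumes that the external noise power and the floor power at the filter output simply add on a linear scale before the detector SNR is applied in dB; all names and numerical values are illustrative and are not the fitted values of the study.

```python
import math


def level_to_power(level_db):
    """Convert a level in dB to linear power."""
    return 10.0 ** (level_db / 10.0)


def predicted_threshold_db(snr_db, external_power, floor_power):
    """Detector SNR (dB) plus the summed noise power at the filter output."""
    return snr_db + 10.0 * math.log10(external_power + floor_power)


# Illustrative values only: a floor of 5 dB and a detector SNR of -2 dB.
floor = level_to_power(5.0)
absolute_threshold = predicted_threshold_db(-2.0, 0.0, floor)
for ext_db in (40.0, 20.0, 0.0, -20.0):
    thr = predicted_threshold_db(-2.0, level_to_power(ext_db), floor)
    print(f"external {ext_db:6.1f} dB -> threshold {thr:6.2f} dB")
print(f"absolute threshold {absolute_threshold:6.2f} dB")
```

As the external noise passed by the filter falls, for example with a wider notch or a lower masker level, the predicted threshold approaches the value obtained with no external noise at all, which is the behavior seen in the measured curves.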
Frequency Distribution of the Noise Floor
We would like to know whether AT is completely determined in quiet by the noise floor or whether some other factor is also involved. Buss et al. (2016) reported that the distribution of internal “self-generated noise” of NH listeners is similar to the 0-dB HL function (ANSI_S3.6-2010, 2010) on a dB scale as shown in Figure 2(a). If we assume that the distribution of the noise floor is indeed the 0-dB HL function, it can be formulated as in Equation 15 in Appendix A. We used this distribution for filter estimation in this section and examined whether this assumption is reasonable in a later section.
Level Dependence of the Noise in the Cochlea
Irino et al. (2018) assumed that the noise floor in Equation 4 would be dependent on the level of the external NN, in which case it should be greater than the noise floor in quiet given by Equation 15. This is because distortion products would be generated by cochlear nonlinearity (e.g., Gaskill & Brown, 1990; Hall, 1972) and their distribution could spread widely even beyond the frequency regions of the NN. They found that the estimation error was significantly reduced when the noise level was dependent on the NN level. In this paper, the noise level on a dB scale was defined as in Equation 8, relative to the noise floor in quiet, i.e., when there is no external sound. Equation 8 means that the noise level increases from its quiet level as the external noise level increases. The proportionality parameter for the level dependence is expressed in dB/dB. This equation is a revised version of Equation 9 in Irino et al. (2018) with one less parameter, which reduces the difficulty of selecting initial values in filter estimation. The noise level becomes a value that is linearly interpolated, in the ratio given by the proportionality parameter, between the noise floor in quiet and the external noise level. This model will be referred to as the level-dependent model in what follows. When the proportionality parameter is zero, the noise level becomes a level-independent, fixed function, which will be referred to as the fixed model.
We assume this simple formula with frequency-independent is sufficient to support a first order approximation of the level dependence. There may be several aspects of the noise level which need to be considered for accurate simulation. For example, the external noise (NN) is a pair of bandpass noises with width . Although its power is constant on a logarithmic frequency axis, it may be affected by the location of the NN noise bands on the axis. However, we do not know the exact characteristics of the level-dependent internal noise currently and it would be difficult to take all of the factors into account.
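The level dependence can be sketched as a linear interpolation on a dB scale. This assumes that the two end points are the noise floor in quiet and the external noise level, which is an interpretation of the description rather than a transcription of Equation 8; the function name and the numbers are illustrative.

```python
def internal_noise_level_db(quiet_db, external_db, proportion):
    """Interpolate, in dB, between the quiet floor and the external level.

    proportion is the level-dependence parameter in dB/dB. A value of 0
    reproduces the level-independent (fixed) floor.
    """
    return quiet_db + proportion * (external_db - quiet_db)


# Illustrative values: a quiet floor of 0 dB and a slope of 0.2 dB/dB.
for external in (0.0, 20.0, 40.0):
    level = internal_noise_level_db(0.0, external, 0.2)
    print(f"external {external:4.1f} dB -> internal noise {level:4.1f} dB")
```

Because only one constant is added, the fixed model is recovered as a special case, which makes the two models directly comparable in the fits.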
Filter Estimation in the Level-Dependent and Fixed Models
The coefficients of the AF were estimated using a least-squares method (Moré, 1978) to minimize the error between measured and predicted NN thresholds (the latter from Equation 5) and the error between the measured and predicted ATs (the latter from Equation 6), as in Equation 9. The parameter vector consists of the cGC filter coefficients (see Appendix B) plus the nonfilter parameters. Note that we also introduced several constraints to control the AF shape within a reasonable range by restricting the ranges of the cGC coefficients, the bandwidth, and the slope of the IO function. The constraints were introduced as error terms with small weights.
Filter Estimation in the Conventional Model
In the NN experiment, threshold converges at a low level somewhat above AT even when the NN level is relatively high (Patterson & Nimmo-Smith, 1980) (see Figure 1). Glasberg & Moore (2000) introduced a term to represent the lower limit of NN threshold and prevent it from distorting the representation of the tails of the AF. The predicted threshold then includes this lower-limit term. The coefficients of the AF were estimated using the least-squares method to minimize the error between the measured thresholds and the thresholds predicted by the model, as in Equation 12. The parameter vector consists of the cGC coefficients plus the two nonfilter parameters. Glasberg & Moore (2000) showed that the use of the lower-limit term is effective in reducing estimation error. They suggested that it is related to AT but they did not explain the relationship in detail. This model will be referred to as the conventional model in what follows; it is used to compare performance with that of the extended PSM.
Estimation Procedure and Results
The level-dependent, fixed, and conventional models were compared to evaluate the effect of level dependence on the goodness of the filter estimation.
Procedure
The AFs at 500, 1000, 2000, and 4000 Hz were estimated simultaneously using all 144 thresholds and 4 ATs shown in Figure 1. The estimation method is similar to the global fit in Patterson et al. (2003). Equations 9 and 12 were extended to accommodate four probe frequencies. Patterson et al. (2003) reported that filter shape can be accurately determined using a cGC filter with six frequency-independent coefficients and two nonfilter parameters, which were quadratic functions of frequency. The number of coefficients for each parameter is listed in the bottom part of Table 1 and, for the conventional model, was 12 in total. Based on this knowledge, we set the number of coefficients for the level-dependent and fixed models as shown in Table 1. The number of cGC filter coefficients was six. The noise floor in Equation 15 and the proportionality parameter in Equation 8 were set to constants. The SNR at the output of the AF was set to a constant, a linear function, or a quadratic function of the normalized frequency defined in Equation 13, which is based on the number of equivalent rectangular bandwidths (Moore, 2012).
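A frequency-dependent nonfilter parameter of this kind can be sketched as a low-order polynomial in a normalized ERB-number variable. The ERB-number formula below is the widely used one from Glasberg and Moore (1990). The normalization (ratio to the value at 1 kHz, minus one) is an assumption made for illustration, and the polynomial coefficients are hypothetical rather than fitted values.

```python
import math


def erb_number(freq_hz):
    """ERB-number (Cam) scale of Glasberg and Moore (1990)."""
    return 21.4 * math.log10(4.37 * freq_hz / 1000.0 + 1.0)


def normalized_frequency(freq_hz, ref_hz=1000.0):
    """Assumed normalization: zero at the reference frequency."""
    return erb_number(freq_hz) / erb_number(ref_hz) - 1.0


def polynomial(coeffs, x):
    """Evaluate coeffs[0] + coeffs[1]*x + coeffs[2]*x**2 + ..."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# One coefficient gives a constant; two give a line; three give a quadratic.
for f in (500.0, 1000.0, 2000.0, 4000.0):
    x = normalized_frequency(f)
    print(f"{f:6.0f} Hz  x = {x:+.3f}  constant = {polynomial([-2.0], x):+.2f}"
          f"  quadratic = {polynomial([-2.0, 1.0, 0.5], x):+.2f}")
```

The length of the coefficient list is what is counted in the tables: a constant contributes one coefficient and a quadratic contributes three.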
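Number of coefficients in selected fits of the level-dependent, fixed, and conventional (Patterson et al., 2003) models, with the RMS errors of NN threshold and AT that each model produced.

| Model | Total coefficients | cGC filter | Detector SNR | Noise floor in quiet | Level dependence | Lower limit | NN error (dB) | AT error (dB) |
| Level-dependent | 11 | 6 | 3 | 1 | 1 | - | 2.01 | 2.18 |
| Level-dependent | 10 | 6 | 2 | 1 | 1 | - | 2.02 | 2.12 |
| Level-dependent | 9 | 6 | 1 | 1 | 1 | - | 1.92 | 2.56 |
| Fixed | 10 | 6 | 3 | 1 | - | - | 2.75 | 2.42 |
| Fixed | 9 | 6 | 2 | 1 | - | - | 2.74 | 2.42 |
| Fixed | 8 | 6 | 1 | 1 | - | - | 2.64 | 2.74 |
| Conventional | 12 | 6 | 3 | - | - | 3 | 2.52 | 4.89 |

NN = notched-noise; RMS = root mean square; AT = absolute threshold.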
In the estimation, each model was fitted to the 144 thresholds 10 times, using different initial values for the cGC coefficients, chosen randomly within a range of the summary coefficient values reported in Patterson et al. (2003). The best of the 10 filter sets was selected as the one that minimized the root-mean-squared (RMS) error of the NN threshold.
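The selection among repeated fits can be sketched as follows. The fitting routine itself is replaced by a hypothetical stand-in (`toy_fit`) that only perturbs a prediction, because the actual least-squares fit of the cGC coefficients is outside the scope of this illustration; the function names and the seed are assumptions.

```python
import math
import random


def rms_error(measured, predicted):
    """Root-mean-square difference between two equal-length sequences (dB)."""
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def best_of_restarts(fit_once, n_restarts=10, seed=0):
    """Call fit_once(rng) n_restarts times and keep the lowest NN RMS error.

    fit_once is a hypothetical callable that draws its own initial values
    from rng and returns a (coefficients, nn_rms_error) pair.
    """
    rng = random.Random(seed)
    results = [fit_once(rng) for _ in range(n_restarts)]
    return min(results, key=lambda pair: pair[1])


def toy_fit(rng):
    # Stand-in for one fit: the "solution" is a randomly offset prediction.
    offset = rng.uniform(-3.0, 3.0)
    measured = [10.0, 20.0, 30.0]
    predicted = [m + offset for m in measured]
    return offset, rms_error(measured, predicted)


coeffs, err = best_of_restarts(toy_fit)
print(f"selected offset {coeffs:+.2f} dB with NN RMS error {err:.2f} dB")
```

Selecting on the NN error alone, rather than on a combined NN and AT error, follows the rule stated above.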
Results and Discussion
Estimation Error
The right two columns of Table 1 show the RMS estimation errors of NN threshold and AT. The table shows that the NN threshold errors of the level-dependent model were between 1.92 dB and 2.02 dB, which is approximately 0.7 dB and 0.5 dB smaller than those of the fixed and conventional models, respectively. It is worth noting that the NN error is the smallest when the number of coefficients for the detector SNR is one, i.e., when the detector SNR is a constant. This shows that the detector SNR does not need to be frequency dependent for accurate filter estimation. This is a counterintuitive result because the use of more coefficients did not reduce the error. The fitting process seemed to be trapped by local minima. We return to this issue in a later section. The level-dependent model also requires fewer coefficients than the conventional model. The AT errors of the level-dependent model were also smaller than those of the conventional model.
Thus, the introduction of the proportionality parameter for the level dependence in Equation 8 reduces the estimation error of NN threshold, which may imply that the noise level is dependent on the external noise level (NN), and the estimation of AF shape should take this into account. With the level-dependent model, NN threshold error is effectively independent of the form of the detector SNR; the version with 9 coefficients is as effective as the one with 11 coefficients. In other words, a model with a constant detector SNR is sufficient to explain the NN threshold data.
This result also suggests that the distribution of the HL-0 dB threshold largely explains the frequency dependence which necessitated the frequency-dependent terms associated with the two nonfilter parameters of the conventional model. Indeed, it suggests that the arbitrary lower-limit term is not required for accurate AF shape estimation.
Filter and Non-filter Coefficients
The filter coefficients of the nine-coefficient level-dependent model and the twelve-coefficient conventional model are listed in Table 2. The level-dependent model achieved an NN error of less than 2 dB. The corresponding value for the conventional model with the current NN data is 2.52 dB. For comparison, the cGC filter coefficients of the conventional model reported by Patterson et al. (2003) are listed in the bottom row. The reported NN error value is 3.71 dB for the previous NN data set.
Values from the global fit for the nine-coefficient model and the twelve-coefficient model. Normalized frequency is as defined in Equation 13. The NN threshold and AT errors are listed in the last two columns. The bottom row shows the values for the model reported by Patterson et al. (2003). The NN error with the asterisk cannot be fairly compared with the two above it because the NN threshold data were from a different experiment. NN = notched-noise; AT = absolute threshold.
The main difference between the two models is the number of nonfilter coefficients. The level-dependent model halved the number of nonfilter coefficients required by the conventional model, from six to three. This is the lowest number ever achieved in AF shape estimation. Moreover, the level-dependent model achieved the minimum error with fewer coefficients.
The proportionality parameter for the level dependence in Equation 8 was 0.20 dB/dB, which is approximately the same as the minimum slope of the compression function shown in Figure 4(b) and described in, e.g., Moore (2012) and Pickles (2013). This result could be interpreted in at least two ways: (i) the distortion products (Robles et al., 1991) generated by the NN masker produce an internal noise with an increasingly compressive growth rate; (ii) it is more likely that the suppression effect (Delgutte, 1990; Houtgast, 1972) appears indirectly as a level-dependent internal noise level. In this case, the logic may be as follows. The predicted threshold in Equation 5 is calculated from the sum of the internal and external noise levels. Therefore, it increases more than linearly as the external noise increases when the AF is fixed. This allows the suppression effect to be incorporated into the AF shape during the fitting process, minimizing the error between the measured and predicted thresholds. To avoid the suppression effect, the forward masking paradigm has sometimes been used for AF shape estimation (e.g., Leschke et al., 2022; Moore & Glasberg, 1981; Oxenham & Shera, 2003). However, the level-dependent internal noise could mitigate such nonlinear effects that occur in the simultaneous NN masking paradigm.
Equivalent rectangular bandwidth (ERB) (a) and IO function (b) of the nine-coefficient model shown in Figure 3(a). The lines in panel (a) correspond to the input levels of 30, 50, 60, 70, and 80 dB. The lines in panel (b) represent the IO functions when the center frequencies are 500, 1000, 2000, and 4000 Hz. The IO function is drawn so that the output level is 100 dB when the cochlear input level is 100 dB. (a) Bandwidth; (b) IO function
Filter Shape
The filter shapes associated with the coefficients listed in Table 2 are shown in Figure 3(a) for the nine-coefficient model and Figure 3(b) for the twelve-coefficient model. The filter shapes at 80 dB (bottom curves) are clearly sharper in the model than in the model. This trend holds at higher levels (upper curves), but to a lesser extent.
Filter shapes using the coefficients listed in Table 2. The center frequencies are 500, 1000, 2000, and 4000 Hz and the five lines at each frequency correspond to input levels every 10 dB between 30 and 80 dB. The filter shapes at 30 dB are shown by the top curves and those at 80 dB by the bottom curves. (a) Nine-coefficient model; (b) Twelve-coefficient model
Bandwidth
Figure 4(a) shows the bandwidth of the nine-coefficient model shown in Figure 3(a). When the level of the input was between 30 and 50 dB, the bandwidth was effectively fixed at approximately 1.5 times the standard ERB of NH listeners. Above 60 dB, the bandwidth increased rapidly with level. This is slightly different from the result presented in Patterson et al. (2003), where the bandwidth increased almost linearly between 30 and 70 dB. The rate of increase was independent of the frequency; this is consistent with the result in Patterson et al. (2003).
IO Function
Figure 4(b) shows the IO function for the nine-coefficient model shown in Figure 3(a). The slope of the IO function decreases as the center frequency increases. The minimum slope was 0.43 dB/dB at 500 Hz, 0.30 dB/dB at 1000 Hz, 0.22 dB/dB at 2000 Hz, and 0.18 dB/dB at 4000 Hz. The IO slopes are roughly consistent with those in Patterson et al. (2003).
Frequency Distribution of the Noise Floor Effective for Filter Estimation
In the previous section, it was demonstrated that AF estimation was successful when the noise floor in quiet was defined as in Equation 15. It was assumed that the noise floor could be obtained directly from the HL-0dB function, which in turn means that the frequency dependence of AT is entirely determined by the noise floor. We also examined whether a constant detector SNR is really better for filter estimation, as implied in Table 1.
To investigate these issues, we introduced a constant exponent into Equation 15 to control the spectral contrast of the distribution, as in Equation 16 of Appendix A. When the exponent is 1, the resulting distribution is equivalent to the original distribution of Equation 15. As the exponent deviates from 1, the range of the spectral distribution is enhanced or reduced.
The fitting procedure involving the exponent was similar to that described in the previous section. The AF was estimated for exponent values between 0 and 1.6 in steps of 0.1. The fit was performed 10 times with different initial coefficients, and the best filter was selected as the one that minimized the RMS error of NN threshold. Moreover, we compared the models in which the detector SNR was a constant, a linear function of frequency, or a quadratic function of frequency.
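The search described above (a grid of exponent values, several fits from different initial coefficients, and selection by minimum RMS error) can be sketched as follows. This is only an outline under stated assumptions. `toy_fit_once` is a hypothetical stand-in for the actual filter fit, the single random initial coefficient is an illustrative simplification, and running the restarts at every exponent value is an assumption about how the procedure was applied.

```python
# Minimal sketch: exponent grid search with multi-start fitting.
# toy_fit_once is a hypothetical stand-in, NOT the paper's filter model.
import random


def exponent_grid(start=0.0, stop=1.6, step=0.1):
    """Exponent values from start to stop (inclusive) in the given step."""
    n_steps = round((stop - start) / step)
    return [round(start + i * step, 10) for i in range(n_steps + 1)]


def best_of_restarts(fit_once, exponent, n_restarts, rng):
    """Run fit_once from several initial values and keep the lowest RMS error."""
    best = None
    for _ in range(n_restarts):
        init = rng.uniform(-1.0, 1.0)  # hypothetical initial coefficient
        coeffs, rms_db = fit_once(exponent, init)
        if best is None or rms_db < best[1]:
            best = (coeffs, rms_db)
    return best


def scan_exponents(fit_once, n_restarts=10, seed=0):
    """Map each exponent on the grid to its best (coeffs, rms_db) pair."""
    rng = random.Random(seed)
    return {
        exponent: best_of_restarts(fit_once, exponent, n_restarts, rng)
        for exponent in exponent_grid()
    }


def toy_fit_once(exponent, init):
    """Toy error surface: parabolic in the exponent, plus a small penalty
    that depends on the starting point (mimicking local minima)."""
    rms_db = 1.9 + (exponent - 1.1) ** 2 + 0.005 * abs(init)
    return [init], rms_db


if __name__ == "__main__":
    results = scan_exponents(toy_fit_once)
    best_exponent = min(results, key=lambda e: results[e][1])
    print("best exponent on the toy surface:", best_exponent)
```

Keeping the best of several restarts reduces the chance that a single poor starting point determines the reported error at a given exponent.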
Results and Discussion
Figure 5(a) shows the estimation error of NN threshold as a function of the exponent in Equation 16. The lines with circles (o), pluses (+), and diamonds show the results from the models with constant, linear, and quadratic functions for the detector SNR. These lines were approximately parabolic and had minimum values of 1.89 dB, 2.01 dB, and 2.00 dB when the exponent was 1.1, 1.1, and 0.9, respectively. Interestingly, the minimum error in NN threshold was smaller in the model with a constant detector SNR than in the models with a linear or a quadratic function. Moreover, if the detector SNR is a constant, the model parameters can be properly estimated, rather than being trapped in a local minimum, with the result that the AF estimation process becomes more stable. The results support the conclusion that the detector SNR is a constant and does not vary across frequency at a stage beyond the AF output. Moreover, the estimation was best when the exponent was approximately 1, independent of the order of the detector SNR function. As Equation 16 with an exponent of 1 is equivalent to Equation 15, this supports the assumption that AT is largely determined by the noise floor function shown in Equation 15.
Estimation errors (dB) of NN threshold (a) and AT (b) as a function of the exponent in Equation 16. The errors of the model are shown for constant (o), linear (+), and quadratic (◇) functions of the detector SNR. The errors of the model at an exponent of 1, which are listed in Table 1, are also shown for constant (□), linear (*), and quadratic functions () of the detector SNR. (a) NN threshold error; (b) AT error. NN = notched-noise; AT = absolute threshold.
Figure 5(b) shows AT error as a function of the exponent. The error of the model with a constant detector SNR (line with circles) is approximately 2.5 dB. This value is greater than that of the models in which the detector SNR is either a linear or a quadratic function of frequency, but the improvement is limited to 0.5 dB at most. The effect of the AT error on AF shape was much smaller than that of the NN threshold error because there were 144 NN thresholds but only 4 ATs.
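The weighting argument can be made concrete with a small calculation. The sketch below assumes that all 148 thresholds contribute equally to a single pooled RMS error. That pooling is an illustrative assumption, not a statement of the cost function used in the paper, and the function name is hypothetical.

```python
# Minimal sketch: how little 4 ATs move an RMS error pooled with 144 NN thresholds.
# Assumption: every threshold carries equal weight in the pooled error.
import math


def pooled_rms(err_nn_db, err_at_db, n_nn=144, n_at=4):
    """RMS error (dB) over n_nn NN thresholds and n_at absolute thresholds."""
    total_sq = n_nn * err_nn_db ** 2 + n_at * err_at_db ** 2
    return math.sqrt(total_sq / (n_nn + n_at))


if __name__ == "__main__":
    with_at_2p5 = pooled_rms(1.89, 2.5)
    with_at_2p0 = pooled_rms(1.89, 2.0)
    print(f"AT share of the data: {4 / 148:.1%}")
    print(f"pooled RMS, AT error 2.5 dB: {with_at_2p5:.3f} dB")
    print(f"pooled RMS, AT error 2.0 dB: {with_at_2p0:.3f} dB")
```

With an NN threshold error of 1.89 dB, lowering the AT error from 2.5 to 2.0 dB changes the pooled error by less than 0.02 dB under this assumption, because the ATs make up under 3% of the data.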
Conclusions
This paper shows how the procedure for AF shape estimation can be improved by incorporating AT and a level-dependent internal noise into the PSM of NN masking. This extended PSM unified the estimation of AF shape and AT within a simple framework. The AF shape over a wide range of frequencies and levels was estimated with only nine coefficients, reducing the number of nonfilter coefficients to half the number required in the model. This is the lowest number ever achieved in AF shape estimation and, at the same time, the overall estimation error was smaller with this more compact model. More importantly, the results also suggest that (i) the detector SNR could be assumed to be constant across signal frequencies, which means that no additional hypothesis is required in the postfiltering process; (ii) the spectral shape of the noise floor was well approximated by the ANSI standard “Hearing Level-0dB” function for NH listeners, which may imply that AT of NH listeners could be modeled by the noise floor in the cochlea; and (iii) the proportionality parameter for the level-dependent internal noise level was compressive (0.20 dB/dB) and could mitigate the nonlinear effects that occur in the simultaneous NN masking paradigm. In practical terms, the new PSM has expanded the range of the technique, particularly when the SPL of the NN threshold is close to AT.
Footnotes
Acknowledgments
The authors wish to thank Toshie Matsui, Hiroki Matsuura, and Anzu Nakama for assisting in the data collection. The authors also wish to thank the editor in chief, Andrew Oxenham, the associate editor, Enrique Lopez-Poveda, and two anonymous reviewers for helpful comments to improve the manuscript.
Declaration of Conflicting Interests
The authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by JSPS KAKENHI Grant Numbers JP16H01734, JP21H03468, and JP21K19794.
ORCID iD
Toshio Irino
References
1. Aibara, R., Welsh, J. T., Puria, S., & Goode, R. L. (2001). Human middle-ear sound transfer function and cochlear input impedance. Hearing Research, 152(1-2), 100–109.
2. ANSI S3.6-2010. (2010). Specification for audiometers. American National Standards Institute, New York, USA.
3. Baker, R. J., & Rosen, S. (2002). Auditory filter nonlinearity in mild/moderate hearing impairment. The Journal of the Acoustical Society of America, 111(3), 1330–1339.
4. Baker, R. J., & Rosen, S. (2006). Auditory filter nonlinearity across frequency using simultaneous notched-noise masking. The Journal of the Acoustical Society of America, 119(1), 454–462.
5. Buss, E., Porter, H. L., Leibold, L. J., Grose, J. H., & Hall, J. W., III. (2016). Effects of self-generated noise on estimates of detection threshold in quiet for school-age children and adults. Ear and Hearing, 37(6), 650.
6. Carney, L. H., McDuffy, M. J., & Shekhter, I. (1999). Frequency glides in the impulse responses of auditory-nerve fibers. The Journal of the Acoustical Society of America, 105(4), 2384–2391.
7. Carney, L. H., & Yin, T. C. (1988). Temporal coding of resonances by low-frequency auditory nerve fibers: Single-fiber responses and a population model. Journal of Neurophysiology, 60(5), 1653–1677.
8. de Boer, E., & de Jongh, H. R. (1978). On cochlear encoding: Potentialities and limitations of the reverse-correlation technique. The Journal of the Acoustical Society of America, 63(1), 115–135.
9. Delgutte, B. (1990). Two-tone rate suppression in auditory-nerve fibers: Dependence on suppressor frequency and level. Hearing Research, 49(1-3), 225–246.
10. Eustaquio-Martín, A., & Lopez-Poveda, E. A. (2011). Isoresponse versus isoinput estimates of cochlear filter tuning. Journal of the Association for Research in Otolaryngology, 12, 281–299.
11. Fletcher, H. (1940). Auditory patterns. Reviews of Modern Physics, 12(1), 47.
12. Gaskill, S. A., & Brown, A. M. (1990). The behavior of the acoustic distortion product, 2f1-f2, from the human ear and its relation to auditory sensitivity. The Journal of the Acoustical Society of America, 88(2), 821–839.
13. Glasberg, B. R., & Moore, B. C. J. (1986). Auditory filter shapes in subjects with unilateral and bilateral cochlear impairments. The Journal of the Acoustical Society of America, 79(4), 1020–1033.
14. Glasberg, B. R., & Moore, B. C. J. (1990). Derivation of auditory filter shapes from notched-noise data. Hearing Research, 47(1-2), 103–138.
15. Glasberg, B. R., & Moore, B. C. J. (2000). Frequency selectivity as a function of level and frequency measured with uniformly exciting notched noise. The Journal of the Acoustical Society of America, 108(5), 2318–2328.
16. Glasberg, B. R., & Moore, B. C. J. (2006). Prediction of absolute thresholds and equal-loudness contours using a modified loudness model. The Journal of the Acoustical Society of America, 120(2), 585–588.
17. Hall, J. L. (1972). Auditory distortion products f2-f1 and 2f1-f2. The Journal of the Acoustical Society of America, 51(6B), 1863–1871.
18. Houtgast, T. (1972). Psychophysical evidence for lateral inhibition in hearing. The Journal of the Acoustical Society of America, 51(6B), 1885–1894.
19. Houtgast, T. (1973). Psychophysical experiments on “tuning curves” and “two-tone inhibition”. Acta Acustica United with Acustica, 29(3), 168–179.
20. Irino, T., & Patterson, R. D. (1997). A time-domain, level-dependent auditory filter: The gammachirp. The Journal of the Acoustical Society of America, 101(1), 412–419.
21. Irino, T., & Patterson, R. D. (2001). A compressive gammachirp auditory filter for both physiological and psychophysical data. The Journal of the Acoustical Society of America, 109(5), 2008–2022.
22. Irino, T., Yokota, K., Matsui, T., & Patterson, R. D. (2018). Auditory filter derivation at low levels where masked threshold interacts with absolute threshold. Acta Acustica United with Acustica, 104(5), 887–890.
23. Irino, T., Yokota, K., & Patterson, R. D. (2022). Improving auditory filter estimation with level-dependent cochlear noise floor. In Proc. International Symposium on Hearing (ISH) 2022 (pp. 1–17). Zenodo preprint. https://doi.org/10.5281/zenodo.6576893
24. Johnson-Davies, D., & Patterson, R. D. (1979). Psychophysical tuning curves: Restricting the listening band to the signal region. The Journal of the Acoustical Society of America, 65(3), 765–770.
25. Kidd, G., Jr., & Feth, L. L. (1981). Patterns of residual masking. Hearing Research, 5(1), 49–67.
26. Leschke, J., Orellana, G. R., Shera, C. A., & Oxenham, A. J. (2022). Auditory filter shapes derived from forward and simultaneous masking at low frequencies: Implications for human cochlear tuning. Hearing Research, 420, 108500.
27. Levitt, H. (1971). Transformed up-down methods in psychoacoustics. The Journal of the Acoustical Society of America, 49(2B), 467–477.
28. Lyon, R. F. (2011). Cascades of two-pole–two-zero asymmetric resonators are good models of peripheral auditory function. The Journal of the Acoustical Society of America, 130(6), 3893–3904.
29. McFadden, J. (2021). Life is simple: How Occam’s razor set science free and unlocked the universe. Hachette.
30. Moore, B. C. J. (2012). An introduction to the psychology of hearing. Brill.
31. Moore, B. C. J., & Glasberg, B. R. (1981). Auditory filter shapes derived in simultaneous and forward masking. The Journal of the Acoustical Society of America, 70(4), 1003–1014.
32. Moré, J. J. (1978). The Levenberg-Marquardt algorithm: Implementation and theory. In Numerical analysis (pp. 105–116). Springer.
33. Nitschmann, M., Verhey, J. L., & Kollmeier, B. (2010). Monaural and binaural frequency selectivity in hearing-impaired subjects. International Journal of Audiology, 49(5), 357–367.
34. Oxenham, A. J., & Shera, C. A. (2003). Estimates of human cochlear tuning at low levels using forward and simultaneous masking. Journal of the Association for Research in Otolaryngology, 4, 541–554.
35. Patterson, R. D. (1976). Auditory filter shapes derived with noise stimuli. The Journal of the Acoustical Society of America, 59(3), 640–654.
36. Patterson, R. D., Allerhand, M. H., & Giguere, C. (1995). Time-domain modeling of peripheral auditory processing: A modular architecture and a software platform. The Journal of the Acoustical Society of America, 98(4), 1890–1894.
37. Patterson, R. D., & Moore, B. C. J. (1986). Auditory filters and excitation patterns as representations of frequency resolution. In Frequency selectivity in hearing (pp. 123–177). Academic.
38. Patterson, R. D., & Nimmo-Smith, I. (1980). Off-frequency listening and auditory-filter asymmetry. The Journal of the Acoustical Society of America, 67(1), 229–245.
39. Patterson, R. D., Unoki, M., & Irino, T. (2003). Extending the domain of center frequencies for the compressive gammachirp auditory filter. The Journal of the Acoustical Society of America, 114(3), 1529–1542.
40. Pickles, J. (2013). An introduction to the physiology of hearing. Brill.
41. Puria, S., Peake, W. T., & Rosowski, J. J. (1997). Sound-pressure measurements in the cochlear vestibule of human-cadaver ears. The Journal of the Acoustical Society of America, 101(5), 2754–2770.
42. Rhode, W. S. (1971). Observations of the vibration of the basilar membrane in squirrel monkeys using the Mössbauer technique. The Journal of the Acoustical Society of America, 49(4B), 1218–1231.
43. Robles, L., Ruggero, M. A., & Rich, N. C. (1991). Two-tone distortion in the basilar membrane of the cochlea. Nature, 349(6308), 413.
44. Rosen, S., Baker, R. J., & Darling, A. (1998). Auditory filter nonlinearity at 2 kHz in normal hearing listeners. The Journal of the Acoustical Society of America, 103(5), 2539–2550.
45. Sachs, M. B., & Kiang, N. Y. S. (1968). Two-tone inhibition in auditory-nerve fibers. The Journal of the Acoustical Society of America, 43(5), 1120–1128.
46. Unoki, M., Irino, T., Glasberg, B., Moore, B. C. J., & Patterson, R. D. (2006). Comparison of the roex and gammachirp filters as representations of the auditory filter. The Journal of the Acoustical Society of America, 120(3), 1474–1492.
47. von Békésy, G. (1960). Experiments in hearing (Translated and edited by E. G. Wever). Acoustical Society of America.