Abstract
Keywords
Introduction
Bone conduction (BC) is an important pathway for sound perception. With BC, sound energy is transmitted by vibration of the skull bones to the cochlea, causing sound perception (Stenfelt, 2011). Although BC and air conduction (AC) have different transmission paths, they both ultimately produce auditory perception by stimulating the cochlea (Stenfelt, 2007; Wever & Lawrence, 1954).
Due to the small interaural level differences (ILDs) and interaural time differences (ITDs) associated with BC transmission, BC is usually considered to be a less effective means of transmitting spatial information than AC (Stenfelt & Zeitooni, 2013; Zeitooni et al., 2016). The maximum ITD via BC is about 0.2 ms while AC ITDs range up to about 0.65 ms (McBride et al., 2015). The limited ILDs and ITDs for BC make it difficult for listeners to isolate the BC stimulation presented to either side of the skull (Agterberg et al., 2011). Consistent with this, Stenfelt and Zeitooni (2013) found that the binaural benefit for BC stimulation in terms of spatial release from masking was approximately half that for AC stimulation. However, unmasking and localization are different aspects of spatial perception, and it remains possible that sound localization through BC is relatively preserved.
A few studies have investigated auditory spatial perception for binaural virtual sound reproduction using conventional AC headphones and BC devices (BCDs). The results of some studies show similar localization performance in the horizontal plane for the two types of transducers. For instance, MacDonald et al. (2006) used individualized head-related transfer functions (HRTFs) and measured how well normal-hearing participants could determine the source azimuth of a train of noise bursts. The results were comparable for AC and BC stimulation. In that study, BC stimulation was applied at the condyle and all stimuli were bandpass-filtered (0.3–5 kHz). McBride et al. (2015) studied the effect of BC stimulation position on spatial acoustic perception tasks. They measured localization accuracy for AC headphones and BCDs for three different stimulation positions (in front of, above, and behind the participant's ears) and found that localization accuracy for BC transducers placed in front of or above the ears was comparable to that for headphones. However, other studies have shown worse localization with BC than with AC stimulation. Schonstein et al. (2008) reported localization performance for in-ear headphones, circumaural headphones, and BC headphones using non-individualized HRTFs. They showed that in-ear headphones gave the best localization while BC headphones gave the poorest localization. Lindeman et al. (2008) also reported that localization accuracy with bilateral BC stimulation was slightly worse than that with AC stimulation.
Although the above studies have analyzed the feasibility of virtual sound reproduction via BCDs, it has not yet been sufficiently studied how different BC stimulation positions affect sound localization. The positions used in previous studies are mostly those used in clinical or commercial electronic devices, such as the mastoid or condyle, while other positions, like the temple or supra-auricular, have not been tested. It is therefore worthwhile to evaluate other potential BC stimulation positions for spatial sound reproduction. Also, most previous studies used broadband stimuli covering the whole frequency response ranges of the BCDs. The possible effects of stimulation frequency band on sound reproduction and spatial perception have rarely been studied.
Many studies have shown that the effectiveness of BC stimulation depends on the position of the transducer: the closer the transducer is to the cochlea, the more effective the stimulation. Stenfelt and Goode (2005b) measured promontory motion on cadaver heads for 29 stimulation positions. Their results showed decreased responses with increasing distance between the cochlea and the stimulation position. Using BC hearing threshold measurements, Studebaker (1962) compared three stimulation positions (forehead, mastoid, and top of the head), and McBride et al. (2008) compared eleven positions on the surface of the head. These studies demonstrated that BC sensitivity depends on stimulation position and stimulation frequency range (Eeg-Olofsson et al., 2011; McBride et al., 2008; Stenfelt & Goode, 2005b). Whether BC localization performance also depends on stimulation position and stimulation frequency band requires further study.
The transcranial attenuation (TA) of BC sound is defined as the difference in BC hearing sensitivity between ipsilateral and contralateral stimulation, where the stimulation position is the same for the two sides of the skull (Röösli et al., 2021). Two main methods have been used to measure TA. One is based on the differences in pure-tone hearing thresholds with ipsilateral and contralateral stimulation via BC (usually applied at the mastoid) for participants with unilateral hearing loss (Nolan & Lyon, 1981; Snapp et al., 2016; Stenfelt, 2012). The other is based on vibration measurements of the cochlea (Eeg-Olofsson et al., 2011; Stenfelt & Goode, 2005b). Using hearing threshold measurements, many researchers reported that TA is about 10 to 15 dB in the frequency range between 0.25 and 4 kHz (Nolan & Lyon, 1981; Vanniasegaram et al., 1994). Experiments measuring cochlear promontory vibration in cadaveric heads have given results similar to those of threshold-based measurements (Eeg-Olofsson et al., 2011; Stenfelt & Goode, 2005b). Also, it is generally accepted that TA is frequency-dependent, being greater at high frequencies than at low frequencies (Mattingly et al., 2020). Such limited TA might corrupt acoustic spatial information, affecting auditory spatial perception and thus the ability of BC device wearers to localize sound sources.
The aim of this study was to investigate the effect of different stimulation positions and frequency bands on BC localization. This aim was accomplished via two experiments. Experiment 1 assessed localization accuracy for 13 locations in the frontal horizontal plane to investigate the effect of stimulation frequency band and position. Experiment 2 assessed the effect of stimulation position on front-back confusion of BC virtual sound sources for 12 locations in the entire horizontal plane. To analyze the relationship between BC localization and TA, TA was also measured for different stimulation positions.
The vibration of the housing of the BC transducers leads to radiation of AC sound into the external ear canal, which may affect BC localization. To ensure that the spatial sound perception by BC stimulation is purely a result of the BC path, it is critical to reduce the effect of the radiated sound in BC localization tests. Several approaches have been used in previous research to reduce the effect of the radiated sound. One way is to use foam earplugs or earmuffs (Matos et al., 2010). Another way is to play noise from a loudspeaker to mask the radiated sound (MacDonald et al., 2006). However, masking noise might interfere with virtual BC sound localization. To avoid the influence of the AC radiated sound, Experiments 1 and 2 were conducted with the ear canals occluded with foam earplugs. Preliminary experiments were conducted to assess the magnitude of the radiated sound and to measure the occlusion effect produced by occluding the ear canals.
Materials and Methods
Participants
Nine normal-hearing participants (7 males, 2 females; age range: 23–27 years) participated in the experiments. The participants all had pure-tone hearing thresholds below 20 dB hearing level (HL) for audiometric frequencies from 0.25 to 8 kHz, reported no history of otologic pathology, and had an across-ear difference in hearing level not exceeding 10 dB at each test frequency.
Apparatus
The experiments were conducted in a sound-insulated test room with dimensions 6.78 m × 3.51 m × 2.26 m (L × W × H). The reverberation time (T60) was about 250 ms and the background noise was 21 dB sound pressure level (SPL). The stimuli were generated by a computer equipped with a sound card (Fireface UFX II, RME, Haimhausen, Germany). The output was routed to BC transducers (B-81, RadioEar, Middelfart, Denmark) for BC stimulation and AC headphones (IE 800, Sennheiser, Wedemark, Germany) for AC stimulation. Participants wore BC transducers on both sides of the head. As shown in Figure 1, five BC stimulation positions were used: (i) mastoid, (ii) condyle, (iii) supra-auricular, (iv) temple, and (v) bone-anchored hearing aid (BAHA) implant position, which is approximately 55 mm posterior to the ear canal opening in line with the upper part of the pinna. A probe microphone (ER-7C, Etymotic Research, Elk Grove Village, USA) and foam earplugs (EAR™ Classic™, 3M, Minnesota, USA) were used to measure and suppress radiated airborne sound.

The five BC stimulation positions: (pos. i) mastoid, (pos. ii) condyle, (pos. iii) supra-auricular, (pos. iv) temple, (pos. v) BAHA position, which is approximately 55 mm posterior to the ear canal opening in line with the upper part of the pinna.
Stimuli
The test stimuli were a sequence of eight 250-ms Gaussian noise bursts separated by 300-ms intervals. All stimuli were cosine windowed with a rise and fall time of 40 ms to reduce the use of onset cues for sound source localization (Moore, 2003). To obtain spatial sound, the stimuli were filtered by non-individualized HRTFs corresponding to the desired spatial locations. The HRTFs were chosen from the MIT database (Gardner & Martin, 1995) with an azimuthal interval of
In the first experiment, 13 locations spaced 15° apart in the horizontal plane with azimuths
In the second experiment, 12 locations spaced 30° apart in the horizontal plane with azimuths
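The burst train described at the start of this section can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 44.1 kHz sampling rate, the reading of the 300-ms interval as a silent gap between bursts, the raised-cosine ramp shape, and the function names are all assumptions, and the HRTF filtering step is omitted.

```python
import math
import random

FS = 44100  # assumed sampling rate in Hz (not stated in the text)


def cosine_ramp_window(n_total, n_ramp):
    """Flat window with raised-cosine onset and offset ramps of n_ramp samples."""
    window = [1.0] * n_total
    for i in range(n_ramp):
        gain = 0.5 * (1.0 - math.cos(math.pi * i / n_ramp))
        window[i] = gain                 # onset ramp
        window[n_total - 1 - i] = gain   # offset ramp (mirror image)
    return window


def noise_burst_train(n_bursts=8, burst_ms=250, gap_ms=300, ramp_ms=40,
                      fs=FS, seed=0):
    """Gaussian noise bursts separated by silent gaps (assumed interpretation)."""
    rng = random.Random(seed)
    n_burst = round(fs * burst_ms / 1000)
    n_gap = round(fs * gap_ms / 1000)
    n_ramp = round(fs * ramp_ms / 1000)
    window = cosine_ramp_window(n_burst, n_ramp)
    signal = []
    for k in range(n_bursts):
        # Fresh Gaussian noise for each burst, shaped by the ramped window.
        signal.extend(rng.gauss(0.0, 1.0) * w for w in window)
        if k < n_bursts - 1:
            signal.extend([0.0] * n_gap)  # silence between bursts
    return signal


train = noise_burst_train()
```

To obtain a virtual source, each ear signal would then be produced by convolving this train with the left- and right-ear HRTF impulse responses for the desired azimuth.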
Experimental Procedure
Measurement of Radiated Airborne Sound
The following procedure was used to measure the magnitude of the sound radiated by the vibration of the cases of the BC transducers. Before the radiated sound was measured, loudness was balanced between AC and BC stimulation to obtain BC stimulation levels of equivalent loudness. AC and BC stimuli were presented unilaterally. Stimuli were 1-s-long, one-third-octave bands of digitally generated white noise with center frequencies of 0.5, 1, 2, 3, 4, and 6 kHz. Stimuli were initially presented through a headphone at a level of 60 dB SPL. The sound pressure level of each stimulus through the headphone was calibrated with the built-in microphone of a dummy head (KU100, Neumann, Berlin, Germany). Each noise stimulus was then presented alternately through the BC transducer and the AC headphone. EAR™ Classic™ foam earplugs were always fully inserted in the ear canals of the participants during the loudness calibration. The BC transducer was held at the mastoid ipsilateral to the test ear with an elastic band, and the band was adjusted to give a static force of 3 ± 0.5 N. The participant adjusted the signal applied to the BC transducer to match the perceived loudness of the alternating AC stimuli at 60 dB SPL (Pollard et al., 2013; Qin & Usagawa, 2017). This was repeated for each center frequency. The levels of the BC stimuli obtained during the loudness matches were used for the subsequent measurements.
Then the radiated airborne sound produced by the BC transducer was measured for the same six center frequencies, using the same narrowband noises. Foam earplugs were fully inserted into the ear canals (with the outer end of the earplug inside the tragus) under the supervision of the experimenter to prevent the radiated sound from reaching the tympanic membrane. An ER-7C probe microphone was held at the ear canal entrance to measure the radiated sound. The radiated sound was also measured with the BC transducer applied at the other four stimulation positions.
Measurement of the Occlusion Effect
The foam earplugs that were used to eliminate the radiation of airborne sound into the ear canal might introduce an occlusion effect, which is defined as an increase in perceived sound with occlusion of the ear canal opening when the stimulus is presented through BC (Stenfelt & Reinfeldt, 2007).
Two main methods have been used to measure the occlusion effect: (i) by measuring the difference in hearing threshold with the ear canal open and occluded (Small & Stapells, 2003); and (ii) by measuring the ear-canal sound pressure (ECSP) with the ear canal open and occluded (Stenfelt et al., 2003). However, there is evidence that the occlusion effect is overestimated using the ECSP method (Reinfeldt et al., 2013). Therefore, the hearing threshold method was used to measure the occlusion effect in the present study.
Two main kinds of methods have been used for measuring hearing thresholds in clinical situations, namely the ascending and bracketing methods (ISO 8253-1: 2010, 2010). In this study, the bracketing method and a
The occlusion effect was measured for each participant prior to the localization experiments. Due to the similarity of BC hearing thresholds across the two ears, measurements were taken only for each participant's left ear, to reduce measurement time.
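As a minimal sketch of the threshold-based approach, the occlusion effect at each frequency can be taken as the open-canal BC threshold minus the occluded-canal BC threshold, so that a positive value means the sound became easier to hear with the canal occluded. The sign convention, the function name, and the threshold values below are illustrative assumptions, not measured data.

```python
def occlusion_effect(open_db, occluded_db):
    """Per-frequency occlusion effect in dB.

    Both arguments map frequency (Hz) to a BC hearing threshold (dB HL).
    Assumed convention: open-canal threshold minus occluded-canal threshold.
    """
    return {freq: open_db[freq] - occluded_db[freq] for freq in open_db}


# Hypothetical thresholds for one ear, for illustration only.
open_thresholds = {250: 10.0, 1000: 5.0, 4000: 0.0}
occluded_thresholds = {250: 2.0, 1000: 1.0, 4000: 0.0}

oe = occlusion_effect(open_thresholds, occluded_thresholds)
```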
Measurement of Transcranial Attenuation
The TA for each participant was measured using the BC hearing threshold method. BC hearing thresholds were measured when the BC transducer was placed at the ipsilateral and contralateral mastoid of the test ear, respectively. The TA was calculated as the BC hearing threshold with the stimulation at the contralateral mastoid minus the BC hearing threshold with the stimulation at the ipsilateral mastoid. Pink noise was used to mask the non-test ear, as described above. The TAs for the other four positions were measured in the same way. To interpret the effects of occlusion, the TA was also measured with the foam earplug occluding the ear canal.
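The calculation described above can be sketched as follows: TA at each frequency is the contralateral-stimulation threshold minus the ipsilateral-stimulation threshold, and the per-participant values are then averaged. The function names and the numbers are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean


def transcranial_attenuation(contra_db, ipsi_db):
    """TA in dB per frequency: contralateral minus ipsilateral BC threshold."""
    return {freq: contra_db[freq] - ipsi_db[freq] for freq in ipsi_db}


def mean_ta(per_participant):
    """Average TA across participants at each frequency."""
    freqs = per_participant[0].keys()
    return {freq: mean(p[freq] for p in per_participant) for freq in freqs}


# Hypothetical thresholds (dB HL) for two participants, illustration only.
p1 = transcranial_attenuation({500: 11.0, 3000: 19.0}, {500: 5.0, 3000: 5.0})
p2 = transcranial_attenuation({500: 9.0, 3000: 12.0}, {500: 5.0, 3000: 0.0})
group = mean_ta([p1, p2])
```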
Front Horizontal Plane Localization
Virtual sound synthesized using HRTFs was presented through AC headphones or BC transducers. To ensure loudness equalization, the loudness-matching procedure described earlier (Pollard et al., 2013) was used for the 0° direction, so as to equate the loudness of the sounds evoked by the BC transducers and AC headphones.
The participants were comfortably seated in a chair in the sound-insulated room. They were asked to look straight ahead and concentrate on the virtual sound location during each trial. The experimenter controlled the computer audio and the participants verbally reported the perceived azimuth of the sound source to the experimenter from a selection of 13 locations in the frontal horizontal plane.
The experiment was divided into six test blocks according to transducer type and stimulation position. In the first block, localization accuracy was tested with AC headphones; in the other five blocks, accuracy was measured with BC transducers at the five stimulation positions. The test order of the five BC positions was randomized across participants. In each block, the three types of test signals (low-frequency, high-frequency, and broadband) were presented in an order that was randomized across participants. Within each block, each virtual location was presented three times in random order (39 signal presentations per signal type and participant). Before the formal test for each stimulation modality (AC stimulation and BC stimulation at each of the five positions) and each of the three types of stimuli, the virtual sound sources were presented once in a clockwise sequence from left (−90°) to right (90°) and once in a counterclockwise sequence from right (90°) to left (−90°), to allow participants to become familiar with the spatial characteristics of the sounds. Then, the formal experiment began. In each trial, the participant indicated the perceived sound location and the experimenter recorded the response. The test then proceeded to the next trial. Breaks were taken as needed between trials. No feedback was given at any point in the experiment. Each block took 45 minutes, and all six blocks took 4.5 hours per participant.
Measurement of Front-Back Confusion
Experiment 2 investigated the effect of BC stimulation position on front-back confusions (FBCs). The test procedure was the same as for the first experiment except that 12 virtual locations spaced 30° apart across the full 360° horizontal plane were used and only broadband stimuli were used.
Data Analysis
Localization Error
The best linear fit for the target-response azimuth relationship and the mean absolute error (MAE; Snapp et al., 2020) were used to assess localization accuracy for each participant and stimulation condition. The best linear fit for the target-response azimuth relationship was calculated as:
The MAE between the response azimuth and the target azimuth was computed as:
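As an illustrative sketch, both measures can be computed as below, assuming an ordinary least-squares line relating response azimuth to target azimuth and the usual definition of MAE as the mean of the absolute response-target differences. The function names and the example responses are hypothetical; a slope near 1 with an intercept near 0° would indicate accurate, unbiased localization.

```python
def linear_fit(targets, responses):
    """Least-squares slope and intercept of response azimuth vs target azimuth."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_r = sum(responses) / n
    sxx = sum((t - mean_t) ** 2 for t in targets)
    sxy = sum((t - mean_t) * (r - mean_r) for t, r in zip(targets, responses))
    slope = sxy / sxx
    intercept = mean_r - slope * mean_t
    return slope, intercept


def mean_absolute_error(targets, responses):
    """Mean absolute difference between response and target azimuths (degrees)."""
    return sum(abs(r - t) for t, r in zip(targets, responses)) / len(targets)


# Hypothetical data: responses compressed toward the midline.
targets = [-90, -45, 0, 45, 90]
responses = [-60, -30, 0, 30, 60]
slope, intercept = linear_fit(targets, responses)
mae = mean_absolute_error(targets, responses)
```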
Confusion Rate and Confusion Score
FBC is defined as the perception of the sound image in the front hemisphere when the stimulation is in the mirror direction of the back hemisphere, or vice versa (Xie, 2013). The FBC rate (
The stimulus virtual location is denoted
The farther away a response
The
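One simple way to count front-back confusions is sketched below. It is an assumption-laden illustration rather than the rate or score defined in this section: azimuths are taken in degrees with 0° straight ahead and 180° directly behind, a confusion is counted whenever the response falls in the opposite front/back hemisphere from the target (a broader criterion than an exact mirror-image response), and targets on the interaural axis (±90°) are excluded because they have no front-back counterpart.

```python
def hemisphere(azimuth_deg):
    """Classify an azimuth as 'front', 'back', or 'side' (on the interaural axis)."""
    a = ((azimuth_deg + 180.0) % 360.0) - 180.0  # wrap to [-180, 180)
    if abs(a) < 90.0:
        return "front"
    if abs(a) > 90.0:
        return "back"
    return "side"


def front_back_confusion_rate(targets, responses):
    """Fraction of eligible trials in which the response crosses hemispheres."""
    eligible = 0
    confused = 0
    for target, response in zip(targets, responses):
        target_side = hemisphere(target)
        response_side = hemisphere(response)
        if target_side == "side":
            continue  # no front-back counterpart for +/-90 degree targets
        eligible += 1
        if response_side != "side" and response_side != target_side:
            confused += 1
    return confused / eligible if eligible else 0.0


# Hypothetical trials, for illustration only.
rate = front_back_confusion_rate([0, 30, 150, 180, 90, 330],
                                 [180, 30, 30, 180, 60, 210])
```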
Results
Radiated Airborne Sound
The SPLs of the radiated sound for three participants and their mean value are shown in Figure 2(a). Except for one participant at 6 kHz, the radiated sounds all had levels above 35 dB SPL. The mean radiated sound level was minimal at 0.5 and 6 kHz (about 40 dB SPL) and maximal at 4 kHz, where it approached the effective BC stimulation level. Figure 2(b) shows the difference between the BC stimulation level and the radiated sound level at the five positions. Here, the BC stimulation level is the level obtained from the loudness match, i.e., the level that was equal in loudness to the 60 dB SPL stimulus presented through the AC headphone. At 0.5 and 6 kHz, with stimulation at the mastoid, the difference between the effective BC stimulation level and the radiated sound level was maximal, at about 20 dB; at these frequencies, the radiated sound would have the smallest influence on BC sound reproduction. At 2 kHz, the difference was 15 dB; at other frequencies, the difference was less than 10 dB. The radiated sound measured for the other BC stimulation positions showed similar trends to that for the mastoid, but with higher SPLs. Differences in radiated sound level between the other positions and the mastoid were less than 10 dB.

Radiated airborne sound produced by the B-81 transducer. (a) Radiated airborne sound for three participants and its mean value for BC stimulation applied at the mastoid of the test ear. (b) The relative difference between BC stimulation level and radiated sound level with stimulation at the five positions. The BC stimulation level was equivalent in loudness to the 60 dB SPL stimulation through the AC headphone.
Occlusion Effect
Figure 3(a) shows the individual and mean values of the occlusion effect. The values and trends of the occlusion effect varied greatly among participants. The maximum mean value of the occlusion effect was 8 dB at 250 Hz. As the frequency increased, the occlusion effect tended to decrease, and when the frequency was greater than 2 kHz, the occlusion effect was less than 2 dB.

Occlusion effect measured by the change in BC hearing threshold produced by blocking the ear canal with a foam earplug. (a) Occlusion effect and its mean value for all nine participants. (b) Comparison with the results of a study of the occlusion effect using stimulation at the forehead (Stenfelt & Reinfeldt, 2007).
Figure 3(b) compares the present results with those of a previous study (Stenfelt & Reinfeldt, 2007) using a foam earplug and BC stimulation at the forehead. At 2 kHz and above, the results of the two studies were similar, and the occlusion effect was less than 2 dB. The results of the two studies were also in agreement at 250 Hz, where both showed the maximum values. Between 0.5 and 2 kHz, the results of the two studies differed markedly, the occlusion effect for the present study being larger than for the study of Stenfelt and Reinfeldt (2007).
Transcranial Attenuation
The mean TA values for six participants who took part in the localization experiment are shown in Figure 4. The solid line shows measured TAs with open ear canals while the dotted line shows measured TAs with occluded ear canals. Throughout the frequency range (0.25–8 kHz), the measured TA was similar for open and occluded ear canals. Differences in mean attenuation between open and occluded ear canals were less than 5 dB at all measured frequencies and for all BC stimulation positions. Wilcoxon signed-rank tests were used to assess whether the TA at each frequency differed significantly between open and occluded ear canals. In no case was the difference significant (

Mean transcranial attenuation estimated from BC hearing thresholds with open (solid lines) and occluded (dotted lines) ear canals for stimulation at (a) the mastoid and (b) the BAHA position. The TA measured in the present study is compared with that measured by Stenfelt (2012; dashed lines).
Figure 4(a) shows the TA measurements for stimulation at the mastoid. For frequencies up to 1 kHz, the TA with open ear canals fell between 4 and 8 dB. Between 1 and 1.5 kHz, the TA decreased, with the minimum value at 1.5 kHz being between 2.8 dB and 3.5 dB. The mean TA then increased with increasing frequency, approaching 14 dB at 3 kHz, and was slightly smaller at 8 kHz. There were some differences between TAs with the ear canal open and occluded, but their general trends were similar.
Figure 4(b) shows the TA measurements for stimulation at the BAHA position. The mean TA trend was similar to that for the mastoid, except for the frequency range 1 to 1.5 kHz, where there was no decrease for the BAHA position. The average TA for the BAHA position was less than for the mastoid, with an average difference of 3–4 dB and a maximum difference of about 7 dB (at 3 kHz).
The TA measured in the present study was compared with that for a previous study that also measured TA based on hearing thresholds with stimulation at the mastoid and the BAHA position. Stenfelt (2012) reported that, with stimulation at the mastoid, the median TA was 3 to 5 dB for frequencies up to 0.5 kHz, close to 0 dB between 0.5 and 1.8 kHz, close to 10 dB between 3 and 5 kHz, and slightly smaller at 8 kHz. TA with BC stimulation at the conventional BAHA position was approximately 2–3 dB smaller than for stimulation at the mastoid. The TA measured in the present study was higher than that measured by Stenfelt (2012). The difference in TA was largest at mid (0.5–1.5 kHz) and high (5–8 kHz) frequencies. One possible reason is that mean data are used here, whereas Stenfelt (2012) used median data. Another possible factor is that the participants in the study of Stenfelt (2012) had single-sided deafness, whereas the participants in the present study had normal hearing, which may affect the measured TA.
Localization Accuracy
Homogeneity tests were conducted to check the consistency of the localization results. The Kruskal-Wallis H test at a significance level of
A repeated-measures analysis of variance (ANOVA) was conducted on the MAE values with test signal and stimulation modality as within-subject factors. There were effects of test signal (F(2,16) = 5.060, p = 0.020) and stimulation modality (F(4,32) = 2.723, p = 0.047). There was no interaction between test signal and stimulation modality (F(8,64) = 1.246, p = 0.288). Table 1 gives the results of Games-Howell post hoc tests. Localization was best (MAE of 20.2° ± 3.4°) when using AC headphones. Localization with BC stimulation applied at the mastoid and condyle did not differ significantly from that for AC headphones. Localization with BC stimulation applied at the supra-auricular, temple, and BAHA positions was significantly worse than with AC headphones (
Results of Games-Howell Post hoc Tests for AC Headphones and BC Transducers at Five Stimulation Positions.
Table 2 gives the results of Tukey HSD post hoc tests for the three test signals. Localization was best with high-frequency stimuli and worst with low-frequency stimuli, with a difference of 3.6° in the average MAE (
Results of Tukey HSD Post hoc Tests for the Three Test Signals.
Figures 5 to 7 show scatter plots of the response azimuths vs the target azimuths for all nine participants and three repetitions of each reproduction condition. If the response azimuths were exactly equal to the target azimuths, the points would lie on the diagonal (dashed line in each plot). For each test condition, the size of the dots is proportional to the number of responses.

Sound localization target-response plots for all participants for low-frequency stimuli for each reproduction condition: (a) AC headphones; (b) BC stimulation at mastoid; (c) BC stimulation at condyle; (d) BC stimulation at supra-auricular position; (e) BC stimulation at temple; and (f) BC stimulation at BAHA position. The simulated source location is plotted on the horizontal axis and the response on the vertical axis. The size of the dots is proportional to the number of responses. The linear fit formula and MAE values are given in each plot.

As Figure 5 but for high-frequency stimuli.

As Figure 5 but for broadband stimuli.
Localization was best for all three stimuli when using AC headphones (smaller MAE and better linear fit). Localization was most accurate in the frontal area of the horizontal plane. At the sides, localization accuracy became worse. The same tendency was found for the BC transducers. Performance was better for stimulation at the mastoid and condyle. This may be due to the smaller distances from the mastoid and condyle to the cochlea than the distances from the other stimulation positions, which facilitate the transmission of BC stimulation to the ipsilateral cochlea, resulting in improved interaural separation (Eeg-Olofsson et al., 2011). Left-right (or right-left) confusions occurred very rarely, accounting for about 1% of the overall localization judgments. Left-right (or right-left) confusions mostly occurred when the sound source was located close to the median plane (Letowski & Letowski, 2012).
A comparison of the localization results for low-frequency (Figure 5) and high-frequency stimuli (Figure 6) shows larger localization errors with the BC transducers for low-frequency stimuli. Localization was worst at the position above the auricle (MAE increased by 6.3°). Correspondingly, participants reported that the difficulty in judging the location of a given sound source was significantly greater for low-frequency than for high-frequency stimuli. The skull motion can be approximated by a mass-spring system in the frequency range between approximately 0.3 and 1.0 kHz (Stenfelt & Goode, 2005b), so the speed of sound transmission is very high and the ITDs are minimal. Since time information is most important for the localization of low-frequency sounds, the minimal ITDs with BC stimulation lead to difficulty in localization for low frequencies.
Performance with broadband stimuli (Figure 7) was between that for low-frequency and high-frequency stimuli. For broadband stimuli, participants may rely on both ITDs and ILDs for localization (Agterberg et al., 2012), which may improve localization performance to some extent. However, crosstalk at low frequencies may lead to small ITDs and ILDs, leading to a conflict between the information provided by low- and high-frequency bands, adversely affecting localization for broadband stimuli.
In summary, both test signal and stimulation position affected BC virtual sound reproduction. Localization was best with high-frequency stimuli and worst with low-frequency stimuli. This finding is consistent with the results of precedence tasks, where high-frequency stimuli gave a better ability to judge the perceived direction than low-frequency stimuli (Stenfelt & Zeitooni, 2013; Zeitooni et al., 2016). BC stimulation led to best localization at the mastoid, for which localization almost matched that with AC headphones, while BC localization was worst at the temple.
Front-Back Confusions
The results of statistical analysis of the FBC are shown in Table 3. Random responses for the 12 stimulus and response azimuths used in the present study would lead to an FBC rate of
Results of the Statistical Analysis of Sound Source Localization in the Entire Horizontal Plane. Mean Values with SDs, Split into
The FBC score (
Discussion
Validity of Experimental Results
Radiated Airborne Sound from BC Transducers
Early studies measured radiated airborne sound at different frequencies and with different BC transducers (Frank & Holmes, 1981; Shipton et al., 1980). All these studies showed that the level of the radiated sound was highest at 4 kHz, especially when using RadioEar B-71 and B-72 transducers. Harkrider and Martin (1998) measured sound pressure levels in the external auditory canals of 50 participants at 2 and 4 kHz with a RadioEar B-71 transducer on the forehead, the mastoid ipsilateral to the probe microphone, and the mastoid contralateral to the probe microphone. They showed that the radiated sound from the B-71 transducer was higher at 4 kHz than at 2 kHz. Lightfoot and Hughes (1993) also showed that the RadioEar B-71 transducer generates more radiated sound at high frequencies. In the present study, the BC stimulation was provided by the RadioEar B-81 transducer, which was designed based on the balanced electromagnetic separation transducer (BEST) principle to reduce radiated airborne sound with minimal nonlinear distortion (Jansson et al., 2015).
From Figure 2, the level of the radiated sound approached the effective stimulation level at 3–4 kHz, which meant that the sound could have been heard via the AC pathway with an unoccluded ear canal. Also, the level of the radiated sound produced by the B-81 transducer was usually less than 10 dB below the effective stimulation level, except for the mastoid and BAHA positions and frequencies of 0.5, 2, and 6 kHz. Therefore, the effect of radiated sound from the B-81 transducer was not negligible. In the present study, the effect of the radiated sound was reduced by the deeply inserted foam earplugs, which give an average attenuation of more than 30 dB in the frequency range of 0.125 to 8 kHz (Brungart et al., 2003). However, the use of foam earplugs may introduce an occlusion effect, as described above.
Occlusion Effect
Stenfelt and Reinfeldt (2007) measured the occlusion effect under different conditions based on ECSP, BC threshold, and an acoustic-impedance model. They demonstrated that the occlusion effect is less than 10 dB when foam earplugs are inserted deeply enough into the ear canal (into the bony part). In the present study, the occlusion effects measured using the BC hearing threshold were less than 10 dB for all participants at all audiometric frequencies, consistent with the findings of Stenfelt and Reinfeldt (2007). Different BC stimulation positions may lead to different occlusion effects. At first sight, this might appear to explain why the present occlusion effects were slightly higher than those found by Stenfelt and Reinfeldt (2007), as the present study measured the occlusion effect with BC stimulation at the mastoid while Stenfelt and Reinfeldt (2007) used BC stimulation at the forehead. However, Reinfeldt et al. (2013) found that the occlusion effect was 5–10 dB lower for frequencies below 0.8 kHz when the BC stimulus was at the mastoid than when it was at the forehead. The occlusion effects found in the present study were lower than those found by Reinfeldt et al. (2013) for BC stimulation at the mastoid, probably because of differences in the insertion depth of the foam earplug. In the present study, the earplug was inserted 24 mm into the ear canal relative to the tragus (i.e., approximately 17 mm relative to the ear canal opening), which is deeper than in the study of Reinfeldt et al. (2013; 18 mm relative to the tragus) and shallower than in the study of Stenfelt and Reinfeldt (2007; about 22 mm relative to the ear canal opening).
In the present study, the use of earplugs would have altered the sensitivity of the outer ear pathway (Stenfelt, 2016; Surendran & Stenfelt, 2021), which affects TA for frequencies below 1 kHz (Reinfeldt et al., 2013). However, Figure 4 shows that the mean TA was not significantly different between the open and occluded ears over the entire measured range. This is inconsistent with the study of Reinfeldt et al. (2013), who found that occluding the ear canal significantly reduced TA for frequencies below 0.8 kHz. The measurement setups of the two studies differed, e.g. in the measurement frequencies and in the masking used for the contralateral ear. The depth of insertion of the foam earplug into the ear canal may also have contributed to the inconsistent results. The above analysis considers only amplitude. Occlusion can also affect temporal information (Stenfelt et al., 2003): with the ear canal occluded, the outer ear component becomes more dominant, which may enhance the ITDs at low frequencies.
Transcranial Attenuation
The present experiments were performed using virtual sound synthesized with HRTFs from the MIT database, and the whole experimental process was completed in a sound-insulated test room. Hence, horizontal localization was mainly related to the transmission characteristics of BC. In other words, auditory spatial perception is primarily determined by the TA of the BC stimuli, as shown by Rowan and Gray (2008) for pure-tone stimuli.
Effect of Stimulation Position
Figure 8 shows the mean TA measured in the present study for the five BC stimulation positions at 0.5 and 3 kHz. The closer the stimulation position was to the cochlea, the greater the TA. The TA was significantly higher at both the mastoid and condyle than at the other positions (excluding 0.5 kHz for the temple). Generally speaking, stimulation closer to the cochlea is more efficient for ipsilateral stimulation (Eeg-Olofsson et al., 2011). The effect of stimulation distance from the cochlea was also investigated by Stenfelt and Goode (2005b). They measured promontory motion on cadaver heads for 29 stimulation positions and found decreasing responses with increasing distance between the cochlea and the stimulation position. Rigato et al. (2019) measured promontory motion in cadaver heads and showed lower TA for stimulation at the BAHA position than for stimulation at the mastoid, mainly because of a lower ipsilateral response at the BAHA position. This finding was confirmed by a comparison of patient and cadaver head measurements (Dobrev et al., 2016).

Figure 8. Mean measurements of TA from the present study at 0.5 and 3 kHz for the five BC stimulation positions. The error bars indicate one standard deviation.
In the present study, BC stimulation applied at the mastoid and condyle gave better localization than stimulation at the temple and BAHA positions. This is consistent with the fact that the mastoid and condyle have larger TAs than the temple and BAHA positions. Larger TA means higher interaural separation, and therefore binaural cues are less likely to be disrupted (Stenfelt, 2005; Zurek, 1986), which may lead to better localization performance.
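The link between TA and interaural separation can be illustrated with a deliberately simplified numerical sketch. It assumes that the signal from each transducer reaches the contralateral cochlea attenuated by the TA and that the two contributions add incoherently (powers add), so phase interactions are ignored. The function name `effective_ild_db` and this power-sum model are illustrative assumptions, not the analysis used in the present study.

```python
import math


def effective_ild_db(presented_ild_db: float, ta_db: float) -> float:
    """ILD at the two cochleae after transcranial crosstalk.

    Assumption (illustrative only): each transducer's signal leaks to the
    contralateral cochlea attenuated by ta_db, and the direct and leaked
    signals add in power (no phase interaction).
    """
    p_left = 10 ** (presented_ild_db / 10)  # left drive power relative to right
    p_right = 1.0
    leak = 10 ** (-ta_db / 10)  # power gain of the transcranial path
    left_cochlea = p_left + leak * p_right
    right_cochlea = p_right + leak * p_left
    return 10 * math.log10(left_cochlea / right_cochlea)


if __name__ == "__main__":
    # A 10 dB ILD presented at the transducers, for several TA values.
    for ta in (0, 5, 10, 60):
        ild = effective_ild_db(10, ta)
        print(f"TA = {ta:2d} dB -> ILD at the cochleae = {ild:.1f} dB")
```

Under these assumptions, a TA of 0 dB removes the ILD entirely, whereas a TA of 60 dB (of the order of the interaural separation of AC headphones) leaves it essentially unchanged. Intermediate TAs compress the ILD, which is one way to picture why larger TAs may better preserve binaural cues.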
It is generally agreed that BC perception relies on five components (Stenfelt, 2011; Stenfelt & Goode, 2005a): (i) ear canal sound pressure, (ii) middle-ear ossicular inertia, (iii) inertia of the cochlear fluid, (iv) compression and expansion of the cochlear space, and (v) pressure transmission from the cerebrospinal fluid. The relative contributions depend on the stimulation position. BC skull vibration is primarily in the stimulation direction for frequencies below 1 kHz, whereas it occurs in all three dimensions independent of the stimulation direction for frequencies above 1 kHz (Li et al., 2020). Consequently, stimulation at different positions causes different vibrations of the outer, middle, and inner ear. For example, Stenfelt et al. (2002) showed that the inertial effect of the middle-ear ossicles is more sensitive to stimulation in line with the low-frequency vibration direction of the ossicles than in a perpendicular direction. Low-frequency stimulation at the mastoid and condyle is more in line with the direction of the ear canal than at other positions. Hence, the ipsilateral cochlear responses with stimulation at the mastoid and condyle are higher than with stimulation at the other three positions, resulting in larger TAs.
The directional sensitivity of the BC transducer may also be related to the stimulation position. For example, Stenfelt (2005) found that BAHA wearers are more sensitive to sounds from the back than from the front. This is mainly caused by the placement of the BAHA: the BAHA is placed approximately at 112°, partially shielding the BAHA from sounds coming from the front, whereas the ear canal opening is approximately at 90°. Compared to the mastoid and condyle, the temple and BAHA positions are farther away from the ear canal opening. The greater the difference in directional sensitivity between the stimulation position and the HRTF measurement position, the more likely it is that the HRTFs for AC and BC sound reproduction are mismatched.
Effect of Stimulation Frequency
In the present study, the high-frequency band led to better localization than the low-frequency band. A similar trend has been observed in previous studies with vibration and hearing threshold measurements, and it can be attributed to TA to some extent: TA for BC is significantly greater for high-frequency bands than for low-frequency bands.
With stimulation at the mastoid, the median TA was 3 to 5 dB for frequencies up to 0.5 kHz, close to 0 dB between 0.5 and 1.8 kHz, close to 10 dB between 3 and 5 kHz, and slightly smaller at 8 kHz. Measures of cochlear vibration using a laser Doppler vibrometer and BC stimulation at the mastoid show similar trends (Eeg-Olofsson et al., 2011; Stenfelt & Goode, 2005b). For frequencies below 0.8 kHz, the vibration measurements show negative or near 0 dB TA, while the threshold measurements show positive TA. For frequencies between 0.8 and 6 kHz, all methods show similar results, with three-dimensional cochlear vibration giving the highest TA and one-dimensional cochlear vibration the lowest TA. For the highest frequencies, above 6 kHz, a discrepancy between threshold-based and vibration-based TA again appears. The discrepancy between threshold-based and vibration-based measures probably occurs because the former involves live humans whereas the latter involves cadavers or dry skull specimens. Different participants have different properties of the head, such as geometry of the head and thickness of the skin and skull. These can affect vibration transmission of the BC path and thus the TA for BC stimulation (Röösli et al., 2021).
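As a minimal sketch of how a threshold-based median TA can be computed, the example below assumes the common definition in which TA is the threshold with contralateral stimulation minus the threshold with ipsilateral stimulation, measured for the same cochlea. The threshold values are made-up placeholders chosen only for illustration; they are not data from the present study, and the names `transcranial_attenuation` and `median_ta` are hypothetical.

```python
from statistics import median


def transcranial_attenuation(ipsi_db, contra_db):
    """TA per frequency: contralateral minus ipsilateral threshold (dB).

    Both arguments map frequency (kHz) to the BC threshold (dB) of the
    same cochlea, with the transducer on the near or far side of the head.
    """
    return {f: contra_db[f] - ipsi_db[f] for f in ipsi_db}


def median_ta(listeners):
    """Median TA across listeners for each frequency."""
    per_listener = [transcranial_attenuation(i, c) for i, c in listeners]
    freqs = per_listener[0].keys()
    return {f: median(ta[f] for ta in per_listener) for f in freqs}


# Hypothetical (ipsilateral, contralateral) thresholds for three listeners.
# These numbers are placeholders, NOT measurements from the study.
example_listeners = [
    ({0.5: 20, 3.0: 15}, {0.5: 24, 3.0: 26}),
    ({0.5: 18, 3.0: 17}, {0.5: 21, 3.0: 26}),
    ({0.5: 22, 3.0: 14}, {0.5: 27, 3.0: 24}),
]

if __name__ == "__main__":
    for freq, ta in median_ta(example_listeners).items():
        print(f"{freq} kHz: median TA = {ta} dB")
```

The median is used here rather than the mean because it is less sensitive to individual outliers, which matters given the inter-individual variability in head geometry and in skin and skull thickness noted above.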
Figure 8 shows smaller TA at 0.5 than at 3 kHz, consistent with the results of other studies. This may be due to the different motion modes of the skull in the different frequency ranges (Li et al., 2020; Stenfelt & Goode, 2005b), as described above. Based on Figures 5, 6, and 8, it is evident that higher TAs correspond to smaller MAEs, as expected. Crosstalk at frequencies where TA is small could be cancelled to improve spatial hearing.
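The idea of crosstalk cancellation can be sketched for a single frequency with a symmetric two-channel model. The sketch assumes that each cochlea receives the ipsilateral drive plus the contralateral drive scaled by a real gain g = 10^(-TA/20), so the transcranial path is treated as in phase and delay-free. Real BC crosstalk is complex-valued and frequency-dependent, so this only illustrates the principle; the function names are hypothetical.

```python
def crosstalk_gain(ta_db: float) -> float:
    """Amplitude gain of the transcranial path for a given TA (dB)."""
    return 10 ** (-ta_db / 20)


def at_cochleae(drive_left: float, drive_right: float, ta_db: float):
    """Signals at the two cochleae for given transducer drives.

    Assumption: symmetric head, real-valued (in-phase) crosstalk.
    """
    g = crosstalk_gain(ta_db)
    return drive_left + g * drive_right, drive_right + g * drive_left


def cancel_crosstalk(desired_left: float, desired_right: float, ta_db: float):
    """Drives that yield the desired cochlear signals (2x2 matrix inverse)."""
    g = crosstalk_gain(ta_db)
    det = 1 - g * g
    if abs(det) < 1e-6:
        # TA near 0 dB: both cochleae receive the same mixture, so the
        # system cannot be inverted.
        raise ValueError("TA too close to 0 dB for cancellation")
    drive_left = (desired_left - g * desired_right) / det
    drive_right = (desired_right - g * desired_left) / det
    return drive_left, drive_right


if __name__ == "__main__":
    ta = 3.0  # small TA, as at low frequencies
    left, right = cancel_crosstalk(1.0, 0.5, ta)
    print("drives:", left, right)
    print("at cochleae:", at_cochleae(left, right, ta))
```

In this model, the inverse becomes ill-conditioned as the TA approaches 0 dB, so the required drive levels grow. This suggests that cancellation is hardest exactly where it would be most useful, which a practical design would need to address.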
Broadband stimulation did not lead to better localization than high-frequency stimulation, perhaps because interaural cues were only usable at high frequencies or because the interaural cues conflicted across frequencies for the broadband stimulus.
Feasibility of Bone Transducer as a Spatial Audio Interface
For spatial audio presented through AC headphones, Schonstein et al. (2008) reported a
It is generally agreed that the presentation of spatialized stimuli via BC transducers leads to an unnatural sound, which limits spatial perception. The interaural separation produced by stimulation with BC transducers is much smaller than the 60 to 80 dB interaural separation produced by AC headphones (Zwislocki, 1953). Furthermore, the BC transmission pathway is more complex than the AC pathway (Stenfelt, 2011), as described earlier. The complex transmission path results in a vector sum at the cochlea of several different transfer functions (Zhao et al., 2021), leading to complex effects on the amplitude and phase spectrum. These effects could perhaps act to scramble ITDs and ILDs (Rowan & Gray, 2008; Zurek, 1986). However, in the present study horizontal localization was comparable for AC headphones and BC stimulation applied at the mastoid and condyle. Again, this may be partly explained by the use of non-individual HRTFs, which may affect the AC stimulation more than the BC stimulation.
Inside-the-head localization is defined as a perceived sound image inside or on the surface of the head (Xie, 2013). Inside-the-head localization is common for AC headphone reproduction when reverberation is absent (Begault et al., 2001; Kim & Choi, 2005). It is generally believed that localization outside the head is mainly related to the use of binaural cues (Best et al., 2020; Hartmann & Wittenberg, 1996). Specifically, sound sources located close to the median plane are more likely to be internalized than those located off to the side, as the former have smaller interaural differences than the latter (Leclère et al., 2019). These cues would probably be distorted for BC stimulation, so BC stimulation might be expected to be perceived inside the head. However, only two participants in the present study reported inside-the-head localization during sound reproduction via BC transducers, while eight reported it with sound reproduction via AC headphones. This surprising outcome may be explained by the fact that the use of non-individualized HRTFs would have increased inside-the-head localization for both AC and BC stimulation.
Limitations and Outlook
Commercial BC headphones often produce substantial airborne sound (Kim et al., 2019). The present study avoided the influence of airborne sound using foam earplugs, but in real life users of BC headphones would not use earplugs and might experience a combination of AC and BC sound. When earplugs are used, AC sound becomes negligible relative to BC sound, making the present finding relevant to applications where hearing protection devices are required. In addition, BC transducers in the future may be designed to reduce airborne sound leakage, and such transducers could be used in future experiments, to explore spatial perception via BC stimulation at different positions without confounding effects of occlusion of the ear canal or the airborne sound leakage.
The transmission from BC transducers to the cochlea is complex and frequency-dependent, making it hard to determine how spectral content influences spatial perception. In the present study, the levels of the stimuli presented via BC transducers and AC headphones were equated by loudness matching; in future work, threshold measurements could be used instead.
Another limitation is that the number of participants (N = 9) was relatively small. Finally, non-individualized HRTFs were used, and this probably limited the accuracy of spatial perception via both AC and BC stimulation, perhaps more so for the former.
Given these limitations, future work should consider: (i) using individualized HRTFs for both AC and BC stimulation; (ii) measuring BC localization using transducers with minimal airborne sound transmission and with open ear canals; and (iii) recruiting more participants.
Conclusions
In this paper, virtual BC sound localization tests were conducted at five stimulation positions using different frequency bands. To suppress radiated airborne sound when testing BC localization, foam earplugs were used. The associated occlusion effects were found to be small. The results showed that the stimulation position and frequency band both affected auditory spatial perception. Localization accuracy for BC stimulation at the mastoid and condyle did not differ significantly from that using AC headphones, and was significantly better than that for the other three BC stimulation positions. Frontal horizontal localization accuracy with high-frequency stimuli was comparable to that with broadband stimuli and significantly better than that with low-frequency stimuli. Measurement of the TA indicated that higher TAs improved localization accuracy, probably because of better preservation of binaural cues. However, there was no significant difference in FBCs either between AC and BC stimulation or among the five BC stimulation positions.
The results of the present study indicate that HRTF-processed BC signals allow sound localization in the horizontal plane with accuracy similar to AC headphone-based sound reproduction. This demonstrates the feasibility of a spatial audio interface with two BC transducers bilaterally at the mastoid or the condyle. It also implies that high-frequency information transmitted through BC improves sound localization. Further studies should investigate the vertical localization of virtual BC signals and compare the variability in BC localization performance with ear canals open and occluded. Spatial localization accuracy may be improved with crosstalk cancellation in future work.
