Abstract
Keywords
Introduction
Background
A satisfactory level of speech transmission quality is needed in many types of indoor spaces where speech intelligibility is of critical importance to support the spaces’ main functions. Examples of these spaces include metro stations, stadia, airport lounges, train ticket halls, assembly halls, foyers, classrooms, lecture theatres and workshops. Research dedicated to improving speech transmission quality and intelligibility in those spaces can be found in the literature.1–6 Similarly, several international standards and codes of practice7–12 address the same topic in those venues. They provide guidance on test procedures and performance criteria for speech intelligibility in a range of buildings where Public Address (PA) or Voice Alarm (VA) systems are employed to reinforce and distribute voice messages.
In smaller indoor spaces, speech transmission quality and the resultant intelligibility are also crucial for specialist rooms to fulfil their intended purposes. In these spaces, distances between human talkers and listeners tend to be short, hence amplified Speech Reinforcement Systems (SRS) are usually deemed unnecessary. Instead, verbal communications rely on Person-to-Person Speech Communications (PP) (also called direct communication or Natural Acoustics Communication (NAS)).
A PP system is comprised of the human talker (speech sound source), the room (transmission channel), and the listeners (receivers) and is characterized by the source and receiver being in the same environment in the absence of electroacoustic speech reinforcement devices such as microphones, amplifiers, or loudspeakers. 11 A PP system can also be employed in outdoor verbal communication scenarios in which the open space is the transmission channel.
Figure 1(a) shows a simple model of a human Person-to-Person(s) Speech Communication (PP) system in a room potentially affected by unfavourable factors in the transmission channel, such as reverberation, echoes and/or background noise. Figure 1(b) shows the same PP model in an outdoor space potentially affected by background noise.

(a) PP system in an indoor space or room. (b) PP system in an outdoor space.
Table 1 shows the three components forming a PP communication system (column categories) and the corresponding main factors potentially affecting the speech intelligibility of the system. For the purposes of this investigation, all the factors affecting the transmission channel are considered, as well as the directivity and vocal effort of the speech source and the orientation of the receiver. Factors in Table 1 denoted by a dash (-) are not taken into account.
Person to person system components with their corresponding main factors.
The PP system is widely employed in a variety of spaces of small to moderate size where speech transmission quality and the resulting speech intelligibility are of critical importance.7,11 Classrooms, offices, disaster/air traffic control rooms, operating theatres, court rooms, laboratories, interview rooms, auditoria, theatres, and meeting rooms are examples of indoor specialist spaces where PP systems are normally used (see examples of PP indoor applications in Figure 2(a) and (b)). Addressing a group of occupants outside a building in an evacuation, conveying information during a tour around outside facilities, outdoor theatre performances and outdoor soundscape assessments are examples where PP is employed outdoors.

(a) Teaching and learning in a laboratory and (b) meeting/board room.
The research literature shows that in certain situations the use of an SRS in place of PP does not necessarily increase the estimated speech intelligibility score in rooms 13 or improve learning attainment in classrooms. 14 A theatrical performance is a clear example where PP is preferred over SRS systems. 15 Some standards 11 and guidance 16 recommend prioritizing PP over SRS systems for certain applications.
The topic of speech intelligibility in specialist rooms where PP is employed is considered to different degrees in relevant international/national standards and codes of practice7,10,16–22 where some guidance and performance criteria are provided.
The assessment of the potential speech intelligibility and/or speech privacy in specialist rooms is crucial in determining their suitability for their intended purposes. The Speech Transmission Index Public Address (STIPA) is a globally accepted and standardized method 7 to objectively determine the potential speech intelligibility in PP applications. The STIPA metric is a subset version of the parent full Speech Transmission Index (STI) method. 7 Both the full STI and the subset STIPA method rate the estimated speech intelligibility of the transmission channel between 0 and 1, where ‘0’ corresponds to total unintelligibility and ‘1’ to maximum or total intelligibility.
The STIPA method in PP scenarios requires the STIPA speech-like test signal to be reproduced acoustically by a sound source that simulates a human talker’s natural speech production. 7 International and national standards (see Table 2) recommend the use of a suitable test loudspeaker or a special electroacoustic sound source to emulate the acoustical speech characteristics of a human talker. To that end, the physical size, directivity, orientation, and frequency response of the speech sound source are the key parameters to consider in its application.
Summary of international and national standards where artificial mouth simulators or alternative speech sound sources are referred to.
In assessments of speech intelligibility and measurements of STIPA from SRS systems, a speech sound source is also required to provide a human-like reference acoustical test signal into the microphone of the system.7,10,12,19
As can be seen in Table 2, the requirements and level of specification for acceptable speech sound sources to be used in PP applications vary among relevant standards.
The most informative standards regarding speech sound sources7,8 detail the electro-acoustic requirements for a ‘special test loudspeaker’ (e.g. artificial mouth 24 or talkbox) for STIPA testing in SRS and PP scenarios. They also provide strict specifications for other ‘suitable transducers’ as standardized speech sound sources in the absence of special test loudspeakers.
Some of the other standards in Table 2 provide limited or unspecific electro-acoustic requirements for a speech sound source,10–12,20–23 thus allowing a wide range of alternative non-special loudspeakers to be employed as conforming speech sound sources.
Motivation
There are only a few special speech test sound sources available on the market that meet the electro-acoustic performance specifications of the most informative standard. 7 They are expensive devices and can be deemed unaffordable and inaccessible to researchers, consultants, and other practitioners. 25 The strict performance requirements for substitute speech sources indicated in that standard 7 can make it difficult for those users to find, test, or construct alternative sources that satisfy the standard requirements.7,25
Very limited research exists in the literature concerning the suitability of non-special or non-standardized loudspeakers as alternative speech sound sources in PP testing applications. One study 26 employed various commercial loudspeakers as speech sources to investigate the influence of their directivity characteristics on STI in four simulated indoor spaces. The study showed that the largest differences in STI between loudspeakers occurred in the larger rooms. The author also preliminarily showed that loudspeaker drive units of around 60 mm diameter produced the smallest errors and suggested that this size is more appropriate to simulate the human speech directivity pattern. However, the results were obtained in noise-free conditions and were based on room acoustic computer simulation methods, with limitations in the virtual characterization data of some of the loudspeakers employed. Non-special loudspeakers are widely used in the relevant industry and academia13,15,27–32 to conduct speech intelligibility investigations in place of standardized special speech test loudspeakers (i.e. mouth simulator 24 or talkbox). Nevertheless, due to the absence of guidance and practical information in the literature, those investigations are conducted without knowledge of the alternative loudspeakers’ practical suitability or of the validity of the results.
To address that lack of information, a recent preliminary investigation 25 examined experimentally the suitability of a range of representative non-special loudspeakers as an alternative to standardized speech sources in PP speech intelligibility investigations. Frequency response and STIPA experimental results obtained from a reference standardized special speech source were compared against results from various non-special loudspeakers measured under a range of real-world combinations of PP scenarios and influencing factors. The work found a close STIPA agreement between the reference and the alternative non-specialist speech sources.
Aim and scope
This study aims to further evaluate the performance validity of non-special loudspeakers as alternatives to standardized special speech sound sources in person-to-person speech intelligibility assessment. It supplements the preliminary research undertaken previously 25 by providing additional evidence to consolidate earlier findings and to expand the range of their applicability. This is achieved by extending the scope of experimental testing and analysis to additional representative scenario combinations of influencing factors. The incremental work with respect to the previous research includes an expanded literature review and rationale, the incorporation of new test conditions including a higher interfering background noise level (55 dBA) and a wider off-axis source-receiver angle (45°), a comprehensive analysis of twenty scenarios, the addition of new discussion insights and guidance for the selection of alternative speech test sources.
The complete findings and insights in this study will, for the first time, provide professional and research practitioners guidance and reassurance on the applicability and suitability of affordable non-special loudspeakers. This in turn will enable more researchers and practitioners to undertake PP speech intelligibility assessments and investigations when special standardized speech sources are not available or other relevant standards allowing non-special loudspeakers as alternative speech sources are adhered to.
Materials and methods
The performance validity evaluation of non-special loudspeakers was based on the comparative analysis of STIPA results between alternative speech sources and the standardized reference speech test source. The materials and methods in this study followed those described in a previous preliminary investigation. 25
Below is a summary of the speech test sources under test and the procedure employed in this study. Additional technical specifications for each loudspeaker (such as directivity patterns) can be found via the corresponding citation reference indicated in the Model column of Table 3.
Description of speech test sources.

Photo of speech test sources (see Table 3) from left to right: (a) Anker, (b) Fostex, (c) Talkbox, and (d) Yamaha.
Figure 4 presents the frequency response in one-third octave bands to a pink noise input signal, as reported in the previous study, 25 for the four speech test sources measured in anechoic conditions at 1 m on axis. The green Anker trace represents the average of the values from the three Anker units tested and the error bars indicate the standard deviation.

Frequency response in one-third octave bands for four speech test sources.
Measurements of the background noise sound pressure level (SPL) and STIPA were performed in turn on each of the speech sources in two different controlled physical acoustic environments (test rooms) which simulated a range of real-world PP acoustic conditions.
The first acoustic environment (semi-reverberant test room) consisted of the reverberation chamber at London South Bank University (LSBU) of 204 m3 of volume, including 10 m2 of highly sound-absorbing material (mineral wool) exposed on one of the chamber’s walls (Figures 5(a) and 6(a)). The mid-frequencies average (500 Hz, 1 kHz, and 2 kHz) reverberation time RT30midfreq of the semi-reverberant test room measured to ISO 3382-1:2009 37 was 1.7 s. The second acoustic environment (anechoic test room) was the LSBU full anechoic chamber of 145 m3 (excluding the volume occupied by wedges; Figures 5(b) and 6(b)).

(a) Semi-reverberant room test layout and (b) anechoic room test layout (plan view, not to scale).

(a) Photo of the semi-reverberant room test layout and (b) anechoic room test layout.
A fully in-calibration NTI-Audio XL2 class-I acoustic analyser incorporating an NTI M2215 omnidirectional pre-polarized condenser microphone (frequency response: Class-1, ½″, dynamic range: 25–153 dB) was employed to take background noise and STIPA signal sound pressure level (SPL) measurements (receiver SLM1). Another fully in-calibration XL2 class-I analyser incorporating an NTI M2211 omnidirectional pre-polarized condenser microphone (frequency response: Class-1, ½″, dynamic range: 21–144 dB) was used as the receiver to take STIPA measurements (receiver SLM2). Both analysers fully conformed with the class-I specifications for sound level meters of the relevant international standard. 38 Both acoustic analysers were field calibrated before and after the measurement session with a Norsonics 1251 class-I calibrator, which itself was within its traceable and valid laboratory calibration time window. No drift was observed in the field calibrations.
The adjustable background noise system consisted of a calibrated signal generator (NTI-Audio Minirator, MR-Pro) which fed a pink noise signal into an audio amplifier driving an ANV dodecahedron sound source. Another signal generator (NTI-Audio Minirator, MR-Pro) provided the STIPA test signal via an XLR cable connection into the line-in input of the Yamaha and Fostex sources. The STIPA signal was provided to the Anker source line-in input from a Toshiba Portege laptop via a mini-jack cable.
For both acoustic environments (or test rooms), the source position consisted of a reference mark point set at 1.6 m height from the floor. This mark acted as a guide to situate with precision the approximate geometrical centre of each speech source. The receiver consisted of the SLM2 microphone set also at 1.6 m height from the floor and situated at 1 m on axis (0°) from the source position point (Figure 5(a) and (b)). The receiver SLM2 body was connected remotely to its microphone via an XLR extension cable to avoid contaminating reflections from the analyser’s or operators’ bodies.
The layouts for sources and receivers in both rooms (Figures 5(a) and (b) and 6(a) and (b)) were implemented to represent a range of potential and realistic PP scenarios and to examine the effects of source–receiver distance, angle, and acoustic conditions.
STIPA measurements were performed in both rooms following the test procedure specified in the latest version of the relevant standard, IEC 60268-16:2020. 7 Each speech source under test was fed in turn with the STIPA test signal (fifth version) specified in the latest version of the relevant standard. The output level of the STIPA test signal was adjusted in the anechoic chamber for each source to measure 70 dBA at the SLM2 receiver with its microphone positioned on-axis at 1 m from the speech source position. This calibration adjustment was performed to match the fixed signal output from the Talkbox (reference source) STIPA test signal Lombard level option. This selected output signal level corresponds to the raised vocal effort exerted by talkers to overcome noisy backgrounds (Lombard effect). In line with the test procedure of the relevant standard IEC 60268-16:2020, 7 70 dBA was chosen for this study as a representative level of the raised vocal effort expected to be exerted by a person addressing a group of people situated at different distances in an indoor or outdoor PP scenario. Once the speech sources’ output levels were calibrated, they remained unchanged for the duration of the entire measurement session.
Each source and receiver microphone height in both rooms was set at 1.6 m from the floor (i.e. adult average standing ear and mouth height). 7 During STIPA measurements in both test rooms, pink noise was emitted by the ANV dodecahedron sound source (Dodec), which acted as a background noise source and was positioned at 4 m from the nearest receiver position at 1.6 m from the floor. The level of this adjustable background noise was set in both rooms to measure 55 dBA at each receiver position to represent interfering background noise at a level typical of STIPA assessments of PP in indoor spaces 9 affected by external noise (e.g. traffic noise intruding into classrooms 39 ) and/or internal noise (e.g. occupancy noise and/or building services noise in workplaces 40 ).
STIPA results from three non-special affordable loudspeakers were compared against the results from the reference standardized speech source (NTI Talkbox) and absolute errors were determined. Absolute error is defined in this study as the arithmetic difference between the STIPA value obtained from the reference source and the value from the non-special loudspeaker under test. Mean absolute error denotes the arithmetic average of a set of the absolute errors.
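The two error definitions above can be illustrated with a minimal Python sketch. The STIPA values below are hypothetical placeholders, not measurements from this study, and the function names are illustrative only. The sketch assumes that the magnitude of the difference is taken, since the errors are reported as positive values.

```python
from statistics import mean


def absolute_error(stipa_reference: float, stipa_alternative: float) -> float:
    """Magnitude of the STIPA difference between the reference source
    and the non-special loudspeaker under test (assumed unsigned)."""
    return abs(stipa_reference - stipa_alternative)


def mean_absolute_error(reference: list[float], alternative: list[float]) -> float:
    """Arithmetic average of the absolute errors over a set of scenarios."""
    if len(reference) != len(alternative):
        raise ValueError("Both lists must cover the same scenarios")
    return mean(absolute_error(r, a) for r, a in zip(reference, alternative))


# Hypothetical STIPA values for four scenarios (illustration only).
reference_stipa = [0.62, 0.48, 0.91, 0.75]
loudspeaker_stipa = [0.60, 0.51, 0.90, 0.71]

errors = [round(absolute_error(r, a), 2)
          for r, a in zip(reference_stipa, loudspeaker_stipa)]
mae = mean_absolute_error(reference_stipa, loudspeaker_stipa)

print(errors)          # [0.02, 0.03, 0.01, 0.04]
print(round(mae, 3))   # 0.025
```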
The validity evaluation was established through the analysis of absolute errors and standard deviation (std) of STIPA values as a function of four influencing factors: source-receiver distance, acoustic environment, source-receiver orientation and signal to background noise ratio (SNR).
The additional variables introduced in this study with respect to the previous investigation 25 and their purposes are as follows.
To further assess the influence of off-axis source-receiver orientation, measurements were obtained at 45° off axis of the speech source under test, for two acoustic environments (semi-reverberant and anechoic) and for two source-receiver distances (1 and 4 m; see Figure 5(a) and (b)).
To examine the effect of a lower signal to background noise ratio (SNR), a higher background noise level was artificially introduced to all measurements. Pink noise was employed as the controllable background noise signal. The background noise level was set at 55 dBA (up from 35 dBA in Reference 25 ) at each microphone position for two acoustic environments, three source-receiver orientation angles (0° (on axis), 30° and 45°), and two source-receiver distances (see Figure 5(a) and (b)).
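As a rough illustration of how the higher noise level lowers the SNR, the following Python sketch computes a nominal broadband SNR from the 70 dBA speech level calibrated at 1 m and the two background noise levels. The level at 4 m is estimated under the assumption of an ideal point source in a free field (6 dB drop per doubling of distance), which could only approximate the anechoic room and does not apply to the semi-reverberant room. The broadband dBA difference is used here only as a simple indicator and is not part of the STIPA procedure; the function names are illustrative.

```python
import math


def free_field_level(level_at_ref_db: float, distance_m: float,
                     ref_distance_m: float = 1.0) -> float:
    """Level at distance_m for an ideal point source in a free field
    (inverse square law). This is an idealising assumption."""
    return level_at_ref_db - 20.0 * math.log10(distance_m / ref_distance_m)


def nominal_snr(signal_db: float, noise_db: float) -> float:
    """Nominal broadband SNR as a simple level difference in dB."""
    return signal_db - noise_db


SIGNAL_AT_1M_DBA = 70.0  # calibrated STIPA signal level at 1 m on axis

for noise_dba in (35.0, 55.0):
    for distance_m in (1.0, 4.0):
        signal = free_field_level(SIGNAL_AT_1M_DBA, distance_m)
        snr = nominal_snr(signal, noise_dba)
        print(f"noise {noise_dba:.0f} dBA, {distance_m:.0f} m: "
              f"signal {signal:.1f} dBA, nominal SNR {snr:.1f} dB")
```

Under these assumptions the nominal SNR at 4 m with 55 dBA of noise is about 3 dB, compared with about 23 dB at 35 dBA, which shows why the 55 dBA condition is the more demanding one.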
Reported STIPA values were calculated as the arithmetic average of five measurement readings (repeats) taken consecutively at the receiver position for each scenario or measurement condition.
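A minimal sketch of this averaging step is given below, together with the standard deviation used later as the measurement uncertainty. The five readings are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the estimator is not stated explicitly.

```python
from statistics import mean, stdev

# Five hypothetical consecutive STIPA readings for one scenario (illustration only).
readings = [0.61, 0.62, 0.61, 0.60, 0.61]

reported_stipa = mean(readings)   # arithmetic average of the five repeats
uncertainty = stdev(readings)     # sample standard deviation (assumed estimator)

print(f"Reported STIPA: {reported_stipa:.2f}")
print(f"Standard deviation: {uncertainty:.3f}")
```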
To enable a comprehensive examination, STIPA data obtained from the previous investigation 25 at 35 dBA of background noise for 1 and 4 m in both test rooms were incorporated in the analysis of results.
The combination of the influencing factors and variables considered for all speech sources under investigation, consisting of three source-receiver orientation angles (0°, 30°, 45°), two source-receiver distances (1 and 4 m), two background noise levels (35 and 55 dBA), and two acoustic environments (semi-reverberant and anechoic), would have generated twenty-four measurement scenarios. However, measurements at the 45° source-receiver angle with 35 dBA background noise level were not performed for either of the two source-receiver distances in either of the two environments. Hence the total number of measurement scenario combinations used for the overall data collection was twenty.
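The scenario count can be checked with a short Python sketch that enumerates the full factorial design and removes the combinations that were not measured. The count of six sources (the Talkbox, the Fostex, the Yamaha and three Anker units) is inferred from the descriptions in this paper and is treated here as an assumption; the variable names are illustrative.

```python
from itertools import product

ANGLES_DEG = (0, 30, 45)
DISTANCES_M = (1, 4)
NOISE_LEVELS_DBA = (35, 55)
ENVIRONMENTS = ("semi-reverberant", "anechoic")


def was_measured(angle_deg, distance_m, noise_dba, environment):
    """45 degree measurements were not taken at 35 dBA of background noise."""
    return not (angle_deg == 45 and noise_dba == 35)


full_factorial = list(product(ANGLES_DEG, DISTANCES_M,
                              NOISE_LEVELS_DBA, ENVIRONMENTS))
measured_scenarios = [s for s in full_factorial if was_measured(*s)]

N_SOURCES = 6   # assumed: Talkbox, Fostex, Yamaha and three Anker units
N_REPEATS = 5   # readings averaged per scenario
total_readings = len(measured_scenarios) * N_SOURCES * N_REPEATS

print(len(full_factorial))      # 24
print(len(measured_scenarios))  # 20
print(total_readings)           # 600
```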
Absolute errors were evaluated in relation to the STI Just Noticeable Difference (JND). A value range of ±0.03 STI is the uncertainty associated with the STIPA method. 7 This value is also accepted by some authors41–43 as the estimated Just Noticeable Difference (JND) of the STI. However, researchers close to the inventors of the STI and successors of the development of the metric estimate the JND of the STI to be 0.1.44,45 In the context of this study, absolute errors below one JND could be interpreted as non-perceivable and, therefore, negligible. In this investigation both JND threshold values are considered in the analysis of results.
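The comparison against the two JND thresholds can be sketched in Python as follows. The error values used are those quoted in the results of this study (0.04, 0.05, 0.06 and 0.09); the function names, the inclusive treatment of the threshold ("within" meaning less than or equal to) and the rounding to two decimals before comparison are assumptions made for illustration.

```python
JND_A = 0.03  # STIPA method uncertainty, also used as JND by some authors
JND_B = 0.1   # JND estimated by researchers close to the STI inventors


def error_in_jnd(error: float, jnd: float) -> float:
    """Express an absolute STIPA error as a multiple of a JND threshold."""
    return error / jnd


def classify(error: float) -> str:
    """Place an absolute error relative to both thresholds (inclusive, assumed)."""
    error = round(error, 2)  # STIPA values are reported to two decimals
    if error <= JND_A:
        return "within 1 JNDa"
    if error <= JND_B:
        return "above 1 JNDa, within 1 JNDb"
    return "above 1 JNDb"


for err in (0.03, 0.04, 0.05, 0.06, 0.09):
    print(f"{err:.2f} -> {error_in_jnd(err, JND_A):.1f} JNDa, {classify(err)}")
```

With these thresholds, 0.04 corresponds to 1.3 JNDa, 0.05 to 1.7 JNDa, 0.06 to 2 JNDa and 0.09 to 3 JNDa, all of which remain within 1 JNDb.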
Figures 5 and 6 show the test layout for measurements taken in both acoustic environments for combinations of two source-receiver distances for three source-receiver angle orientations.
Results and discussion
STIPA measurement results, measurement uncertainty and corresponding absolute error values obtained for each speech sound source at different distances, angles, background noise levels and test rooms are presented in Figures 7–12.

(a) STIPA values measured in the semi-reverberant test room on axis at 1 and 4 m at 35 and 55 dBA of background noise. (b) STIPA mean absolute error values.

(a) STIPA values measured in the semi-reverberant test room 30° off axis at 1 and 4 m at 35 and 55 dBA of background noise. (b) STIPA mean absolute error values.

(a) STIPA values measured in the semi-reverberant test room 45° off axis at 1 and 4 m at 55 dBA of background noise. (b) STIPA mean absolute error values at 1 and 4 m at 0°, 30° and 45° at 55 dBA of background noise.

(a) STIPA values measured in the anechoic test room on axis at 1 and 4 m at 35 and 55 dBA of background noise. (b) STIPA mean absolute error values.

(a) STIPA values measured in the anechoic test room 30° off axis at 1 and 4 m at 35 and 55 dBA of background noise. (b) STIPA mean absolute error values.

(a) STIPA values measured in the anechoic test room 45° off axis at 1 and 4 m at 55 dBA of background noise. (b) STIPA mean absolute error values at 1 and 4 m at 0°, 30° and 45° at 55 dBA of background noise.
In all figures, each colour represents a scenario. In Figures 7(a)–12(a), each column value is the average of five STIPA readings (repeats) taken of the same scenario. The error bars indicate the measurement uncertainty expressed by the estimated standard deviation of five STIPA readings taken per scenario.
Results
The red dotted lines in Figures 7(b)–12(b) indicate the STI JND at 0.03 and 0.1. For readability, the STI JNDs of 0.03 and 0.1 will be referred to as JNDa and JNDb respectively.
Measurement uncertainty and spread of results
The STIPA measurement uncertainty (shown as error bars in Figures 7(a)–12(a)) for the alternative speech sources (or non-special loudspeakers) for all the scenario combinations was mostly within 0.01 (0.3 JNDa), with a few cases within 0.02 (0.7 JNDa). The same values were obtained from the reference standardized speech source. These results show the high repeatability of the alternative speech sources, which is comparable in that respect to that of the reference speech source.
The spread of STIPA results (quantified by the standard deviation (std)) among the alternative sources was within 0.02 (0.7 JNDa) in all twenty scenario combinations except for one, which was within 0.04 (1.3 JNDa; Figures 7(a)–12(a)). That small spread indicates that the alternative speech sources perform almost equally irrespective of the four influencing factors.
The low measurement uncertainty and spread of results are consistent with those observed in the previous study. 22
Errors in the semi-reverberant environment
Examining the absolute errors for all the alternative speech sources in all scenarios (Figures 7(b)–9(b)), it can be seen that errors were largely within 0.03 (1 JNDa), except for one Yamaha scenario showing 0.04 (1.3 JNDa) and Anker1, Anker2 and Anker3 showing a few outlier results between 0.04 (1.3 JNDa) and 0.06 (2 JNDa). These slightly higher values were well within 1 JNDb and do not appear to follow any systematic pattern or to be affected by the influencing factors at play.
Errors in the anechoic environment
The absolute errors for all the alternative speech sources in all scenarios (Figures 10(b)–12(b)) were largely within 0.03 (1 JNDa), except for Anker1 and Anker2, which measured 0.04 (1.3 JNDa), and Anker3, which measured 0.05 (1.7 JNDa), for only one scenario (Figure 11(b)). These slightly higher Anker errors (well within 1 JNDb) are not consistent with other related scenarios and do not appear to follow any pattern; thus they can be deemed irrelevant. The Yamaha showed a slight pattern of decreasing errors with orientation angle (from 0.09 (3 JNDa) at 0° to 0.05 (1.7 JNDa) at 45°) for the 4 m distance at 55 dBA background noise scenario (see Figures 10(b)–12(b)). This could suggest in principle that the Yamaha features a higher directivity than the rest of the sources (including the reference), most likely caused by its larger size. However, these errors were not observed at 35 dBA when tested also at 4 m at 0° and 30°. Those errors (within 1 JNDb) and the suspected pattern were not observed in the semi-reverberant environment.
Errors as a function of orientation angle
Figure 9(b) presents the STIPA errors from the speech sources in the semi-reverberant environment as a function of distance and orientation angle with background noise fixed at 55 dBA. Examining the effect of the orientation angle, it can be seen that in general errors are within 0.03 (1 JNDa) at both distances and are not affected by the orientation angle. Inspecting the relevance of the higher values, Anker2 gave an error of 0.06 (2 JNDa) at 45° at 1 m. Anker1 and Anker3 showed an error of 0.05 (1.7 JNDa) at 0° and 45°, also at 1 m. However, these values were not observed at 30° or from Anker2 at 0° at the same distance. These errors potentially attributable to orientation angle are not observed for any of the Anker sources at 4 m. Moreover, the magnitude of those errors can be deemed even less noteworthy considering they are well within 1 JNDb.
For the anechoic environment, Figure 12(b) shows the STIPA errors as a function of distance and orientation angle with background noise also fixed at 55 dBA. Analysing the effect of orientation angle, it can be observed that most of the errors from the sources are within 0.03 (1 JNDa) and are not affected by orientation. The higher values were errors from the Yamaha for several angles and distances (ranging from 0.09 (3 JNDa) to 0.05 (1.7 JNDa)) and less significant errors from the three Anker sources ranging between 0.04 (1.3 JNDa) and 0.05 (1.7 JNDa) for the specific 30° at 1 m scenario. Those higher values are analysed above. The single Fostex error above 1 JNDa, of 0.05 (1.7 JNDa) in the 4 m at 0° scenario, as is the case for other errors slightly above 1 JNDa, does not follow any pattern and cannot be attributed to any of the influencing factors, and is therefore deemed irrelevant. In addition, the magnitudes of all errors seen in the anechoic environment are well within 1 JNDb.
Errors as a function of increasing the background noise level
In the anechoic environment (Figures 10(b) and 11(b)) the three Anker sources showed differences in error within 0.04 (1.3 JNDa) when the background noise was increased from 35 to 55 dBA. These differences are too small to indicate any influence due to the increase. However, the Yamaha showed error differences within 0.06 (2 JNDa) at 0° and 30° at both distances. Considering also the reduction of error with orientation angle seen earlier in the anechoic condition at 55 dBA but not at 35 dBA, it could be argued that the Yamaha is more directional than the rest of the sources (including the reference) and that the effect of this feature on absolute errors is more pronounced at the higher background noise level. Nevertheless, these errors are relatively small and well within 1 JNDb, so they could be considered not meaningful enough to be noteworthy.
In the semi-reverberant environment (Figures 7(b) and 8(b)) increasing the background noise level by 20 dBA did not influence the errors obtained at either distance at 0° and 30°. All error differences observed were too small (within 0.03 (1 JNDa)) to bear any relevance or to reveal any pattern.
Summary
Table 4 shows the mean absolute errors calculated from the absolute errors obtained for each speech source at 1 and 4 m at 35 and 55 dBA in the semi-reverberant and anechoic environments.
Mean absolute error for each speech source in both environments.
As can be seen in Table 4, the mean absolute errors obtained for all alternative speech sources at both levels of background noise in both environments were within 1 JNDa, except for the Yamaha in the anechoic condition, which was 1.3 JNDa. The standard deviations of all mean absolute errors at both distances for both acoustic environments shown in Table 4 were below 1 JNDa.
All the errors measured in this study were well within 1 JNDb. STIPA errors within 1 JND can be interpreted as non-perceivable and therefore negligible.
The low measurement uncertainty and low spread of results consistently observed from all sources in all scenario combinations provide further confidence in the results and findings presented.
From the data analysis shown above, it follows that the non-special loudspeakers tested in the numerous scenarios have consistently shown close agreement with the standardized speech source. These findings validate the proposition that non-special and affordable loudspeakers may be used as suitable speech test sources in pilot or survey-grade speech intelligibility assessments in place of a standardized special speech source when that special source is not available, or in full investigations when the relevant standard to adhere to allows the use of those alternative speech sources.
From these findings, it could be inferred that the STIPA metric, when employed in PP situations, might allow for less restrictive tolerances in the speech test loudspeaker specifications than are currently specified in the most informative standard. 7
Based on the literature and the findings of this study, the following guidance is suggested for the selection of suitable non-special loudspeakers for the applications described: choose a quality commercial or professional grade loudspeaker for music or speech reproduction featuring an enclosure with a largest dimension of around 170 mm and incorporating a single driver of wide frequency range (i.e. 100 Hz–10 kHz) with a cone diameter between 60 and 100 mm.
Conclusions
A study has been conducted to validate the suitability of non-special loudspeakers as speech sources employed in speech intelligibility assessments in Person-to-Person Speech Communications (PP). It has extended the scope of applicability and consolidated the findings obtained in a previous investigation.
Experimental Speech Transmission Index for Public Address (STIPA) tests were conducted on three representative non-special loudspeakers and a standardized reference speech source considering four influencing factors (source-receiver distance, acoustic environment, source-receiver orientation and signal to background noise ratio) affecting the PP transmission channel. Twenty measurement scenario combinations generated a total of 600 measurement readings. Absolute errors of STIPA measurements obtained between the non-special loudspeakers and the reference source were analysed and their relevance evaluated against two threshold levels of Just Noticeable Difference (JND) for the STIPA parameter (0.03 and 0.1 of STI).
The mean absolute errors obtained for all non-special loudspeakers for almost all scenarios were within one JND of any of the two JND levels considered. The low measurement uncertainty and low spread of results consistently observed from all sources at all scenario combinations (standard deviation typically within 0.02 of STI) provided further confidence in the results obtained. These findings are in close accord with those from a previous investigation and serve here to corroborate the earlier conclusions.
The strikingly and consistently low level of errors found validates the proposition that non-special and affordable loudspeakers may be used as suitable speech test sources in pilot or survey-grade speech intelligibility assessments, in place of a special standardized speech source when that source is not available, or in full investigations when the relevant standard to adhere to allows the use of those alternative speech sources.
The conclusive findings and insights of this study will inform and enable more researchers, consultants, and other practitioners to conduct speech intelligibility investigations in PP scenarios.
It is anticipated that the information presented in this work could influence future design, development and commercialization of loudspeakers for speech reproduction.
