
Abstract

Kiezdeutsch is a multiethnolectal variety of German spoken by young people from multicultural communities that exhibits lexical, syntactic, and phonetic differences from standard German. A rather salient and pervasive feature of this variety is the fronting of the standard palatal fricative /ç/ (as in ich “I”) to [ɕ] or [ʃ]. Previous perception work shows that this difference is salient and carries social meaning, but that this meaning depends on the listener group. Further investigations also point to the significance of /ɔɪ/-fronting in production; however, whether this is salient in perception has not yet been investigated. In several (multi)ethnolectal varieties, differences in voice quality compared to the standard have been identified. Therefore, in this study, we present an acoustic comparison of voice quality in adolescent speakers of Kiezdeutsch and standard German, with results showing that Kiezdeutsch speakers produce a breathier voice quality. In addition, we report on a perception test designed to examine the social meaning of voice quality in combination with two segmental cues: coronalization of /ç/ and /ɔɪ/-fronting. The results indicate perceptual gradience for phonetic alternations detected in Kiezdeutsch, with coronalization of /ç/ being a highly salient and reliable marker, whereas fronting of /ɔɪ/ and breathy voice do not appear to be clearly enregistered features of Kiezdeutsch for all listeners. Thus, even though we find differences in production, these may not necessarily be relevant in perception, which suggests that enregisterment, like sound change, is a continuous process of forming learned associations through tokens of experience.

Keywords

Social meaning, speech production, speech perception

1 Introduction

The study of social meaning—the association of linguistic form with social categories—by now has a long tradition in sociolinguistics. The focus on social meaning is a characteristic of the so-called third wave of sociolinguistics (Eckert, 2012). Work in the first wave of sociolinguistics was primarily aggregating large demographic tracts and associating language behavior with sociodemographic characteristics, as sociolinguists attempted to map out the speech of large urban centers, such as New York City (Labov, 1972a), Norwich, England (Trudgill, 1974), and Montreal, Canada (Sankoff & Cedergren, 1972), so they were able to draw correlations between linguistic variables and macrosocial categories, such as age and social class. Later, work of the second wave, such as Milroy (1980) in Belfast and Rickford (1986) in Guyana, made local sense of macrosocial variables, such as religion and class. They also ushered in methodological innovations such as the use of social network analysis and ethnography. The third wave of variation (Eckert, 2012) is distinguished by the exploration of the stylistic expression of individuals as part of a social order, where language change is linked to the indexing of variables to personae (Coupland, 2007).

In such contexts, the study of voice quality as a social marker has only relatively recently gained traction (e.g., Podesva, 2007, 2013). In its broadest sense, voice quality refers to the particular combination of settings implemented during the production of speech, including phonatory, articulatory, and muscular settings (Laver, 1994). Often, however, the term voice quality is used in a narrower sense to refer to phonation only, and the different phonation types that result due to changes in laryngeal settings (Keating & Esposito, 2006; Sóskuthy & Stuart-Smith, 2020; Wright et al., 2019). Research on social meanings of voice quality has generally focused on this narrower sense and has largely been concerned with non-modal phonation, such as creaky voice (Lefkowitz & Sicoli, 2007; Mendoza-Denton, 2011; Stuart-Smith, 1999b; Yuasa, 2010). Creaky voice is generally produced by rather low airflow through the glottis resulting in slow and irregular vocal fold vibration which in turn causes a low fundamental frequency (F0) (Davidson, 2020; Keating et al., 2015; Laver, 1980) (though note there are a number of different acoustic realizations of creaky voice; see Keating et al., 2015 and Garellek, 2019 for a description of these). For this reason, it has been associated with performing toughness and gender in analogy to Ohala’s (1994) frequency code, where low frequency is indicative of larger body sizes and high frequency is associated with tininess (in addition to a range of communicative functions and social meanings across different language varieties, see e.g., Gobl & Ní Chasaide, 2003; Yuasa, 2010). Johnson (2006, p. 486) points out that “the cross-language and within-language phonetic arbitrariness of gender” calls into question “unitary abstract phonetic representations” and suggests that gender is subject to performance, highlighting that speakers are social actors. 
This fits well with creak not only as a performance of toughness by gang girls in California (Mendoza-Denton, 2011) but also as a signal of young urban upwardly oriented professional females in California (Yuasa, 2010). Podesva (2013) reports overall higher rates of creak in females than males in his sample; thus, toughness and gender performance are socially constructed through creak and through interpretations of creak.

Breathy voice, on the other hand, has received much less attention as a socially meaningful marker. While it has often been found to be associated with the speech of women and producing a “desirable” sounding female voice (Hall, 1995; Henton & Bladon, 1985; Ito, 2003; Ohara, 2004; Stuart-Smith, 1999a), it is less often interpreted along the lines of social performances or social categories (Podesva, 2013—though see Podesva & Callier, 2015, for a discussion of the role of breathiness, among other voice qualities, in reported speech and displaying affect, and Teshigawara, 2003, for an examination of voice quality including breathiness in portraying characters as either good or evil in Japanese anime). Breathy voice is produced with a glottis open along most of its length, allowing for more air to pass through the vocal folds which nevertheless vibrate regularly. This airflow mechanism makes it articulatorily rather difficult to sustain breathy voice over a longer stretch of speech material (Catford, 1977), which may contribute to why it is less often observed as a social marker. In addition, breathy voice may be considered to have a less clearly defined real-world correspondence compared to F0 and creaky voice. Breathy voice may also lend itself less to acquiring social meaning because of its less salient perceptual properties (Laver, 1980). As neither creaky nor breathy voice is phonologized in English or German, voice quality is not linguistically contrastive, as a change in voice quality does not change the meaning of a particular word or utterance. Taking a functional perspective, creaky voice can be said to take over some structuring properties in languages such as English or German as it often marks the end of phrases or the onset of strong syllables (Fougeron & Keating, 1997; Garellek, 2014, 2015; Henton & Bladon, 1988; Kreiman, 1982; Ogden, 2001). Breathy voice is not known to mark prosodic strengthening at the onset of linguistic domains.

Variation in fine phonetic detail often describes within-category variation. Johnson (2006) rightfully pointed out that this causes a problem for assumptions underlying theories of speech perception posing abstract phonological primes to which variants must be matched. A smart way around this mapping problem is the idea of alternants receiving meaning in themselves through exposure. The learned link between the linguistic form and a social category or identity of users, speakers’ ideological stances, their social demographics or attitudes, and so on (Eckert & Labov, 2017) allows for the interpretation in perception (Jannedy & Weirich, 2014c; Weirich et al., 2020). Lacking a meaningful interpretation causes a variant to remain what Labov (1966) calls an indicator without an association to a social category.

When the usage of specific pronunciation variants or voice qualities becomes emblematic and indexical (Silverstein, 2003) of a specific speaker group and it is recognized as such by speakers and hearers, it becomes enregistered (Agha, 2003). With such recognition, it surpasses what Labov calls an indicator (there is a difference but it is not noticed) and it becomes a marker. In this study, we examine three acoustic parameters, two of which are segmental alternants that have been previously found to vary significantly in the multiethnolect Kiezdeutsch (KD) as spoken in Berlin and regional standard German (SG) as spoken in Berlin: coronalization of palatal fricatives and fronting of the diphthong /ɔɪ/. The third parameter is voice quality, which has not previously been investigated in this context. To illuminate the role of these three phonetic factors in the partitioning of the social and linguistic landscape of Berlin, we have conducted a perception test with the goal to understand the social meaning associated with specific alternants.

1.1 Kiezdeutsch

KD is a multiethnolectal variety of German spoken by young people from multicultural communities that exhibits lexical, syntactic, and phonetic differences from SG (Auer & Dirim, 2003; Jannedy & Weirich, 2014c; Wiese, 2012). It originated in neighborhoods populated predominantly by Turkish migrant workers. Today, adolescents from many ethnic and linguistic backgrounds use features of this variety to varying degrees, particularly in neighborhoods with high levels of multilingualism, such as Kreuzberg.

A rather salient and pervasive feature of KD is the fronting of the standard palatal fricative /ç/ (as in ich “I”) to [ɕ] or [ʃ] (Jannedy et al., 2011, 2015; Jannedy & Weirich, 2014c, 2017). Jannedy et al. (2015) showed that the two phonemes /ç/ and /ʃ/ are merged in the speech of adolescents from the neighborhood of Kreuzberg in Berlin. Not surprisingly, speaker language background (i.e., monolingual vs. multilingual) had an influence on the realization of /ç/ as [ç], [ɕ], or [ʃ], with speakers from multilingual backgrounds more likely to produce fronter variants of /ç/. However, their results also showed that even monolingual, monoethnic Germans tended to merge this contrast and that if a speaker identified as somebody from Kreuzberg (rather than from Berlin more generally), this was a predictor for coronalization; that is, a more local (Kreuzberg) identity was correlated with fronter realizations of /ç/. Jannedy and Weirich (2017) conducted a comprehensive spectral analysis of German /ç/ and /ʃ/ by quantifying the acoustic differences between the two fricatives in three speaker groups with varying contrast realizations. The groups consisted of speakers from two cities in Germany, Berlin and Kiel, with the Berlin speakers additionally differing between KD speakers and regional SG speakers. While differences between the fricatives were apparent from visual inspection of the spectra and were perceivable in a categorization test (but to a much lesser extent or even only at the chance level in the KD speakers), the acoustic analysis revealed that spectrally, the fricatives were very similar in all varieties and the spectral moments (Center of Gravity [COG], standard deviation [SD], kurtosis, skewness) only showed minute differences. Comparing the two speaker groups from Berlin, tendencies were apparent toward higher COG values for both fricatives in KD speakers.
To quantify the acoustic contrast between the two German fricatives in all three varieties, Euclidean distances (EDs) in a multidimensional space (4D) were calculated between each [ç] and [ʃ] from a minimal pair produced by a speaker using (1) the spectral moments and (2) the first four discrete cosine transformation (DCT) coefficients. DCT decomposes the signal into a set of half-cycle cosine waves, whereby the resulting amplitudes of these cosine waves are the DCT coefficients (Watson & Harrington, 1999). While the ED using the spectral moments did not reflect the visual impressions of spectral shape differences between speaker groups, the ED using DCTs showed a significant difference between Kiel and Berlin KD and between Berlin SG and Berlin KD in terms of a smaller acoustic contrast between the fricatives in the KD speakers.
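The distance computation described above can be illustrated with a minimal sketch. It assumes a DCT-II with the scaling shown in the code (the exact normalization used in the cited work is not stated here), the function names are hypothetical, and the two spectra are invented toy values rather than measured fricative spectra.

```python
import math

def dct_coefficients(spectrum, n_coeffs=4):
    """Return the first n_coeffs DCT-II coefficients of a spectrum.

    Each coefficient is the amplitude of a half-cycle cosine wave fitted
    to the spectrum; the scaling used here is an assumption.
    """
    n = len(spectrum)
    coeffs = []
    for m in range(n_coeffs):
        scale = 1.0 / math.sqrt(2.0) if m == 0 else 1.0
        total = sum(
            value * math.cos((2 * i + 1) * m * math.pi / (2 * n))
            for i, value in enumerate(spectrum)
        )
        coeffs.append(2.0 / n * scale * total)
    return coeffs

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length coefficient vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy spectra (dB per frequency bin), invented for illustration only.
spectrum_c = [20.0, 28.0, 35.0, 40.0, 38.0, 30.0, 22.0, 15.0]
spectrum_sh = [25.0, 36.0, 41.0, 37.0, 31.0, 24.0, 18.0, 12.0]

# Distance in the 4D space spanned by the first four DCT coefficients.
distance = euclidean_distance(
    dct_coefficients(spectrum_c), dct_coefficients(spectrum_sh)
)
print(round(distance, 2))
```

Under these assumptions, a smaller distance between the two members of a minimal pair corresponds to a smaller acoustic contrast between the fricatives.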

On the perception side, previous studies have shown that the spectral characteristics in the fricative productions of KD speakers are indeed perceivable and salient for listeners (Jannedy et al., 2011; Jannedy & Weirich, 2014c; Weirich et al., 2020). Jannedy and Weirich (2014c) examined in a perception experiment whether listeners categorize identical acoustic stimuli differently in the context of two different primes: the names of two neighborhoods of Berlin (multilingual Kreuzberg and Zehlendorf, a monolingual/affluent district) and a control condition with no additional information. The stimuli were natural recordings in which the fricative was synthesized along a continuum ranging from /ç/ to /ʃ/, so that each item could be heard as either Fichte /fɪçtə/ (“spruce”) or fischte /fɪʃtə/ (first person sg. “to fish”). They assumed that listeners infer social information and linguistic stereotypes based on the names of these neighborhoods and would identify the stimuli differently depending on these primes. Indeed, a differential categorization pattern depending on the primes was found, which additionally interacted with listener age: for older listeners, the crossover point from /ç/ to /ʃ/ was shifted toward /ʃ/ in the Kreuzberg condition and thus older listeners categorized most stimuli as /ʃ/ (/fɪʃtə/) in the context of Kreuzberg; however, for younger listeners, the crossover point from /ç/ to /ʃ/ was shifted toward /ʃ/ in the control condition and thus younger listeners rated most stimuli as /ʃ/ in the control condition with no added information. This points to a sound change in progress and the potential loss of the phoneme contrast between /ç/ and /ʃ/ in German.
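To make the notion of a crossover point concrete, the following sketch locates the continuum step at which responses reach 50% by linear interpolation. This is only one possible operationalization and is not claimed to be the procedure used in the cited study; the function name and the response proportions are hypothetical.

```python
def crossover_point(proportions, threshold=0.5):
    """Return the (fractional) continuum step where responses cross threshold.

    proportions[i] is the share of one response category at continuum
    step i + 1. Returns None if the threshold is never crossed upward.
    """
    for i in range(len(proportions) - 1):
        low, high = proportions[i], proportions[i + 1]
        if low < threshold <= high:
            # Linear interpolation between the two neighboring steps.
            return (i + 1) + (threshold - low) / (high - low)
    return None

# Invented response proportions for two priming conditions.
condition_a = [0.10, 0.20, 0.40, 0.80, 0.90]
condition_b = [0.20, 0.45, 0.70, 0.90, 0.95]

print(crossover_point(condition_a))
print(crossover_point(condition_b))
```

Comparing the two returned values gives a simple numerical summary of how far the category boundary differs between conditions.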

Further perception work (Weirich et al., 2020) investigated the strength of associations between phonetic alternations and social attributes in the context of KD and German with a French accent. An Implicit Association Task (IAT) was run with participants categorizing written words as having a positive or negative valence and auditory stimuli containing pronunciation variations of /ç/ as canonical [ç] (labeled Standard German) or non-canonical [ɕ] (labeled Kiezdeutsch). In a second version of the test, identical auditory stimuli were used but the label Kiezdeutsch was changed to French Accent. As expected, results showed faster reaction times when negative categories and non-canonical pronunciations or positive categories and canonical pronunciations were mapped to the same response key. In addition, older German listeners matched a supposed KD accent more readily with negatively connotated words compared to a supposed French accent, while younger German listeners seemed indifferent toward this variation. They did not react differently to the contextual primes which points to an ongoing societal normalization of speech features that were originally associated with one specific social group (i.e., multi-ethnic adolescents) and that are stigmatized especially by conservative forces. These features have now gained covert prestige and are adopted by others as their own, indexing social orientation toward multiethnicity, diversity, and urbanity. Young multiethnic listeners from Berlin associated the negative concepts more strongly with a supposed French accent compared to KD, pointing to a preference of in-group over out-group (Tajfel & Turner, 1986).

Altogether, these studies show that the phonetic realization of the fricative /ç/ in German carries social meaning, and its realization as [ɕ] or [ʃ] is strongly associated with the speaker group of KD.

Much less research has been conducted on other segmental characteristics of KD. Previous research has pointed to the realization of /ɔɪ/ as a feature of KD (Jannedy & Weirich, 2014a, 2014b; Weirich & Jannedy, 2013). The studies showed that for female speakers, the nucleus of /ɔɪ/ is realized as more centralized in KD speakers compared to SG speakers, with higher F2, particularly from the start to the mid part of the diphthong. For male KD speakers, F2 is also higher, not only at the start but throughout the diphthong. No effect was apparent for F1. Jannedy and Weirich (2014a) also looked at linguistic influences on diphthong centralization and found that while male KD speakers showed a raised F2 value irrespective of segmental environment, for female speakers, the centralization of the diphthong was enhanced by following and preceding obstruents. Syllable structure or sentence accent seems to have less of an effect on diphthong centralization. Ongoing investigations, also including the two other German diphthongs /aɪ/ and /aʊ/, point to the significance of /ɔɪ/-fronting in KD (Jannedy & Weirich, 2013, 2014a; Jannedy, Weirich, Mendelsohn & Schüppenhauer, 2019; Weirich et al., 2024).

KD is a linguistic conglomerate of various speech features constituting a specific speech style (Auer & Dirim, 2003; Wiese, 2012), predominantly used in informal peer-group settings by multilingual speakers from multicultural neighborhoods, and it is this style that is being deployed when speaking among each other in school yards or outside of formal social contexts. However, we have found that this is not always the case. In our previous work exploring the acoustic contrast between /ç/ and /ʃ/ (Jannedy & Weirich, 2017), we tried to elicit the greatest possible contrast between these two sounds by eliciting minimal pairs. According to Labov’s (1972b) taxonomy, minimal pairs should bring out a contrast if speakers have a contrast. A group of speakers did not produce a contrast and even appeared puzzled to see apparently two different ways of orthographically capturing the same sound string. Thus, KD to some is not a performance, but a primary linguistic resource used across different situational and functional settings, independent of addressee. Note that we have exemplified this with minimal pairs, but have also observed the use of other KD features (lack of agreement, usage of bare nouns, etc.) in students’ interactions with their teachers and in the laboratory.

Work on social meaning is predicated on the premise that language possesses different strata of meaning that transcend the confines of lexical constituents. Linguistic choices then, in essence, reflect the multifaceted dimensions of human existence, encompassing identities, affiliations, attitudes, stances, and ideological orientations. Through the examination of these nuanced linguistic choices and variants, and a detailed exploration of phonetic subtleties, we strive to unveil the social meaning entailed in fine phonetic detail in communicative processes. However, it is paramount to acknowledge that the assumed social meaning of fine phonetic detail can only be validated by listeners who are capable of decoding, interpreting, and ascribing significance to the meaning, that is, for whom a variant is enregistered.

In this work, we take this approach and test whether the pronunciation variant of /ɔɪ/ is salient also for perception and used by listeners to infer social information about the speaker.

1.2 Voice quality as a social marker?

As described above, the term voice quality is often used in a narrow sense to refer to differences in phonation resulting from changes in laryngeal settings, and in this paper, we will use the term voice quality in this sense. Although many different non-modal voice qualities can be described (Laver, 1980), two general categories are most often referred to in the literature: breathy voice and creaky voice (Garellek, 2019; Keating & Esposito, 2006). Breathy voice is generally produced with increased glottal opening resulting in additional aspiration noise; creaky voice, however, is generally produced with increased glottal constriction and low and irregular F0 (Garellek, 2019; see Keating et al., 2015 for an overview of the different acoustic manifestations of creaky voice).

In some languages, phonation is a contrastive feature that can signal a change in meaning. This is the case, for example, in Jalapa Mazatec, in which creaky voice quality produces a contrast to the same item produced with modal or breathy voice (Garellek & Keating, 2011). Voice quality is not generally a contrastive phonological feature of Indo–European languages (Keating et al., 2023), although changes in phonation can serve as a cue to segmental contrasts (see e.g., Penney et al., 2018). However, non-modal voice qualities may be exploited for prosodic and sociolinguistic purposes. For example, creaky voice has been shown to mark phrase/utterance finality in multiple languages, such as English (Garellek, 2015; Henton & Bladon, 1988; Kreiman, 1982), Estonian (Aare et al., 2018), Finnish (Ogden, 2001), Swedish (Carlson et al., 2005), and German (Köser, 2014; Peters, 2003).

Many studies have shown that voice quality differences can serve as a social cue and may index elements of a speaker’s identity. For example, in some varieties of British English, female speakers make more use of breathy voice (Henton & Bladon, 1985; Stuart-Smith, 1999a), whereas creaky voice is associated with middle-class male speakers (Esling, 1978; Henton & Bladon, 1988) and working-class males tend to use harsh or whispery voice (Esling, 1978; Stuart-Smith, 1999a). It has also been suggested that female speech may be breathier than male speech in general, due to incomplete glottal closure in females (Södersten & Lindestad, 1990). However, recent studies suggest that there may be a prevalence for more creaky voice in female speech, particularly in the case of younger women in American English (Abdelli-Beruh et al., 2014; Podesva, 2013; Wolk et al., 2012; Yuasa, 2010), although this may also reflect researcher bias, as female speakers of American English tend to be the primary target of such studies (Dallaston & Docherty, 2020). Aside from indexing gender, voice quality can signal other elements of a speaker’s personality. For example, in a study on Chicano gang members, creaky voice (along with visual cues, such as the length of eyeliner) was found to be associated with members projecting more “hardcore” personas (Mendoza-Denton, 2011).

Integrating differences in voice quality may also be a way for speakers from migrant backgrounds to express their ethnolinguistic repertoires or identities that reference their ethnocultural heritage and divergence from the mainstream (Clyne et al., 2001), and differences in voice quality have been identified between speakers of standard varieties and (multi)ethnolectal speakers in a number of varieties of English. For example, Newman and Wu (2011) found that American English speakers of Asian (Chinese and Korean) descent produced a breathier voice quality (among other features) than speakers of other non-Asian backgrounds. In New Zealand English, speakers of Māori descent have higher F0 than Pākehā speakers (i.e., those of European descent), and this is considered to index speakers’ ethnic identities (Szakay, 2006; Szakay & King, 2018). In addition to pitch differences, Szakay (2012) found that Māori speakers also produced a creakier voice quality than Pākehā speakers. In something of a contradiction, a subsequent study by Szakay and King (2018) suggested that speakers of Māori descent may in fact use less creaky voice than those of Pākehā descent. The conflicting results between these two studies are likely due to the use of different methodologies: in the former (Szakay, 2012), the analysis of voice quality was based on a measure of spectral tilt (H1–H2) taken at the midpoint of vowels. The latter study (Szakay & King, 2018), however, focused specifically on creaky voice, and the prevalence of creak was determined based on F0 using the AntiMode method, which assumes a bimodal distribution of F0 in speakers who produce modal voice and creaky voice, with any F0 measure below a calculated antimode taken to be an instance of creak (Dallaston & Docherty, 2020; Dorreen, 2017; White et al., 2022). Regardless of the inconsistencies between these two studies, it is clear that there are differences of voice quality between speakers of different ethnic backgrounds in this variety of English.
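The logic of an antimode-based creak criterion can be sketched as follows. This is a simplified histogram-based illustration under our own assumptions (fixed 10 Hz bins, the antimode taken as the center of the deepest valley between two peaks); the published AntiMode procedure may differ in its details, and the F0 values are invented.

```python
def find_antimode(f0_values, bin_width=10.0):
    """Estimate the F0 antimode (Hz) as the center of the deepest histogram valley.

    Returns None if no bin has higher counts on both sides, i.e., if the
    distribution shows no sign of bimodality at this bin width.
    """
    low = min(f0_values)
    n_bins = int((max(f0_values) - low) // bin_width) + 1
    counts = [0] * n_bins
    for f0 in f0_values:
        counts[int((f0 - low) // bin_width)] += 1

    best_depth, best_bins = 0, []
    for i in range(1, n_bins - 1):
        # Valley depth: how far this bin lies below the lower of the
        # tallest peaks to its left and to its right.
        depth = min(max(counts[:i]), max(counts[i + 1:])) - counts[i]
        if depth > best_depth:
            best_depth, best_bins = depth, [i]
        elif depth == best_depth and depth > 0:
            best_bins.append(i)
    if not best_bins:
        return None
    # If several bins tie, take the middle one of the tied bins.
    middle = best_bins[len(best_bins) // 2]
    return low + (middle + 0.5) * bin_width

# Invented F0 measurements (Hz): a low (creaky) and a higher (modal) cluster.
f0_values = [60, 65, 70, 72, 68, 190, 200, 210, 205, 195, 198, 202]
antimode = find_antimode(f0_values)
creaky = [f0 for f0 in f0_values if f0 < antimode]
print(antimode, len(creaky))
```

Because the criterion rests on F0 alone, it captures only creak that is realized with low F0, which is one reason why its results need not agree with a spectral tilt measure such as H1–H2.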

In Multicultural London English (MLE), a multiethnolect spoken in linguistically diverse areas of London, Szakay and Torgersen (2015) found that male speakers have a breathier voice quality than those who live outside of London and have an Anglo background. They initially also found that female speakers of MLE have a creakier voice quality than their non-London counterparts of Anglo background, though this result was impacted by issues of tracking F0 at low frequencies, and a later re-analysis determined that creaky voice was rather a marker of outer London Anglo speech (Szakay & Torgersen, 2019). More recently in Australian English, Penney and Cox (2021) have identified increased breathiness in monosyllabic CV words in speakers of Lebanese background compared to mainstream Australian English speakers. Loakes and Gregory (2022) have also identified voice quality differences between Indigenous speakers of Australian Aboriginal English and mainstream Australian English, with lower F0 and a creakier voice quality produced by the Australian Aboriginal English speakers.

These results, taken together, demonstrate that voice quality differences may be employed by speakers of (multi)ethnolectal varieties in various language environments. However, voice quality remains understudied in work on multiethnolectal variation, particularly in non-English speaking contexts. Anecdotal observations suggest that higher F0 and increased breathiness may be present in the speech of KD speakers. Therefore, in this study, we present an acoustic comparison of voice quality in adolescent KD speakers and SG speakers from Berlin.

1.3 Hypotheses and structure of the paper

In parts 2 and 3 of this paper, we will present our analysis on voice quality in KD speakers, which was conducted on two sets of production data taken from a previously collected and annotated corpus: data from a conversational task and from a reading task. Based on the previous findings regarding voice quality in similar multiethnolectal groups, we may hypothesize that KD speakers will be found to produce a breathier voice quality than SG speakers. Knowing that voice quality can signal various social stances, we examine whether breathy voice is associated with KD speakers and interpreted to signal group affiliation. Therefore, in parts 5 and 6, a perception test is described which investigates the potential impact of three phonetic cues (coronalization of /ç/, /ɔɪ/-fronting, breathy voice quality) on the attribution of speaker background (i.e., KD speaker or not). We hypothesize that the different cues will have different weights regarding their indexing values: while the association between /ç/ and KD is strong and its role in perception has been shown before, the relative salience of /ɔɪ/ and differences in voice quality remain to be seen. These cues could be factors that add to an existing bias but might not be sufficient to index a speaker’s background on their own.

2 Methods: production

2.1 Data and speakers

The production data analyzed here were extracted from a database of annotated audio-recordings of speech maintained by ZAS Berlin, which contains recordings of male and female KD speakers from multicultural neighborhoods in Berlin that are highly associated with the multiethnolect, and recordings of male and female SG speakers from other parts of the city. To ensure that speakers were balanced for age between KD and SG—and consequently that any observed differences were not due to age grading or differences in processes of laryngeal development—we included only data from adolescent speakers aged 14–17 years, as the number of speakers in each of the groups was most comparable in this age range. Data for one male speaker within this age range were excluded as his mean F0 was substantially higher than all of the other males (256 Hz compared to 99–133 Hz) indicating he had not yet experienced the lowering of F0 associated with voice break in puberty.

Data from three sets of speakers are included in this analysis. Two of the groups comprised KD speakers from the neighborhoods of Wedding, Kreuzberg, and Neukölln. One group of KD speakers was recorded as they engaged in spontaneous conversation with a research assistant in an interview task (n = 23; female: 14; male: 5; mean age: 15.5 years, SD = 0.81). Another group of KD speakers was recorded as they engaged in a scripted sentence-reading task in which a set of target words designed to elicit diphthongs was produced in the carrier phrase Ich habe X gesagt, “I said X” (n = 22; female: 11; male: 11; mean age: 14.4 years, SD = 0.58). All of the speakers in the KD group were born in Germany and spoke German, but the majority of them reported additional heritage languages spoken with various levels of proficiency (five speakers reported speaking only German). Consistent with previous descriptions (Auer & Dirim, 2003; Jannedy & Weirich, 2014c; Selting & Kern, 2011), the language backgrounds of the KD speakers encompassed a range of languages, with the most frequent being Turkish, Arabic, and Kurdish, or a combination of these, and less frequently Farsi, Azerbaijani, Serbian, Croatian, Macedonian, Romanian, and Russian. In addition to the KD speakers, data from a third group comprising SG speakers from the neighborhood of Charlottenburg were analyzed. All speakers in this group were born in Germany and had an exclusively German language background. Speakers from this group took part in both a spontaneous conversation and a sentence-reading task, although not all speakers took part in the sentence-reading task (conversation task: n = 13; female: 8; male: 5; mean age: 15.0 years, SD = 0.65; reading task: n = 11; female: 6; male: 5; mean age: 15.2 years, SD = 0.38). Table 1 summarizes the participants in each task and the number of segments included in the analyses.

Table 1. Summary of Speakers and Number of Segments Included in Analyses According to Task.

Group	Conversation task: Speakers	Conversation task: Mean age (SD)	Conversation task: Segments	Reading task: Speakers	Reading task: Mean age (SD)	Reading task: Segments
KD female	14	15.5 (0.97)	1,046	11	14.1 (0.29)	212
KD male	5	15.5 (0.50)	768	11	14.7 (0.62)	213
SG female	8	14.6 (0.482)	637	6	15.0 (0)	511
SG male	5	15.5 (0.50)	567	5	15.4 (0.49)	431

The speakers were recorded either in a quiet room of their school or in the laboratory at ZAS. The recording sessions were conducted by the third author and a research assistant trained in sociolinguistic interviews. The interviewers were known to the speakers prior to the recording sessions through casual contact in youth centers and their school. Participants were under the impression that their voices were being recorded for research to improve speech technology, which they were excited to contribute to. They were familiarized with the situation beforehand and the atmosphere of the sessions was relaxed, with snacks and drinks provided. Recordings were made with a Sennheiser ME64 microphone to a Tascam DR-05 recorder with a sampling rate of 48 kHz and 16 Bit resolution. All data were orthographically transcribed and initially segmented using WebMAUS (Kisler et al., 2017), and all segment boundaries for the diphthongs /ɔɪ/, /aɪ/, and /aʊ/ were hand-corrected. The correction of diphthong segment boundaries was carried out as part of another, unrelated study (Weirich et al., 2024). For practical reasons, we chose to limit this analysis to these diphthong segments so that we could be confident in the accuracy of our segmentation, though we acknowledge that voice quality information can be carried by all vowels (and indeed any segments that are phonetically voiced).

2.2 Acoustic analysis

We extracted estimates of F0, H1*–H2*, and Harmonics-to-Noise ratio (HNR) using VoiceSauce (Shue et al., 2011). For each of these three acoustic measures, values were averaged across each diphthong segment. F0 was calculated using the STRAIGHT algorithm (Kawahara et al., 1999). There are a number of acoustic measures that have been used to describe differences in voice quality, including various measures of spectral tilt, noise, periodicity, and intensity (Gordon & Ladefoged, 2001; Keating et al., 2023). Of these, the most frequently used measurements, at least in recent phonetic research on voice quality, are those which quantify spectral tilt: how sharply harmonic amplitude drops off at higher frequencies (Keating et al., 2023). H1*–H2* is a measure of spectral tilt that calculates the difference in amplitude between the first harmonic (H1) and the second harmonic (H2), with the application of an algorithm to correct for the effect of formant frequencies (as indicated by the asterisks), which may increase the amplitude of harmonics in the vicinity (Iseli et al., 2007). As a measure of spectral tilt, higher values of H1*–H2* are correlated with increased glottal opening (and hence increased breathiness) and lower values of H1*–H2* are correlated with increased glottal constriction (and hence increased creakiness) (Garellek, 2019; Hillenbrand et al., 1994; Holmberg et al., 1995; Keating et al., 2023).
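
As a rough illustration of the spectral tilt measure described above, the following Python sketch computes an uncorrected H1–H2 value from two harmonic amplitudes and averages frame-level values across a segment. It is not the VoiceSauce implementation: the formant correction indicated by the asterisks (Iseli et al., 2007) is deliberately left out, and the function names and the example amplitudes are hypothetical.

```python
import math

def amplitude_to_db(amplitude, reference=1.0):
    """Convert a linear amplitude to decibels relative to `reference`."""
    return 20.0 * math.log10(amplitude / reference)

def h1_minus_h2(h1_amplitude, h2_amplitude):
    """Uncorrected H1-H2 in dB: first-harmonic level minus second-harmonic level.

    Assumption: no formant correction is applied, so this is H1-H2,
    not the corrected H1*-H2* reported in the paper.
    """
    return amplitude_to_db(h1_amplitude) - amplitude_to_db(h2_amplitude)

def mean_over_segment(frame_values):
    """Average frame-level estimates across one diphthong segment."""
    return sum(frame_values) / len(frame_values)

# Hypothetical frames: (H1 amplitude, H2 amplitude) pairs for one segment.
frames = [(2.0, 1.0), (1.8, 1.0), (1.5, 1.0)]
segment_tilt = mean_over_segment([h1_minus_h2(h1, h2) for h1, h2 in frames])
print(round(segment_tilt, 2))
```

A first harmonic that is stronger than the second gives a positive value, which is the direction associated with a more open glottis in the text above.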

While higher values of H1*–H2* would generally suggest a breathier voice quality, it can be difficult to determine whether higher values are indeed due to a breathier voice quality compared to a more modal voice quality, or whether they simply represent modal voice quality and the lower values of H1*–H2* indicate a creakier voice quality. However, breathy voice also exhibits lower values of HNR relative to modal voice due to the presence of aspiration. Therefore, following Garellek (2019), we measured HNR in addition to H1*–H2*: higher values of H1*–H2* together with lower values of HNR would provide converging evidence for increased breathiness. HNR was measured in the band below 500 Hz (Garellek, 2012). Importantly, H1*–H2* and HNR have been shown to be among the most informative acoustic measures of phonation differences (Keating et al., 2023) that together capture spectral tilt and noise, which are the important dimensions of voice quality in a psychoacoustic model of voice that links speech production to perception through acoustics (Garellek, 2019; Garellek et al., 2016; Kreiman et al., 2014, 2021).

Prior to modeling, we excluded all very short and long diphthongs with durations of less than 50 or greater than 300 ms. Word initial vowels are generally marked with glottal onsets in SG (Kohler, 1990), which could lead to lower H1*–H2* values in the following vowel. As it was not clear to what extent this would also be the case in the KD speakers, we visually explored whether excluding items that occurred in word initial position would cause a change in the overall patterns visible in the data. Exclusion of items with word initial vowels did not alter the general patterns observed, so word initial items were ultimately retained to increase the overall number of items analyzed. For the conversation task data, this resulted in 3,454 (KD: 2,250; SG: 1,204) items remaining for analysis. For the sentence-reading task data, this resulted in 1,367 (KD: 425; SG: 942) items remaining for analysis.
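
The duration criterion and the reported item counts can be expressed as a short Python sketch. The function name and the example durations are hypothetical; only the 50 ms and 300 ms thresholds and the item counts come from the text, and treating the two boundary values as retained is an assumption.

```python
def keep_diphthong(duration_ms, min_ms=50.0, max_ms=300.0):
    """Return True if a diphthong is retained.

    Items shorter than 50 ms or longer than 300 ms are excluded;
    the boundary values themselves are kept (an assumption).
    """
    return min_ms <= duration_ms <= max_ms

# Hypothetical durations in milliseconds.
durations = [32.0, 50.0, 118.5, 300.0, 412.0]
retained = [d for d in durations if keep_diphthong(d)]

# Item counts reported in the text after exclusion.
conversation = {"KD": 2250, "SG": 1204}
reading = {"KD": 425, "SG": 942}
print(len(retained), sum(conversation.values()), sum(reading.values()))
```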

2.3 Statistical analysis

Linear mixed-effects regression models were constructed using the lme4 (Bates et al., 2015) and lmerTest (Kuznetsova et al., 2017) packages in R (R Core Team, 2020). Separate analyses were conducted for the conversation task data and for the reading task data; recall that only one group of multiethnolectal speakers took part in each of these tasks. For both of the tasks, separate models were fit with each of the three acoustic measures of interest included as the dependent variable (i.e., separate models were fit for F0, H1*–H2*, and HNR for both of the tasks). Fixed factors in all models were group (KD vs. SG) and gender (female vs. male) and their interaction. Fixed factors were treatment-coded for ease of interpretation with the reference levels of SG for group and female for gender. Random intercepts were included for speaker and for vowel. Note that we are interested in global measures of voice quality rather than whether differences in voice quality are present between different vowels, hence vowel was not included as a fixed factor in the models. The code for these models was as follows: lmer(F0/H1*–H2*/HNR ~ group * gender + (1|speaker) + (1|vowel)), where F0/H1*–H2*/HNR stands for whichever of the three measures served as the dependent variable. Note that the model analyzing F0 in the conversation task data did not converge with this random structure, so random intercepts for vowel were removed from this model. Plots in Section 3 below were generated using the ggplot2 package (Wickham, 2016) in R (R Core Team, 2020).

3 Results: production

3.1 Analysis of F0

3.1.1 Conversation task data

Figure 1 illustrates the raw F0 (Hz) estimates for the KD and SG groups according to gender for the conversation task data. As can be seen, KD speakers (both female and male) exhibit somewhat higher F0 than their SG counterparts. The results of the linear mixed-effects model are shown in Table 2. Unsurprisingly, we found a significant effect of gender, with males producing lower F0 than females. There was no significant effect of group nor its interaction with gender.

Figure 1.

Mean F0 values (Hz) in conversation task data according to group (left, SG; right, KD) and gender (left panel, female; right panel, male).

Table 2.

Summary of Linear Mixed-Effects Model for Effects of Group (Reference Level: SG) and Gender (Reference Level: Female) on F0 in Conversation Task Data.

	Estimate	SE	df	t	p
(Intercept)	203.64	8.432	27.586	24.152	<.0001
GroupKD	12.88	10.620	28.000	1.212	.235
GenderM	−93.235	13.547	27.198	−6.882	<.0001
GroupKD: GenderM	1.846	18.370	27.268	0.100	.921
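
Because the factors are treatment-coded, the estimates in Table 2 can be combined into model-implied F0 values for each group and gender. The Python sketch below is only an arithmetic illustration of how to read the table (the models themselves were fit in R); the dictionary keys and the function name are hypothetical, and the random intercepts are ignored.

```python
# Fixed-effect estimates copied from Table 2 (F0, conversation task), in Hz.
table2 = {
    "intercept": 203.64,    # SG female (reference levels)
    "group_kd": 12.88,      # shift for KD relative to SG, among females
    "gender_m": -93.235,    # shift for male relative to female, within SG
    "kd_by_male": 1.846,    # extra shift that applies only to KD males
}

def implied_f0(coefs, is_kd, is_male):
    """Model-implied F0 for one cell under treatment coding."""
    return (coefs["intercept"]
            + coefs["group_kd"] * is_kd
            + coefs["gender_m"] * is_male
            + coefs["kd_by_male"] * is_kd * is_male)

for group, is_kd in (("SG", 0), ("KD", 1)):
    for gender, is_male in (("female", 0), ("male", 1)):
        print(group, gender, round(implied_f0(table2, is_kd, is_male), 1))
```

Read this way, the intercept is the SG female value, and the KD male value requires adding all four estimates.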

3.1.2 Reading task data

Figure 2 illustrates the raw F0 (Hz) estimates for both groups according to gender for the sentence-reading task data. The results of the linear mixed-effects model are shown in Table 3. Again, unsurprisingly, there was a significant effect of gender, with lower values produced by the male speakers. Although higher F0 values can be observed in the KD male speakers, there was no significant effect for group nor for the interaction.

Figure 2.

Mean F0 values (Hz) in reading task data according to group (left, SG; right, KD) and gender (left panel, female; right panel, male).

Table 3.

Summary of Linear Mixed-Effects Model for Effects of Group (Reference Level: SG) and Gender (Reference Level: Female) on F0 in Reading Task Data.

	Estimate	SE	df	t	p
(Intercept)	215.902	10.913	29.120	19.784	<.0001
GroupKD	3.457	13.467	28.509	0.257	.799
GenderM	−103.366	16.011	28.107	−6.456	<.0001
GroupKD: GenderM	27.538	19.640	28.439	1.402	.172

3.2 Analysis of H1*–H2*

3.2.1 Conversation task data

Figure 3 illustrates the H1*–H2* estimates for both groups according to gender for the conversation task data and shows that in both female and male speakers, higher H1*–H2* values are found for the KD speakers. This supports the hypothesis of a breathier voice quality in the KD group, although we note that, as mentioned above, it is also possible that this represents a more constricted (i.e., creakier) voice quality in the SG group, and less constriction in the KD speakers, rather than breathier voice per se. The results of the linear mixed-effects model are shown in Table 4. There were significant effects of gender, with overall higher values in the females, and of group, with higher values in the KD speakers.

Figure 3.

Mean H1*–H2* values (dB) in conversation task data according to group (left, SG; right, KD) and gender (left panel, female; right panel, male).

Table 4.

Summary of Linear Mixed-Effects Model for Effects of Group (Reference Level: SG) and Gender (Reference Level: Female) on H1*–H2* in Conversation Task Data.

	Estimate	SE	df	t	p
(Intercept)	1.085	0.7398	28.5704	1.467	.153
GroupKD	5.2450	0.921	27.8565	5.696	<.0001
GenderM	−5.7695	1.169	26.6551	−4.936	<.0001
GroupKD: GenderM	1.1189	1.5852	26.6816	0.706	.486

3.2.2 Reading task data

Figure 4 illustrates the H1*–H2* estimates for both groups according to gender for the sentence-reading task data. As was the case in the conversation task data, higher H1*–H2* values were found for the KD speakers for both female and male speakers, indicating a breathier (or at least a less constricted) voice quality compared to the SG group. The results of the linear mixed-effects model are shown in Table 5. There was a significant effect of gender, and also a significant effect for group.

Figure 4.

Mean H1*–H2* values (dB) in reading task data according to group (left, SG; right, KD) and gender (left panel, female; right panel, male).

Table 5.

Summary of Linear Mixed-Effects Model for Effects of Group (Reference Level: SG) and Gender (Reference Level: Female) on H1*–H2* in Reading Task Data.

	Estimate	SE	df	t	p
(Intercept)	−0.03765	1.11401	29.55608	−0.034	.97326
GroupKD	8.60427	1.33944	28.29134	6.424	<.0001
GenderM	−4.58578	1.59026	27.73932	−2.884	.00751
GroupKD: GenderM	−3.01854	1.95284	28.19077	−1.546	.13333

3.3 Analysis of HNR

3.3.1 Conversation task data

Figure 5 illustrates the HNR estimates for both groups according to gender for the conversation task data. Lower values of HNR are visible in the KD group, particularly in the male speakers. The results of the linear mixed-effects model are shown in Table 6. There was a significant effect of group, with lower values in the KD group. Paired with the higher H1*–H2* reported above, this points toward a breathier voice quality in the KD speakers. There was no significant effect for gender nor for the interaction.

Figure 5.

Mean HNR values (dB) in conversation task data according to group (left, SG; right, KD) and gender (left panel, female; right panel, male).

Table 6.

Summary of Linear Mixed-Effects Model for Effects of Group (Reference Level: SG) and Gender (Reference Level: Female) on HNR in Conversation Task Data.

	Estimate	SE	df	t	p
(Intercept)	43.658	3.5119	20.4733	12.431	<.0001
GroupKD	−9.8669	3.8657	28.1729	−2.552	.0164
GenderM	−8.4365	4.9528	27.7584	−1.703	.0997
GroupKD: GenderM	−0.4732	6.7122	27.7938	−0.070	.9443

3.3.2 Reading task data

Figure 6 illustrates the HNR estimates for both groups according to gender for the sentence-reading task data. Lower values are visible for the KD speakers in both female and male speakers. The results of the linear mixed-effects model are shown in Table 7. There were significant differences for gender, with lower values overall in the male speakers, and for group, with KD speakers again exhibiting lower HNR values than the SG speakers.

Figure 6.

Mean HNR values (dB) in reading task data according to group (left, SG; right, KD) and gender (left panel, female; right panel, male).

Table 7.

Summary of Linear Mixed-Effects Model for Effects of Group (Reference Level: SG) and Gender (Reference Level: Female) on HNR in Reading Task Data.

	Estimate	SE	df	t	p
(Intercept)	52.199	2.516	27.229	20.746	<.0001
GroupKD	−20.050	3.137	27.653	−6.392	<.0001
GenderM	−17.938	3.711	26.727	−4.834	<.0001
GroupKD: GenderM	8.714	4.572	27.520	1.906	.0671

3.4 H1*–H2* and HNR in combination

The results in Sections 3.2 and 3.3 above suggest that participants in the KD group may produce a breathier voice quality compared to those in the SG group. In Section 3.2, higher values of H1*–H2* were observed in the KD group, which is indicative of increased glottal opening as would be expected in breathy voice. In Section 3.3, lower values of HNR were observed in the KD group, which is consistent with increased noise and which is also to be expected in breathy voice. These results were observed in both the conversational speech and the reading task.

Figures 7 (conversational task) and 8 (reading task) illustrate these results in a combined manner for each individual item, with HNR values shown on the vertical axis and H1*–H2* values shown on the horizontal axis. The middle 50% of all items per category are represented by the ellipses. Overall, in all cases, the KD group exhibits values that are further to the right (higher H1*–H2*) and lower (lower HNR) than the SG group, confirming increased breathiness in this group. This is evident in both of the tasks and across both genders, although the difference appears to be slightly weaker in the females in the conversation data (Figure 7).

Figure 7.

HNR (vertical axis) and H1*–H2* (horizontal axis) values for each item in the conversational task data according to group (red = SG; blue = KD) and gender (left panel, female; right panel, male). Ellipses represent the center 50% of items per category.

4 Interim discussion

These results provide complementary evidence that KD speakers produce a breathier voice quality than their SG speaking peers. In both the conversation and reading task data, H1*–H2* values were significantly higher in the KD speakers compared to the SG speakers, which is indicative of breathy voice. Correspondingly, significantly lower values for HNR were also found in the KD speakers, indicating more noise in their speech and also pointing toward a breathier voice quality (Garellek, 2019). Figures 7 and 8 also illustrate that at the level of individual items, the KD group appears to be breathier than the SG group. Taken together, these results provide converging evidence that the KD speakers in our data have a breathier voice quality than their SG counterparts, with the results from both cues pointing in the same direction. It should also be pointed out that the two tasks could potentially be considered as representing different communicative situations, with a more controlled reading task on one hand and a spontaneous conversation task on the other, and as such more natural speech might be expected in the conversations compared to the reading task. That we have found similar results across the different tasks (and across different groups of KD speakers) suggests either that both tasks were similar in perceived formality by our speakers or that increased breathy voice in KD speakers is a robust effect that is evident across different speaker groups in different communicative situations. We leave it to future research to explore this question in more detail.

Figure 8.

HNR (vertical axis) and H1*–H2* (horizontal axis) values for each item in the reading task data according to group (red = SG; blue = KD) and gender (left panel, female; right panel, male). Ellipses represent the center 50% of items per category.

We also observed a tendency for higher F0 values in the KD speakers, particularly in the males, though no significant effect of group was found. Of course, significant effects were found for F0 with regard to gender, with higher F0 values produced by the female speakers in both groups, reflecting the tendency for males to have longer and thicker vocal folds compared to females. Gender differences were also found for H1*–H2* and for HNR, with higher values found for females compared to males (although this was not significant for HNR in the conversation task). While the higher H1*–H2* values may suggest that females are breathier than males in general, a not uncommon finding (Henton & Bladon, 1985; Södersten & Lindestad, 1990; Stuart-Smith, 1999a), the combination of higher spectral tilt and higher HNR may rather suggest that the females are more modal than the males, with the males overall producing a more constricted voice quality compared to the females. However, a comparison of gender differences in voice quality is beyond the scope of this paper, and in any case needs to be approached with caution (Simpson, 2012). Nevertheless, the results show that, within each gender, the KD speakers are breathier than the SG speakers.

Having established that there appear to be differences in voice quality between these two groups of speakers, the question that needs to be asked is whether listeners are sensitive to these differences, and whether such a difference in voice quality carries social meaning. Listeners can of course perceive differences in voice quality; even in languages where voice quality is not used contrastively, differences in voice quality are known to influence listeners’ perceptions of a speaker’s characteristics or their affective state (e.g., Anderson et al., 2014; Gobl & Ní Chasaide, 2003; Yuasa, 2010). Yet the degree to which listeners associate a particular voice quality with a (multi)ethnolectal group of speakers is not well understood. In American English, Newman and Wu (2011) speculated that listeners may rely on a breathier voice quality as one feature in identifying Asian Americans from those with other ethnicities; however, this notion was not tested directly. In New Zealand English, Szakay (2012) found that listeners used voice quality differences (combined with rhythmic and intonational cues) to identify whether speakers sounded more Māori or more Pākehā, with breathier voice qualities being perceived as more Pākehā sounding and creakier voices as more Māori sounding. This effect was strongest for listeners who were highly integrated within Māori communities. Whether voice quality also indexes group affiliation for listeners from outside of a group, and whether voice quality on its own is sufficient for this, remains to be investigated.

While non-modal voice quality features such as creak may signal the onset of stressed syllables or phrase/utterance finality, breathy voice is not known to be employed for any structural purpose in German. If at all, it might be considered as signaling femininity or intimacy, as in other languages spoken in Europe, such as Dutch and Spanish (Gobl & Ní Chasaide, 2003; Mendoza et al., 1996; Sulter & Peters, 1996; Van Borsel et al., 2009), though to our knowledge, this has not been empirically examined specifically for German. Such a perception would, however, appear to be at odds with the stereotypical image portrayed by many KD speakers, who tend to project a tough, inner city image (Bahlo & Lohse, 2021). In addition, it is not clear whether or to what extent variation in voice quality is salient to listeners, and whether it forms a part of their perception of specific social groups, separate from more salient variation in segmental features. That is, it is not clear whether there is a link between the phonetic form of producing a breathier voice quality and the social characteristic of being a KD speaker (Johnson et al., 1999), in the absence of a segmental marker such as coronalization of /ç/, which has been shown to be robustly linked to listeners’ perception of KD speakers.

Therefore, in the following sections, we report on a perception test designed to examine the social meaning of breathy voice with regard to KD, and whether this is perceived relative to other segmental cues. We examined the effect of breathy voice on the perception of a speaker’s background both on its own and in combination with two segmental cues to KD: coronalization of /ç/, and /ɔɪ/-fronting. Coronalization has been shown to be a salient marker of KD (Jannedy & Weirich, 2014c; Weirich et al., 2020), whereas evidence for /ɔɪ/-fronting has been found in production studies (Jannedy & Weirich, 2013, 2014a, 2014b) but it is not yet clear to what extent listeners associate this with KD. If a breathy voice quality has been enregistered as a feature of KD, we would expect listeners to be more likely to rate someone as a KD speaker when hearing a breathier utterance than when hearing a modally voiced utterance. However, it is possible that voice quality on its own is not sufficient to shift listeners’ perception, but rather that its implementation in conjunction with other cues, such as the coronalization of /ç/, will enhance the perception of a KD speaker. Finally, it is also possible that differences in voice quality exist between KD and SG speakers, but that this phonetic feature has no perceptual relevance for listeners; that is, despite acoustic differences in production, voice quality may not (yet) be enregistered as a perceptually relevant cue to KD.

5 Methods: perception

5.1 Listeners

A total of 171 listeners (diverse: 5, non-binary: 1, female: 56, male: 98, no info: 11) took part in the online perception test. Participants who did not finish the test or said they did not know how KD sounds were excluded from the analysis, resulting in 140 listeners (diverse: 2, non-binary: 1, female: 50, male: 87). Listener age ranged from 18 to 40 years, with a mean age of 28.1. The time participants had lived in Berlin varied from 1 to 39 years.

5.2 Stimuli

The stimuli used in the perception test consisted of the sentence Alle Leute sollten morgen die Lichter ausmachen (everyone should switch off the lights tomorrow). The target cues /ɔɪ/ and /ç/ were embedded in the disyllabic and accented carrier words Leute (people) and Lichter (lights). This sentence was chosen to include the relevant targets (diphthong and fricative) but no (known) additional cues that could influence a rating on the perceived background of the speaker.

A professional voice actor—from Berlin, with a multiethnolectal background and knowledge about the variety KD—was paid to produce the stimuli sentences several times, and was trained in varying his voice quality from modal to breathy voice. We acknowledge that using a voice actor necessarily entails that our stimulus items were not produced by an “authentic” speaker of KD. However, we felt that this was necessary to ensure a level of phonetic control over the variants of the relevant features and to ensure that other features not being examined remained consistent between the items, and to avoid including additional cues to the identity of the speaker. Moreover, as (at least some of) the features being tested appear to be below the level of awareness of speakers, it can be quite difficult to find speakers who can vary these convincingly. The sentence productions differed with regard to voice quality (modal vs. breathy voice) and the variants used for the diphthong and the fricative (supposedly coming from a KD speaker or an SG speaker). Out of all productions, one sentence was chosen for each voice quality condition (based on H1*–H2* measures) to be used as carrier sentences for the other stimuli with varying segmental cues. Based on our knowledge of the acoustic characteristics of the KD and SG variants, the most suitable productions of the two variants for each condition (KD and SG) were chosen and spliced into the respective carrier sentence (i.e., the modal and breathy conditions), both singly and in combination. Great care was taken to cut and splice at zero crossings to avoid auditory discontinuities. 
Thus, the final stimuli used for the perception test consisted of four sentences in the modal condition and four sentences in the breathy condition that varied only in the production of the target cues (1: both variants in SG, 2: KD diphthong and SG fricative, 3: SG diphthong and KD fricative, 4: both variants in KD; see also Table 8 below), while the rest of the sentence did not vary (within the modal and breathy conditions). Attention was also paid to the comparability of the segmental cues in terms of formants and spectral center of gravity (COG) between the modal and breathy conditions. While the variation of F0 over the utterance was very similar between the carrier sentences (see Figure 9), mean F0 varied to some extent (breathy mean F0: 106 Hz; modal mean F0: 161 Hz). To control for a possible effect of fundamental frequency on the ratings, mean F0 of all sentences was changed to 120 Hz using the Change Gender option in Praat (with the formant shift ratio set to 1.0, resulting in no change of the formants) (Boersma & Weenink, 2022), with the final stimuli judged as natural sounding by members of our three laboratories. Figure 9 shows two stimuli in the breathy condition, the upper one with both the diphthong and the fricative in KD, the lower one with both the diphthong and the fricative in SG. While small temporal variations between the target segments (marked by the frames) in the two conditions can be seen in the figure, the other parts of the sentence do not differ.

Table 8.

Acoustic Characteristics of All Eight Stimuli Sentences.

Stimulus	Voice quality	Diphthong	Fricative	COG (SD) (Hz)	F1 (Hz) 25%/75%	F2 (Hz) 25%/75%	H1*–H2* (dB)
1m	Modal	SG-like	SG-like	5,225 (2,903)	515/529	1,024/1,219	2.97
2m	Modal	KD-like	SG-like	5,225 (2,903)	531/433	1,170/1,628	2.04
3m	Modal	SG-like	KD-like	6,514 (1,683)	515/529	1,024/1,219	2.93
4m	Modal	KD-like	KD-like	6,514 (1,683)	531/433	1,170/1,628	2.10
1b	Breathy	SG-like	SG-like	5,594 (2,858)	516/469	1,014/1,288	6.60
2b	Breathy	KD-like	SG-like	5,594 (2,858)	510/424	1,191/1,668	6.44
3b	Breathy	SG-like	KD-like	6,620 (1,887)	516/469	1,014/1,288	5.74
4b	Breathy	KD-like	KD-like	6,620 (1,887)	510/424	1,191/1,668	6.60

COG (SD) is the mean value from the target /ç/ fricatives; F1 and F2 are taken at the 25% and 75% points of the target /ɔɪ/ diphthongs; H1*–H2* is measured across the entire sentence (all voiced items).

Figure 9.

Oscillogram and spectrogram of the breathy stimulus Alle Leute sollten morgen die Lichter ausmachen (everyone should switch off the lights tomorrow) in two conditions (upper panel: both segments in KD version, lower panel: both segments in SG version). The diphthong /ɔɪ/ and the fricative /ç/ are marked by frames.

Several analyses were made to compare the spectral characteristics of diphthongs and fricatives in KD and SG, and Table 8 shows the acoustic parameters measured in the original files used to create the final stimuli for the perception test. While a clear difference between the fricatives in SG and KD in terms of COG and SD can be seen, with a higher COG and a lower SD in KD compared to SG (cf. stim1,2 vs. stim3,4), the variants differ only slightly between the modal and breathy conditions (cf. stim1,2m vs. stim1,2b and stim3,4m vs. stim3,4b). To describe and compare the diphthong characteristics, formants were measured at two timepoints (at 25% and 75% of the diphthong). As Table 8 and Figure 10 show, differences are apparent between KD and SG especially in F2, with higher values at both timepoints in KD compared to SG, reflecting, as expected, the more fronted production in KD (cf. Jannedy & Weirich, 2014a). Comparing the F2 transitions, the KD version shows a much steeper rise, while the SG version is flatter (cf. Figure 10). Systematic differences in F1 are much smaller; only at the later time point does the KD diphthong have lower values than the SG diphthong. These differences hold for both breathy and modal conditions.

Figure 10.

Oscillogram and spectrogram of /lɔɪ/ (from “Leute”) in SG (left) and KD (right) stimuli (modal condition).

Table 8 also includes mean H1*–H2* values across all voiced segments in the stimuli sentences and shows clear differences between the modal and the breathy stimuli, with higher values in the breathy stimuli.

5.3 Perception test

The perception test was run via an online platform (Easyfeedback.de), and the link was distributed through the authors’ social networks. Listeners were not paid for participation but were given the option to register for a prize draw of €50 worth of gift vouchers. To minimize the time and effort participants had to invest to take part in the perception test and to ensure participants’ decisions on the perceived background of the speaker were judged against their own past experiences rather than categorized with regard to other stimulus items, each listener was asked to rate just one of the eight stimuli. Each of the eight stimuli was rated by a different group of listeners, varying in number between 15 and 22 listeners. Listeners were asked how likely it was on a scale from 0 (not at all likely) to 5 (very likely) that the speaker of the presented stimulus was a KD speaker. In addition, social information was collected from the listeners regarding their age, gender, whether they were familiar with KD and the length of time they had lived in Berlin (in years).

5.4 Statistical analysis

As in the production study, linear regression models were used for the analysis of the perception data using the lme4 (Bates et al., 2015) and lmerTest (Kuznetsova et al., 2017) packages in R (R Core Team, 2020). A model was fit with the perception score as a dependent variable (with higher scores reflecting higher KD ratings). As factors, we entered the fricative variant (KD-like vs. SG-like), the diphthong variant (KD-like vs. SG-like), the voice quality condition (modal vs. breathy), and the age of the listener (ageCS: centered and scaled). Categorical factors were treatment-coded for ease of interpretation with the reference levels of modal for voice quality (vq), KD-like for diphthong (diph), and KD-like for fricative (fric). Means and standard deviations of listener age for each stimulus are provided in Table 9. Since age of listener and time lived in Berlin were positively correlated (r = .24, p < .01) and the variable time lived in Berlin was not homogeneously distributed across the rating groups, we focused on age of listener only as a potential effect of listener background. Similarly, listener gender (divided into diverse, non-binary, female, male, no info) was not distributed equally across the rating groups, so this was not included in the model. We tested the most complex model consistent with the experimental design, including the four-way interaction term of the factors analyzed. The code for this model was as follows: lm(ratings ~ vq * diph * fric * age). Plots in Section 6 below were generated using the ggplot2 (Wickham, 2016), ggeffects (Lüdecke, 2018a), sjmisc (Lüdecke, 2018b), and sjPlot (Lüdecke, 2023) packages in R (R Core Team, 2020).
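
The centering and scaling of listener age (ageCS) can be illustrated with a minimal Python sketch. The text does not state exactly how ageCS was computed, so the sketch assumes the common convention of subtracting the mean and dividing by the sample standard deviation (as R's scale() does by default); the ages and the function name are hypothetical.

```python
import statistics

def center_and_scale(values):
    """Subtract the mean and divide by the sample SD (n - 1 denominator).

    Assumption: this mirrors a conventional z-scoring; the paper does not
    specify how ageCS was derived.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical listener ages in years.
ages = [18, 24, 28, 33, 40]
age_cs = center_and_scale(ages)
print([round(a, 2) for a in age_cs])
```

After this transformation, 0 corresponds to the mean listener age, negative values to younger listeners, and positive values to older listeners, which is how the age axis is read in the results.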

Table 9.

Means and Standard Deviations for Listener Age According to Stimulus.

Stimulus	Mean age	SD
1b	27.8	4.81
2b	28.4	6.98
3b	27.7	5.46
4b	27.8	4.84
1m	26.9	5.09
2m	28.5	6.11
3m	29.2	6.48
4m	28.3	4.89

6 Results: perception

Figure 11 illustrates the perception scores for each stimulus separated by voice quality (modal, breathy) and age (here split into a categorical factor: below and above 0 on the centered listener age variable). The mean ages of these groups are 24.2 (SD: 2.54) and 33.9 (SD: 3.1), respectively. A separation between the ratings is apparent between the stimuli with a KD fricative (stimulus numbers 3 and 4) and the stimuli with an SG fricative (stimulus numbers 1 and 2) in all subgroups, but to varying degrees. The clearest separation between all four stimuli is found in the younger listeners and the modal voice quality: here the expected stepwise rise of the ratings can be seen, with the stimulus with only SG variants showing the lowest score and the stimulus with only KD variants showing the highest score. The stimuli with only one of the segments in the KD version lie in between, with the stimulus with the KD fricative showing higher ratings than the stimulus with the KD diphthong. The smallest differences between the stimuli are found in the older listeners in the modal voice condition. Here, ratings in favor of KD are generally low, while the older listeners tend to rate the speaker as KD when the KD fricative is matched with a breathy voice quality. In contrast, the younger listeners rate the speaker as less KD-like when the KD fricative is matched with a breathy voice quality. This points to a varying effect of voice quality on KD ratings depending on listener age when the KD fricative is contained in the stimulus.

Figure 11.

Results of the perception test according to age and voice quality conditions.

A linear model was run testing for effects of voice quality, diphthong variant, fricative variant, and age of listener on the ratings including all interaction terms. The results are presented in Table 10.

Table 10.

Summary of Linear Model for Effects of Voice Quality Condition, Fricative, and Diphthong Variant (Reference Level: VQ_modal, /ɔɪ/-KD, /ç/-KD) and Listener Age (Numerical Variable With 0 As Mean Age and Negative Values Representing Younger Listeners) on Perception Ratings.

	Estimate	SE	t	p
(Intercept)	2.357	0.36	6.499	<.0001
Age	−0.848	0.4	−2.136	.0346
VQ_br	−0.697	0.48	−1.458	.1473
/ç/_SG	−1.283	0.51	−2.53	.0127
/ɔɪ/_SG	−0.376	0.52	−0.726	.4692
Age:VQ_br	1.391	0.54	2.573	.0113
Age:/ç/_SG	0.667	0.52	1.291	.199
Age:/ɔɪ/_SG	0.323	0.51	0.633	.5277
VQ_br:/ç/_SG	0.785	0.68	1.156	.2497
VQ_br:/ɔɪ/_SG	0.836	0.7	1.202	.2315
/ç/_SG:/ɔɪ/_SG	0.233	0.73	0.319	.7502
Age:VQ_br:/ç/_SG	−1.445	0.69	−2.102	.0376
Age:VQ_br:/ɔɪ/_SG	−0.708	0.72	−0.977	.3304
Age:/ç/_SG:/ɔɪ/_SG	−0.088	0.75	−0.118	.9064
VQ_br:/ç/_SG:/ɔɪ/_SG	−0.979	1	−0.979	.3296
Age:VQ_br:/ç/_SG:/ɔɪ/_SG	0.92	1.05	0.877	.382

For the reference levels (modal voice, both variants KD-like), we found an effect of age, with older listeners rating the stimuli in general as less KD-like (Estimate: −0.848, p < .05). We also found an effect of the fricative variant for the younger listeners and the modal voice, with lower KD ratings for the SG variant (Estimate: −1.283, p < .05). In addition, we found a significant two-way interaction between voice quality and age (p < .05) and a significant three-way interaction between voice quality, fricative variant, and age (p < .05). Figure 12 visualizes these interactions. For the standard variant of the fricative (right subplot), no effect of voice quality is apparent. For the KD fricative (left subplot), however, voice quality affects the ratings in interaction with listener age: younger listeners (negative values on the x-axis) rate stimuli with modal voice as more KD-like, whereas older listeners rate stimuli with breathy voice quality as more KD-like.
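To make the interaction pattern concrete, the estimates in Table 10 can be combined into model-implied ratings. The sketch below assumes treatment (dummy) coding with the reference levels named in the table caption. It evaluates the model at centered age values of −1 and +1, which are illustrative points only, because the unit of the centered age variable is not stated here. Function and variable names are ours.

```python
from math import prod

# Estimates copied from Table 10; each key lists the predictors involved in a term.
COEFS = {
    (): 2.357,  # intercept
    ("age",): -0.848,
    ("vq_br",): -0.697,
    ("fric_sg",): -1.283,
    ("diph_sg",): -0.376,
    ("age", "vq_br"): 1.391,
    ("age", "fric_sg"): 0.667,
    ("age", "diph_sg"): 0.323,
    ("vq_br", "fric_sg"): 0.785,
    ("vq_br", "diph_sg"): 0.836,
    ("fric_sg", "diph_sg"): 0.233,
    ("age", "vq_br", "fric_sg"): -1.445,
    ("age", "vq_br", "diph_sg"): -0.708,
    ("age", "fric_sg", "diph_sg"): -0.088,
    ("vq_br", "fric_sg", "diph_sg"): -0.979,
    ("age", "vq_br", "fric_sg", "diph_sg"): 0.92,
}

def predict(age, breathy, fricative_sg, diphthong_sg):
    """Model-implied rating, assuming 0/1 dummy coding for the three factors."""
    x = {"age": age, "vq_br": int(breathy),
         "fric_sg": int(fricative_sg), "diph_sg": int(diphthong_sg)}
    # Each term contributes its estimate times the product of its predictors.
    return sum(b * prod(x[name] for name in term) for term, b in COEFS.items())

# Compare modal and breathy voice for both fricative variants (diphthong kept at KD).
for fricative_sg in (False, True):
    for age in (-1, 1):
        modal = predict(age, False, fricative_sg, False)
        breathy = predict(age, True, fricative_sg, False)
        label = "SG" if fricative_sg else "KD"
        print(f"{label} fricative, age {age:+d}: modal {modal:.3f}, breathy {breathy:.3f}")
```

Under these assumptions, with the KD fricative and KD diphthong, the model-implied rating at age −1 is higher for modal than for breathy voice (about 3.21 vs. 1.12), and the order reverses at age +1 (about 1.51 vs. 2.20). With the SG fricative, the two voice qualities stay within roughly 0.15 of each other at both age values.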

Figure 12.

Model plot visualizing the interaction term age (ageCS) * voice quality (vq) * fricative variant (KD/SG).

7 General discussion

In this study, we explored differences in voice quality between two varieties of urban German as spoken by adolescents in Berlin. We also tested the relative perceptual salience of the segmental alternation /ç/–[ʃ] and the fronting of /ɔɪ/, previously found to be characteristic in speech production in the German youth-style multiethnolect KD (Jannedy et al., 2011; Jannedy & Weirich, 2014a, 2014b, 2017), and of the breathy voice quality found for speakers of KD as described in this work. Results indicate a perceptual gradience for phonetic alternations detected in KD. The most widely observed, prevalent, and obvious segmental alternation, /ç/–[ʃ], which is strongly associated with KD (Jannedy & Weirich, 2014c; Weirich et al., 2020), was shown to be a highly salient and the most reliable marker in our speech perception experiment. The second segmental alternation, the fronting of /ɔɪ/, also found to be a reliable marker of KD in speech production (Jannedy & Weirich, 2013; Weirich et al., 2024), seems to have been generally detectable, especially in modal voice by the younger listeners (see Figure 12), but it was not reliably interpretable: association rates with KD failed to reach significance, showing that this alternation is not associated and enregistered with KD. In other words, our results indicate that the fronting of /ɔɪ/ is currently an indicator (a reliable difference in production that goes unnoticed) with the potential to eventually become a marker (a reliable difference in production that is connected to a social trait) for a wider group of listeners. The seemingly categorical distinction between indicator and marker seems somewhat problematic, given that such categorization appears to be highly listener-specific. In fact, this process resembles that of phonologization, where acoustic variation can give rise to new sound patterns or structures in the grammar.
In the social domain, acoustic variation gives rise to patterns in social structure by means of enregisterment (Agha, 2003) “whereby distinct forms of speech come to be socially recognized (or enregistered) as indexical of speaker attributes by a population of language users” (Agha, 2005, p. 38). Thus, the process of enregisterment resembles that of sound change which slowly progresses through populations of speakers and hearers rather than constituting instantaneous switches.

As for the phonation difference, while we did not find an effect of voice quality overall, we found that for older listeners, the combination of breathy voice with the KD-like fricative variant resulted in a greater proportion of KD responses; however, for younger listeners, who we might expect to have increased experience with KD given it is a youth-style multiethnolect, the addition of breathy voice to stimuli containing the fricative marker resulted in a lower proportion of KD responses. That is, for younger listeners, who were overall more likely to identify the speaker as a KD speaker, breathy voice does not appear to be associated with KD and reduces the likelihood that listeners perceive a KD speaker even when they produce coronalized variants of the palatal fricative, which is otherwise a strong cue to a KD speaker. Why then did the addition of breathy voice result in more KD responses for the older listeners, who were less likely to rate the speaker as a KD speaker overall? Two interpretations here seem possible, though one appears to us more plausible: Perhaps older listeners are more sensitive to the phonetic features of KD, to the extent that they have internalized increased breathiness as a marker of the variety. This seems rather unlikely, given their lower sensitivity to the fricative and diphthong variants. Or perhaps older listeners perceived the voice quality difference in the stimuli, and realizing that this was not typical of their past experience with SG, concluded that this must be connected to KD. That is, rather than drawing on their past experience with KD, they were listening for difference from their own production/variety, and the items with both a salient fricative difference and a voice quality difference were the furthest from SG, and as such were rated as most likely KD. We tentatively suggest that this latter explanation is more likely. 
Hence, breathy voice in this multiethnolect appears to be merely an indicator which currently does not have associated social meaning, at least in young people.

The data presented suggest that segmental differences may be more easily and more reliably detected and learned when they go hand in hand with meaning differences. Such meaning differences may be phonological in nature, for example when a minute change in the articulatory parameter causes an abrupt change in the acoustic space and when a minute change in the acoustics causes an abrupt shift in the perceptual category (Stevens, 1972). Such quantal nature may also be applicable to the connection between the linguistic and the social, when minute changes in fine phonetic detail are linked to social meanings which then become interpretable to listeners by means of enregisterment. When an alternant is detected and enregistered (Agha, 2003) and as such becomes indexical of some social meaning (such as toughness, membership in a group of like-minded people, etc.), then the relationship between perception and categorization is characterized by a continuous process of forming learned associations through tokens of experiences, an operation which can be best captured by exemplar-based models (Johnson, 2005). That this learning encompasses a process of individual learning is supported by our findings from our IAT (Weirich et al., 2020), which showed that the same alternant means different things to different listener groups.

The failure of younger listeners to connect a breathier voice quality to KD, although voice quality differences in production have been shown here and also between other social groups, for example in Australia and Great Britain (e.g., Loakes & Gregory, 2022; Penney & Cox, 2021; Szakay & Torgersen, 2015), raises the question of whether we have merely asked the wrong question in the perception experiment or whether a global feature such as voice quality, whose domain of application spans larger stretches of speech rather than being localized to individual words, morphemes, or sounds, is too variable to be rigidly connected to specific speaker groups. However, F0 is also a global parameter and is often connected to social constructs, such as gender performance or authority. According to Ohala’s (1994) biological frequency code hypothesis and its interpretations (cf. Gussenhoven, 2002), a higher F0 is associated with smallness and deference while a lower F0 is associated with tallness and masculinity. In addition, a meta-study by Winter et al. (2021) found that mean F0 was generally lower when speakers conversed with an imaginary superior as compared to an imaginary friend. This fits well with former British Prime Minister Margaret Thatcher striving for authority by lowering her voice (Beattie et al., 1982) so as not to appear submissive and overly feminine. This is corroborated by Klofstad et al. (2015), who found that voters prefer leaders with lower-pitched voices because they are perceived as more competent and as having greater integrity. Thus, F0 as a global parameter does lend itself to social meaning. Nevertheless, in our production study, F0 was not found to differ significantly between the KD and SG groups.

So either voice quality differences are found in multiethnolects but they do not have a social meaning from the perspective of the language user and from the perspective of the hearer and interpreter, or we must also entertain the thought that the stimuli used in the perception study lacked specific acoustic characteristics that would have been the necessary prerequisites for different ratings. For example, it may be possible that the increased breathiness in KD speakers is linked to hoarseness rather than breathy voice per se; this may explain the acoustic effects we found, as hoarse voice (or harsh whispery voice, Esling et al., 2019; Laver, 1980) can also contain a breathy component paired with additional supraglottal constriction. In such a case, perhaps additional acoustic features rather than just increased breathiness would have generated different perception results. The speaker who produced the stimuli was also older than the participants in the production study, thus the perceived speaker age may also have played a role. Another reason that comes to mind as to why voice quality differences went unnoticed in our perception study is that breathiness may become interpretable only in conjunction with other features such as specific requirements on rhythm or F0 (cf. Szakay, 2012). Finally, an alternative explanation may be that a breathier voice quality is a marker of KD, but that this is salient only to genuine in-group listeners, who we did not specifically target in our perception test. In other work (Weirich et al., 2020), we have shown that in-group listeners (i.e., KD speakers) rated the /ç/–[ʃ] alternation more positively when this was associated with KD rather than with French, whereas for older, monolingual German out-group listeners, identical stimuli received more positive ratings when the listeners believed they were listening to French speakers rather than KD speakers. 
This shows that the same alternation can be noticed and associated positively or negatively within the same, in this case, urban space. Or it can be ignored if there is no association with it. It is clear that more perception work is needed (for a similar observation, see Thomas, 2002) and the right questions ought to be asked to understand the social significance of non-modal voice in KD and other multiethnolects.

8 Conclusion

Beyond physiological aspects, voice quality is learned behavior, just as any other phonological or phonetic expression. The implementation of breathy voice in several languages by specific speaker groups points to a larger pattern that currently has not attracted sufficient attention for associations with these speaker groups and thus social meaning to emerge. In this paper, we showed that in speech production, a voice quality difference is apparent in KD speakers (as has been found in several other multiethnolectal varieties of English); however, this was not picked up on in perception by (younger) German listeners. We firmly believe that deriving conclusions on social meanings from speech production studies alone poses the danger of leaving the meaning of an alternation up to the researcher rather than the speech community in which it occurs and to which it belongs. Therefore, we strongly suggest corroborating speech production work through speech perception studies, especially when making claims about the social meaning of a variant. Such studies not only allow for assessing how noticeable a variant is to a group of listeners but also exactly how they interpret a variant. In other work (Jannedy & Weirich, 2014c; Weirich et al., 2020), we have shown that the interpretation of variant forms and their social meaning is not uniform within a speech community. Those who do not know the code may only notice a difference or deviance from the norm (in the sense of what normally occurs in their ambient environment), while those who can interpret the added layer of meaning may be affected in their interpretation by being a member of the in-group rather than an external initiated observer (who may care a lot or not care much). Only through rigorous testing in perception as it relates to gaining an understanding of what means what to whom, can insights be gained. 
For example, despite the great care we took planning and conducting the perception study described in this paper, the presented data cannot speak to whether the acoustic differences have a meaning to speakers of that variety (KD) and whether or not it serves to mark in-group behavior (cf. Tajfel & Turner, 1986), which may not have undergone detection by outsiders. In an information processing view of cognition, perceptual categories are learned and social categories become enregistered, and the relationship between perception and categorization is characterized by a continued process of forming phonological and social associations.

Footnotes

S.J. thanks Keith Johnson for having taught her the relevance of perception work in conjunction with production studies. The authors thank Norma Mendoza-Denton for advice on social theory and relevant literature. They also thank Associate Editor Molly Babel and two anonymous reviewers for their enormously insightful feedback and suggestions on earlier versions of this paper. All remaining faults are our own.

Correction (October 2024):

Figures 11 and 12 along with some textual errors have been updated in the article since its original publication.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was partially supported by a Short-Term Research Grant (grant no. 57588366) to J.P. from the Deutscher Akademischer Austauschdienst (DAAD, German Academic Exchange Service) and by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), SFB 1412 (grant no. 416591334, project C02).

ORCID iDs

Joshua Penney

Melanie Weirich

Stefanie Jannedy

References

Aare, Lippus, Włodarczak, & Heldner (2018, September 2–6). Creak in the respiratory cycle. In Proceedings of INTERSPEECH 2018, Hyderabad (pp. 1408–1412). International Speech Communication Association. https://doi.org/10.21437/Interspeech.2018-2165

Abdelli-Beruh, N. B., Wolk, & Slavin (2014). Prevalence of vocal fry in young adult male American English speakers. Journal of Voice, 28, 185–190.

Agha (2003). The social life of cultural value. Language & Communication, 23(3), 231–273.

Agha (2005). Voice, footing, enregisterment. Journal of Linguistic Anthropology, 15(1), 38–59.

Anderson, R. C., Klofstad, C. A., Mayew, W. J., & Venkatachalam (2014). Vocal fry may undermine the success of young women in the labor market. PLOS ONE, 9(5), Article e97506.

Auer & Dirim (2003). Socio-cultural orientation, urban youth styles and the spontaneous acquisition of Turkish by non-Turkish adolescents in Germany. In Androutsopoulos & Georgakopoulou (Eds.), Discourse constructions of youth identities (pp. 223–246). Benjamins.

Bahlo & Lohse (2021). Indexing Kiez: Zur Deethnisierung juventulektaler Stile. Lublin Studies in Modern Languages and Literature, 45(1), 35–49. https://doi.org/10.17951/lsmll.2021.45.1.35-49

Bates, Maechler, Bolker, & Walker (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67, 1–48.

Beattie, G. W., Cutler, & Pearson (1982). Why is Mrs. Thatcher interrupted so often? Nature, 300, 744–747.

Boersma & Weenink (2022). Praat: Doing phonetics by computer [Version 6.2.12]. http://www.praat.org

Carlson, Hirschberg, & Swerts (2005). Cues to upcoming Swedish prosodic boundaries: Subjective judgment studies and acoustic correlates. Speech Communication, 46, 326–333.

Catford, J. C. (1977). Fundamental problems in phonetics. Edinburgh University Press.

Clyne, Eisikovits, & Tollfree (2001). Ethnic varieties of Australian English. In Blair & Collins (Eds.), English in Australia (pp. 223–238). Benjamins.

Coupland (2007). Style: Language variation and identity. Cambridge University Press.

Dallaston & Docherty (2020). The quantitative prevalence of creaky voice (vocal fry) in varieties of English: A systematic review of the literature. PLOS ONE, 15(3), Article e0229960.

Davidson (2020). The versatility of creaky phonation: Segmental, prosodic, and sociolinguistic uses in the world’s languages. Wires Cognitive Science, 12, Article e1547. https://doi.org/10.1002/wcs.1547

Dorreen (2017). Fundamental frequency distributions of bilingual speakers in forensic speaker comparison [Unpublished Master thesis]. University of Canterbury.

Eckert (2012). Three waves of variation study: The emergence of meaning in the study of variation. Annual Review of Anthropology, 41, 87–100.

Eckert & Labov (2017). Phonetics, phonology and social meaning. Journal of Sociolinguistics, 21(4), 467–496.

Esling (1978). The identification of features of voice quality in social groups. Journal of the International Phonetic Association, 8, 18–23.

Esling, Moisik, Benner, & Crevier-Buchman (2019). Voice quality: The laryngeal articulator model. Cambridge University Press.

Fougeron & Keating, P. A. (1997). Articulatory strengthening at edges of prosodic domains. Journal of the Acoustical Society of America, 101, 3728–3740.

Garellek (2012). The timing and sequencing of coarticulated non-modal phonation in English and White Hmong. Journal of Phonetics, 40, 152–161.

Garellek (2014). Voice quality strengthening and glottalization. Journal of Phonetics, 45, 106–113.

Garellek (2015). Perception of glottalization and phrase-final creak. Journal of the Acoustical Society of America, 137, 822–831.

Garellek (2019). The phonetics of voice. In Katz, W. F., & Assmann, P. F. (Eds.), The Routledge handbook of phonetics (pp. 75–106). Routledge.

Garellek & Keating (2011). The acoustic consequences of phonation and tone interactions in Jalapa Mazatec. Journal of the International Phonetic Association, 41, 185–205.

Garellek, Samlan, Gerratt, B. R., & Kreiman (2016). Modeling the voice source in terms of spectral slopes. Journal of the Acoustical Society of America, 139, 1404–1410.

Gobl & Ní Chasaide (2003). The role of voice quality in communicating emotion, mood and attitude. Speech Communication, 40, 189–212.

Gordon & Ladefoged (2001). Phonation types: A cross-linguistic overview. Journal of Phonetics, 29, 383–406.

Gussenhoven (2002, April 11–13). Intonation and interpretation: Phonetics and phonology. In Proceedings of Speech Prosody 2002, Aix-en-Provence, France (pp. 47–57). International Speech Communication Association. https://doi.org/10.21437/SpeechProsody.2002-7

Hall (1995). Lip service on the fantasy lines. In Hall & Bucholtz (Eds.), Gender articulated: Language and the socially constructed self (pp. 183–286). Routledge.

Henton & Bladon (1988). Creak as a sociophonetic marker. In Hyman (Eds.), Language, speech and mind: Studies in honor of Victoria A. Fromkin (pp. 3–29). Routledge.

Henton & Bladon (1985). Breathiness in normal female speech: Inefficiency versus desirability. Language and Communication, 5, 221–227.

Hillenbrand, Cleveland, R. A., & Erickson, R. L. (1994). Acoustic correlates of breathy vocal quality. Journal of Speech and Hearing Research, 37, 769–778.

Holmberg, E. B., Hillman, R. E., Perkell, Guiod, & Goldman, S. L. (1995). Comparisons among aerodynamic, electroglottographic, and acoustic spectral measures of female voice. Journal of Speech, Language, and Hearing Research, 38, 1212–1223.

Iseli, Shue, Y.-L., & Alwan (2007). Age, sex, and vowel dependencies of acoustic measures related to the voice source. Journal of the Acoustical Society of America, 121, 2283–2295.

Ito (2003). The contribution of voice quality to politeness in Japanese. In Proceedings of Voice Quality: Functions, Analysis, Synthesis (VOQUAL 2003) (pp. 157–162). International Speech Communication Association.

Jannedy & Weirich (2013). /oy/ as an identity marker of Hood German in Berlin. Proceedings of Meetings on Acoustics, 19, 060096. https://doi.org/10.1121/1.4800693

Jannedy & Weirich (2014a, May 5–8). Linguistic influences on diphthong realization of /ɔɪ/ in Hood German. In Proceedings of the 10th International seminar on speech production (ISSP), Cologne, Germany (pp. 218–221). International Speech Communication Association.

Jannedy & Weirich (2014b, May 20–23). Some aspects on individual speaking style features in Hood German. In Proceedings of Speech Prosody 7, Dublin, Ireland (pp. 843–847). International Speech Communication Association. https://doi.org/10.21437/SpeechProsody.2014-158

Jannedy & Weirich (2014c). Sound change in an urban setting: Category instability of the palatal fricative in Berlin. Laboratory Phonology, 5(1), 91–122.

Jannedy & Weirich (2017, August 17–21). Spectral moments vs discrete cosine transformation coefficients: Evaluation of acoustic measures distinguishing two merging German fricatives. Journal of the Acoustical Society of America, 142(1), 395–405.

Jannedy, Weirich, & Brunner (2011, August 17–21). The effect of inferences on the categorization of Berlin German fricatives. In Proceedings of the 17th International Congress of Phonetic Sciences, Hong Kong (pp. 962–965). International Phonetic Association.

Jannedy, Weirich, & Helmeke (2015, August 10–14). Acoustic analyses of differences in [ç] and [ʃ] productions in Hood German. In Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow, Scotland. International Phonetic Association. https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2015/proceedings.html

Jannedy, Weirich, Mendelsohn, & Schüppenhauer (2019, September 25–27). The social meaning of diphthong fronting in Berlin German [Conference session]. Phonetik und Phonologie Tagung 15, Düsseldorf, Germany.

Johnson (2005). Decisions and mechanisms in exemplar-based phonology. Phonlab Annual Report, 1, 289–311. https://doi.org/10.5070/P77m49b843

Johnson (2006). Resonance in an exemplar-based lexicon: The emergence of social identity and phonology. Journal of Phonetics, 34, 485–499.

Johnson, Strand, E. A., & D’Imperio (1999). Auditory-visual integration of talker gender in vowel perception. Journal of Phonetics, 27, 359–384.

Kawahara, Masuda-Katsuse, & de Cheveigne (1999). Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds. Speech Communication, 27, 187–207.

Keating & Esposito (2006). Linguistic voice quality. UCLA Working Papers in Phonetics, 105, 85–91.

Keating, Garellek, & Kreiman (2015, August 10–14). Acoustic properties of different kinds of creaky voice [Conference session]. Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow, Scotland.

Keating, Kuang, Garellek, Esposito, C. M., & Khan, S. D. (2023). A cross-language acoustic space for vocalic phonation distinctions. Language, 99(2), 351–389. https://doi.org/10.1353/lan.2023.a900090

Kisler, Reichel, U. D., & Schiel (2017). Multilingual processing of speech via web services. Computer, Speech and Language, 45, 326–347.

Klofstad, C. A., Anderson, R. C., & Nowicki (2015). Perceptions of competence, strength, and age influence voters to select leaders with lower-pitched voices. PLOS ONE, 10(8), Article e0133779.

Kohler (1990). Illustrations of the IPA: German. Journal of the International Phonetic Association, 20, 48–50.

Köser (2014). Phrasen-finale Phonationsänderungen und ihre Rolle beim turn taking. In Barth-Weingarten & Szczepek Reed (Eds.), Prosodie und Phonetik in der Interaktion/Prosody and phonetics in interaction (pp. 20–45). Verlag für Gesprächsforschung.

Kreiman (1982). Perception of sentence and paragraph boundaries in natural conversation. Journal of Phonetics, 10, 163–175.

Kreiman, Gerratt, B. R., Garellek, Samlan, & Zhang (2014). Toward a unified theory of voice production and perception. Loquens, 1, Article e009. https://doi.org/10.3989/loquens.2014.009

Kreiman, Lee, Garellek, Samlan, & Gerratt, B. R. (2021). Validating a psychoacoustic model of voice quality. Journal of the Acoustical Society of America, 149, 457–465.

Kuznetsova, Brockhoff, & Christensen (2017). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13), 1–26.

Labov (1966). The social stratification of English in New York City. Center for Applied Linguistics.

Labov (1972a). Language in the inner city. University of Pennsylvania Press.

Labov (1972b). Some principles of linguistic methodology. Language in Society, 1, 97–120.

Laver (1980). The phonetic description of voice quality. Cambridge University Press.

Laver (1994). Principles of phonetics. Cambridge University Press.

Lefkowitz & Sicoli (2007, November 28). Creaky voice: Constructions of gender and authority in American English conversation [Paper presentation]. 106th Annual Meeting of the American Anthropological Association, Washington, DC, United States.

Loakes & Gregory (2022). Voice quality in Australian English. JASA Express Letters, 2, 085201.

Lüdecke (2018a). ggeffects: Tidy data frames of marginal effects from regression models. Journal of Open Source Software, 3(26), 772. https://doi.org/10.21105/joss.00772

Lüdecke (2018b). sjmisc: Data and variable transformation functions. Journal of Open Source Software, 3(26), 754. https://doi.org/10.21105/joss.00754

Lüdecke (2023). sjPlot: Data visualization for statistics in social science [R package version 2.8.14]. https://CRAN.R-project.org/package=sjPlot

Mendoza, Valencia, Muñoz, & Trujillo (1996). Differences in voice quality between men and women: Use of the long-term average spectrum (LTAS). Journal of Voice, 10, 59–66.

Mendoza-Denton (2011). The semiotic hitchhiker’s guide to creaky voice: Circulation and gendered hardcore in a Chicana/o gang persona. Journal of Linguistic Anthropology, 21(2), 261–280. https://doi.org/10.1111/j.1548-1395.2011.01110.x

Milroy (1980). Language and social networks. Blackwell.

Newman (2011). “Do you sound Asian when you speak English?” Racial identification and voice in Chinese and Korean Americans’ English. American Speech, 86, 152–178.

Ogden (2001). Turn transition, creak and glottal stop in Finnish talk in interaction. Journal of the International Phonetic Association, 31, 139–152.

Ohala, J. J. (1994). The frequency code underlies the sound-symbolic use of voice pitch. In Hinton, Nichols, & Ohala, J. J. (Eds.), Sound symbolism (pp. 325–347). Cambridge University Press.

Ohara (2004). Performing gender through voice pitch: A cross-cultural analysis of Japanese and American English. In Pasero & Braun (Eds.), Wahrnehmung und Herstellung von Geschlecht (pp. 105–166). VS Verlag für Sozialwissenschaften.

Penney & Cox (2021, August). Vowel and voice quality differences between mainstream and non-mainstream Australian English speakers [Paper presentation]. Forum on Englishes in Australia, Latrobe University, Melbourne, VIC, Australia.

Penney, Cox, Miles, & Palethorpe (2018). Glottalisation as a cue to coda consonant voicing in Australian English. Journal of Phonetics, 66, 161–184.

Peters (2003, August 3–9). Multiple cues for phonetic phrase boundaries in German spontaneous speech. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 15), Barcelona, Spain (pp. 1795–1798). International Phonetic Association.

Podesva, R. J. (2007). Phonation type as a stylistic variable: The use of falsetto in constructing a persona. Journal of Sociolinguistics, 11(4), 478–504.

Podesva, R. J. (2013). Gender and the social meaning of non-modal phonation types. In Proceedings of the 37th Annual Meeting of the Berkeley Linguistics Society (pp. 427–448). Linguistic Society of America.

Podesva, R. J., & Callier (2015). Voice quality and identity. Annual Review of Applied Linguistics, 35, 173–194.

R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

Rickford (1986). The need for new approaches to class analysis in sociolinguistics. Language and Communication, 6(3), 215–221.

Sankoff & Cedergren, H. J. (1972). Sociolinguistic research on French in Montreal. Language in Society, 1, 173–174.

Selting & Kern (2011). Ethnic styles of speaking in European metropolitan areas. Studies in language variation. John Benjamins. http://digital.casalini.it/9789027282538

89.

Shue

Y.-L.

Keating

Vicenik

(2011). VoiceSauce: A program for voice analysis. In Proceedings of the International Congress of Phonetic Sciences (pp. 1846–1849). AIP Publishing.

90.

Silverstein

(2003). Indexical order and the dialectics of sociolinguistic life. Language & Communication, 23, 193–229.

91.

Simpson

(2012). The first and second harmonics should not be used to measure breathiness in male and female voices. Journal of Phonetics, 40, 477–490.

92.

Södersten

Lindestad

P.-Å

. (1990). Glottal closure and perceived breathiness during phonation in normally speaking subjects. Journal of Speech, Language, and Hearing Research, 33(3), 601–611.

93.

Sóskuthy

Stuart-Smith

(2020). Voice quality and coda /r/ in Glasgow English in the early 20th century. Language Variation and Change, 32, 133–157.

94.

Stevens

K. N.

(1972). The quantal nature of speech: Evidence from articulatory-acoustic data. In David Jr

E. E.

Denes

P. B.

(Eds.), Human communication: A unified view (pp. 51–66). McGraw-Hill.

95.

Stuart-Smith

(1999a). Glasgow: Accent and voice quality. In Foulkes

Docherty

G. J.

(Eds.), Urban voices: Accent studies in the British Isles (pp. 203–222). Arnold.

96.

Stuart-Smith

(1999b). Voice quality in Glaswegian. Proceedings of the International Congress of Phonetic Sciences, 14, 2553–2556. https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS1999/papers/p14_2553.pdf

97. Sulter, A. M., & Peters, H. F. (1996). Perceptive characteristics of speech of untrained and trained subjects, and influences of gender. In A. M. Sulter (Ed.), Variation of voice quality features and effects of voice training in males and females (pp. 73–94). Groningen University.

98. Szakay (2006). Rhythm and pitch as markers of ethnicity in New Zealand English. In Warren & Watson (Eds.), Proceedings of the 11th Australasian international conference on speech science & technology (pp. 421–426). Australasian Speech Science and Technology Association.

99. Szakay (2012). Voice quality as a marker of ethnicity in New Zealand: From acoustics to perception. Journal of Sociolinguistics, 16, 382–397.

100. Szakay & King (2018, June 27–30). Voice quality transfer effects between English and Māori [Paper presentation]. 22nd Sociolinguistics Symposium, Auckland, New Zealand.

101. Szakay & Torgersen, E. N. (2015). An acoustic analysis of voice quality in London English: The effect of gender, ethnicity and f0 [Conference session]. Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow, Scotland.

102. Szakay & Torgersen, E. N. (2019). A re-analysis of f0 in ethnic varieties of London English using REAPER. In Calhoun, Escudero, Tabain, & Warren (Eds.), Proceedings of the 19th international congress of phonetic sciences (pp. 1675–1678). Australasian Speech Science and Technology Association.

103. Tajfel & Turner, J. C. (1986). The social identity theory of intergroup behaviour. In Worchel & W. G. Austin (Eds.), Psychology of intergroup relations (pp. 7–24). Nelson-Hall.

104. Teshigawara (2003). Voices in Japanese animation: A phonetic study of vocal stereotypes of heroes and villains in Japanese culture [Unpublished doctoral thesis]. University of Victoria, Victoria, BC, Canada.

105. Thomas, E. R. (2002). Sociophonetic applications of speech perception experiments. American Speech, 77(2), 115–147.

106. Trudgill (1974). The social differentiation of English in Norwich. Cambridge University Press.

107. Van Borsel, Janssens, & De Bodt (2009). Breathiness as a feminine voice characteristic: A perceptual approach. Journal of Voice, 23(3), 291–294.

108. Watson, C. I., & Harrington (1999). Acoustic evidence for dynamic formant trajectories in Australian English vowels. Journal of the Acoustical Society of America, 106, 458–468.

109. Weirich & Jannedy (2013). /oy/ as an identity marker of Hood German in Berlin. Proceedings of Meetings on Acoustics, 19, 060096.

110. Weirich, Jannedy, & Mendelsohn, J. E. (2024). Social influences on dynamic formant trajectories in German diphthongs [Unpublished manuscript].

111. Weirich, Jannedy, & Schüppenhauer (2020). The social meaning of contextualized sibilant alternations in Berlin German. Frontiers in Psychology, 11, Article 2664. https://doi.org/10.3389/fpsyg.2020.566174

112. White, Penney, Gibson, Szakay, & Cox (2022). Evaluating automatic creaky voice detection methods. Journal of the Acoustical Society of America, 152, 1476–1486.

113. Wickham (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag.

114. Wiese (2012). Kiezdeutsch: Ein neuer Dialekt entsteht. Verlag C. H. Beck.

115. Winter, Oh, G. E., Hübscher, Idemaru, Brown, Prieto, & Grawunder (2021). Rethinking the frequency code: A meta-analytic review of the role of acoustic body size in communicative phenomena. Philosophical Transactions of the Royal Society B, 376, 20200400. https://doi.org/10.1098/rstb.2020.0400

116. Wolk, Abdelli-Beruh, N.-B., & Slavin (2012). Habitual use of vocal fry in young adult female speakers. Journal of Voice, 26, e111–e116.

117. Wright, Mansfield, & Panfili (2019). Voice quality types and uses in North American English. French Journal of English Linguistics, 27, 1–14.

118. Yuasa, I. P. (2010). Creaky voice: A new feminine voice quality for young urban-oriented upwardly mobile American women. American Speech, 85, 315–337. https://doi.org/10.1215/00031283-2010-018

Increased Breathiness in Adolescent Kiezdeutsch Speakers: A Marker of Multiethnolectal Group Affiliation?

