Abstract
Keywords
Introduction
Within the broad field of bilingualism, the gradual, non-pathological change in the native language (L1) due to reduced use and exposure is referred to as L1 language attrition (Schmid, 2011b; Schmid & Köpke, 2017).
Building on previous language attrition research, this study examines the L1 of immigrants who share a language history characterized by late sequential bilingualism, immersion in an L2 environment, and minimal exposure to their native L1 over an extended period (Schmid, 2011b, 2019; Schmid & Jarvis, 2014; Schmid & Köpke, 2017). In this study, we refer to participants who are late sequential bilinguals fully immersed in their second language as “attriters” to differentiate them from other bilingual groups. We use the more general term “bilinguals” to include individuals with various language backgrounds and sociolinguistic contexts; for brevity, the term bilingualism is also used to encompass multilingualism. Nonetheless, we acknowledge that using the term “attriter” for a participant group in which evidence of attrition has not been detected is controversial (Kasparian & Steinhauer, 2017; Schmid & Köpke, 2017).
Many language attrition studies focus on L1 performance and compare performance between monolingual speakers and L1 attriters (Schmid, 2019; Schmid & Dusseldorp, 2010; Schmid & Yılmaz, 2018). It is widely recognized that languages in the bilingual brain interact continuously in a bidirectional process, altering the cognitive and neural dynamics of both languages and resulting in shifts in individual language dominance patterns over time (Grosjean, 2013; Gurunandan et al., 2023; Laine & Lehtonen, 2018; Linck & Kroll, 2019, Chapter 9, p. 97; Treffers-Daller, 2019). These shifts are influenced by sociolinguistic parameters such as language exposure and use and can, to some extent, be observed in language performance (Treffers-Daller, 2019; Yılmaz, 2019). The language history of language attrition populations typically involves a rapid decline in L1 input coupled with full immersion in L2 upon immigration. Previous studies show that within language attrition populations, some attriters perform at native levels, while others exhibit noticeable changes in L1 performance (Schmid & Yılmaz, 2018). Examining L2 performance alongside L1 can provide insights into individual factors contributing to L1 attrition or retention, as well as to L2 acquisition (Bylund & Ramírez-Galan, 2016; Gurunandan et al., 2023; Laine & Lehtonen, 2018; Runnqvist & Costa, 2012; Schmid & Köpke, 2017; Schmid & Yılmaz, 2018).
To investigate the impact of attriters’ shared language history on task performance, many language attrition studies examine two variables: length of residence (LoR, i.e., time spent deprived of native L1 exposure and with extensive exposure to L2) and the frequency of L1 use in daily life. LoR has been shown to predict task performance in samples whose minimum LoR is less than 10 years, with the effect stabilizing after this period (Opitz, 2011; Schmid, 2011a, 2019). This stabilization has been partially attributed to the initial years of immersion, which typically coincide with a period of rapid L2 learning (Linck & Kroll, 2019, Chapter 9, p. 95). However, while the added prevalence of L2 partly triggers L1 attrition, L1 attrition is not directly linked to L2 acquisition or the level of L2 proficiency (Yılmaz, 2019, Chapter 26, p. 307). To our knowledge, no language attrition studies have directly investigated the impact of LoR on L2 performance.
The frequency of L1 use has not been found to systematically affect L1 performance (for an overview, see Schmid, 2019, Chapter 25, pp. 294–295). However, L1 use in different life domains can have different implications for changes in L1 performance. On the one hand, frequent use of L1 in a professional setting has been found to have a positive impact on low-frequency word retrieval (Yılmaz & Schmid, 2012), verbal fluency, general proficiency (C-test), grammaticality judgments, and lexical diversity (Schmid & Dusseldorp, 2010). On the other hand, frequent L1 use in informal interactions with peers (other attriters) has been associated with increased variability in the phonemic domain (De Leeuw et al., 2010). Although LoR and frequency of L1 use do not systematically predict attrition as separate variables (Schmid, 2019; Schmid & Dusseldorp, 2010; Schmid & Yılmaz, 2018), the impact of LoR has been shown to become significant when L1 use is very limited (de Bot & Clyne, 1994).
Consequently, it has been suggested that attriters who have spent a long time in an L2 environment and use L1 less often exhibit signs of attrition at an individual level. In contrast, speakers who have spent extended time in an L2 environment and use L1 frequently with peers reflect contact-induced language change in the community on a larger scale (Schmid, 2011a; Schmid & Köpke, 2017). In the current study, we therefore investigate the effects of LoR and frequency of L1 use on verbal fluency task performance both separately and in interaction, so that both extralinguistic variables are taken into account.
Verbal fluency tasks
Verbal fluency (VF) tasks are widely applied across various populations and languages (Goral, 2004; Olabarrieta-Landa et al., 2017). In a VF task, participants are given a criterion and asked to generate words within a predetermined time frame, typically 60 seconds (Strauss et al., 2006). The most common VF task types are phonemic (PVF) and semantic (SVF).
In a PVF task, participants generate words starting with a specific letter or phoneme, and three trials are typically included for a reliable fluency assessment (Oberg & Ramírez, 2006; Strauss et al., 2006; Tombaugh et al., 1999). In studies conducted in English, the letter combination /f/, /a/, and /s/ is frequently used. These letters were originally selected based on their grouping as “easy letters” (Borkowski et al., 1967; Ross, 2003; Strauss et al., 2006), and their frequent use in the literature allows comparison with other datasets. For a more straightforward letter selection, a language-specific “high-frequency dictionary approach” can be used, in which letters are selected based on their word-initial frequency in the target language, such as /k/, /a/, and /p/ in Finnish (Lehtinen et al., 2021; Mardani et al., 2020; Oberg & Ramírez, 2006; Schmid, 2011a; Tombaugh et al., 1999).
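The high-frequency dictionary approach can be sketched as a count of word-initial letters over a list of dictionary headwords. This is an illustrative sketch only. The toy headword list is hypothetical and is not a real dictionary sample. Letters are treated as a proxy for phonemes, which is a reasonable simplification for Finnish's largely phonemic orthography but much less so for English. The helper names `initial_letter_counts` and `select_letters` are hypothetical.

```python
from collections import Counter

# Finnish vowel letters, used to split word-initial letters into vowels and consonants.
FINNISH_VOWELS = frozenset("aeiouyäö")


def initial_letter_counts(headwords):
    """Count how many headwords begin with each (lower-cased) letter."""
    return Counter(word[0].lower() for word in headwords if word)


def select_letters(counts, vowels, n_consonants=2, n_vowels=1):
    """Return the most frequent word-initial consonants and vowels.

    Letters are ranked by count; ties keep first-encountered order.
    """
    ranked = [letter for letter, _ in counts.most_common()]
    consonants = [c for c in ranked if c.isalpha() and c not in vowels]
    vowel_hits = [v for v in ranked if v in vowels]
    return consonants[:n_consonants], vowel_hits[:n_vowels]


# Hypothetical toy headword list (not drawn from any real dictionary).
toy_headwords = ["kala", "kissa", "koira", "puu", "pallo", "aamu", "talo", "sika"]
consonants, vowel_letters = select_letters(initial_letter_counts(toy_headwords), FINNISH_VOWELS)
print(consonants, vowel_letters)  # ['k', 'p'] ['a']
```

With a full headword list, the same two calls would rank every initial letter. Splitting consonants from vowels mirrors the selection of two consonants and one vowel described for Finnish.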
In SVF tasks, participants generate words within a specific semantic category. Unlike PVF tasks, SVF tasks typically focus on a single category, because combining data from different semantic categories is complex. As unique life experiences influence semantic memory organization, varied demographic and cultural settings result in diverse category sizes and content across populations (e.g., “items in a supermarket” can evoke varied semantic representations in different demographic groups) (Abwender et al., 2001; Olabarrieta-Landa et al., 2017; Roberts & Le Dorze, 1997; Rosselli et al., 2002; Strauss et al., 2006; Troyer, 2000). The category “animals” has been shown to be relatively neutral with regard to culture and language (e.g., Pekkala et al., 2009), and it is widely used in linguistic and clinical studies and in neuropsychological test batteries across languages (for an overview, see Strauss et al., 2006).
PVF and SVF tasks are commonly used to study various aspects of bilingualism, such as lexical access, vocabulary knowledge, language dominance patterns, the role of executive functions, and cross-linguistic fluency strategies (Luo et al., 2010; Marsh et al., 2019; Patra et al., 2020; Roberts & Le Dorze, 1997; Rosselli et al., 2002; Schmid & Köpke, 2009). PVF tasks engage strategic cognitive organization, initiation, inhibition, and maintenance of effort, as participants conduct a non-routine search for words based on a specific lexical representation (i.e., the first letter) without the support of the hierarchical organization of semantic memory (Barry et al., 2008; Santos Nogueira et al., 2016; Strauss et al., 2006). In contrast, the SVF task requires a systematic lexical-semantic search, a relatively automatic process that resembles everyday language use (e.g., generating a shopping list). SVF task performance mainly relies on semantic categorization, the hierarchical mental lexicon, and memory organization (Luo et al., 2010; Patra et al., 2020; Strauss et al., 2006).
This study in the context of existing literature
VF task performance is typically evaluated by the total number of correct responses (Strauss et al., 2006). A growing body of cross-disciplinary research aims to enhance the analytical power of VF tasks through more detailed data analyses (e.g., Becker & de Salles, 2016; Thiele et al., 2016). Based on these studies, Lehtinen et al. (2021) developed guidelines for a systematic approach to VF task administration and analysis. These cover scoring and calculating total scores, error analysis, investigating temporal parameters (e.g., words generated in 10-, 15-, or 30-second segments within the total time), and exploring clustering and switching strategies. In this study, we follow these guidelines to analyze total scores, errors, and temporal parameters in PVF and SVF performance in a Finnish-English language attrition population. Next, we briefly review previous literature on VF task analysis in bilingual contexts before describing the present study in detail.
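As an illustration of the scoring and error-analysis step, the following sketch tallies a total score and error counts for one trial. It assumes the four error categories used in this study (repetitions, categorical errors, nonwords, and language intrusions). It also assumes that every category except repetitions is coded manually before scoring. The function name, input format, and example responses are hypothetical. Repetitions are detected here by exact string match, so homographs with distinct meanings (e.g., Finnish kuusi "six" and kuusi "spruce") would need a disambiguating tag to be counted as separate items.

```python
from collections import Counter

# Error categories assumed to be coded manually before scoring.
MANUAL_ERROR_TYPES = {"categorical", "nonword", "intrusion"}


def score_trial(responses):
    """Return (total_score, error_counts) for one VF trial.

    `responses` is a list of (word, label) pairs in production order, where
    label is None for a candidate acceptable word or one of MANUAL_ERROR_TYPES.
    A repeated acceptable word is counted as a repetition error.
    """
    seen = set()
    errors = Counter()
    total = 0
    for word, label in responses:
        if label is not None:
            if label not in MANUAL_ERROR_TYPES:
                raise ValueError(f"unknown error label: {label!r}")
            errors[label] += 1
            continue
        key = word.strip().lower()
        if key in seen:
            errors["repetition"] += 1
        else:
            seen.add(key)
            total += 1
    return total, dict(errors)


# Hypothetical English /f/ trial.
trial = [
    ("fish", None),
    ("frog", None),
    ("fish", None),           # repetition
    ("Fred", "categorical"),  # proper name; coding it as categorical is an assumption
    ("flurb", "nonword"),
    ("kala", "intrusion"),    # Finnish word in an English trial
]
print(score_trial(trial))
# (2, {'repetition': 1, 'categorical': 1, 'nonword': 1, 'intrusion': 1})
```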
Total scores
PVF task performance measured by total score has been found to range from comparable performance in mono- and bilinguals (Rosselli et al., 2000, 2002; Soltani et al., 2021) to better performance in bilinguals than in monolinguals (Patra et al., 2020). Similar PVF performance in mono- and bilinguals has been interpreted as reflecting less language interference in PVF than in SVF (Rosselli et al., 2000, 2002). Stronger PVF performance has been linked to superior executive functions related to inhibiting language interference while switching between languages (Ljungberg et al., 2013; Luo et al., 2010; Marsh et al., 2019; Patra et al., 2020; Sandoval et al., 2010). However, PVF tasks are rarely applied in language attrition studies (Jarvis, 2019, Chapter 21, p. 249; Schmid & Köpke, 2009).
In SVF tasks, bilingual populations consistently generate fewer words than monolinguals. This disadvantage in total scores has been attributed to language interference, weaker connections between lexical representations, or a smaller vocabulary (Gollan et al., 2002; Rosselli et al., 2000; Sandoval et al., 2010; Schmid & Köpke, 2017; Soltani et al., 2021; Yılmaz & Schmid, 2018). Studies on language attrition populations are in line with these findings, with attriters performing more poorly than monolinguals in SVF, though often with a relatively small effect size (e.g., Badstübner, 2011; Dostert, 2009; Opitz, 2011; Schmid & Dusseldorp, 2010; Schmid & Jarvis, 2014; Schmid & Keijzer, 2009; Schmid & Köpke, 2009). However, SVF total scores have been shown to have limited predictive power in profiling individual speakers as attriters or monolinguals. Total scores are also largely unrelated to LoR or frequency of L1 use (Schmid, 2011a; Schmid & Jarvis, 2014; Schmid & Köpke, 2009). Therefore, a more comprehensive approach to VF data analysis is called for.
Errors
Errors are a part of VF task performance in all populations (Crowe, 1998; Sandoval et al., 2010). In bilingual environments, language intrusions are of particular interest (Gollan et al., 2002), as cross-language intrusions are directly connected to language interference (Gollan et al., 2011). Their presence can also indicate language dominance, as language intrusion errors are more frequent in the non-dominant language (Sandoval et al., 2010). In a language attrition population, Badstübner (2011) found that attriters generated more errors than monolinguals in SVF. These errors were described as direct L2 intrusions and incorrect lexical items (such as partial recall of an L1 word). No significant difference in the number of errors between the groups was detected in PVF.
Temporal parameters
Analyzing the temporal parameters of VF tasks involves calculating the number of correct words produced within shorter time segments of the total time (e.g., four 15-second time segments within the 60 seconds). Typically, most words are generated in the early segments of the task across all populations. Approximately half of the words are produced in the first 15 seconds, facilitated by a semi-automatic rapid retrieval process that relies on semantic memory. As time progresses, a more effortful retrieval strategy is employed, engaging strategic executive processes such as monitoring performance to avoid repetitions. This results in slower word-finding, with fewer words that also tend to be of lower frequency (Fernaeus et al., 2008; Lehtinen et al., 2021; Sandoval et al., 2010; Venegas & Mansur, 2011). Thus, group differences in the early segments of the task suggest variations in language knowledge and lexical retrieval, while differences in later segments indicate discrepancies in executive control (Fernaeus & Almkvist, 1998; Fernaeus et al., 2008; Gurunandan et al., 2023; Luo et al., 2010).
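Computing these temporal parameters amounts to binning each correct response by its onset time. The sketch below makes three assumptions: response onsets are available in seconds from the start of the trial (for example, from time-stamped transcriptions), a response at exactly 15.0 s belongs to the first segment, and the segment labels 0–15, 16–30, 31–45, and 46–60 used in the Method apply. The input format, boundary rule, and function names are hypothetical.

```python
import math

SEGMENT_LENGTH = 15.0  # seconds per segment
N_SEGMENTS = 4         # four segments in a 60-second trial


def segment_index(onset):
    """Map a response onset (s) to segment 0..3; 15.0 s closes segment 0 (assumption)."""
    if not 0.0 <= onset <= SEGMENT_LENGTH * N_SEGMENTS:
        raise ValueError(f"onset {onset} is outside the 60-second trial")
    return max(0, math.ceil(onset / SEGMENT_LENGTH) - 1)


def correct_words_per_segment(responses):
    """Count correct words per 15-second segment.

    `responses` is an iterable of (onset_seconds, is_correct) pairs.
    """
    counts = [0] * N_SEGMENTS
    for onset, is_correct in responses:
        if is_correct:
            counts[segment_index(onset)] += 1
    return counts


# Hypothetical trial: onset times (s) and whether each response was acceptable.
trial = [(1.2, True), (3.5, True), (7.0, False), (14.9, True),
         (15.0, True), (22.4, True), (38.0, True), (52.3, True)]
print(correct_words_per_segment(trial))  # [4, 1, 1, 1]
```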
Studies have consistently shown that bilinguals generate fewer words than monolinguals during the initial stages of a VF task, but the difference between the groups tends to diminish as the trial progresses (Luo et al., 2010; Sandoval et al., 2010). This pattern is interpreted to reflect more language interference in the early stages of the task, where bilinguals may produce high-frequency words they know in both languages (e.g., “cat”) rather than low-frequency words they may only know in one language (e.g., “bobcat”) (Luo et al., 2010; Sandoval et al., 2010). Interestingly, similar neural activation patterns have been observed during L1 and L2 performance, suggesting a common bilingual effect in both languages (Gurunandan et al., 2023). Regarding language attrition studies, attriters have been shown to retrieve words more slowly than monolinguals in L1 SVF (Jarvis, 2019, Chapter 21, p. 243; Schmid & Jarvis, 2014; Schmid & Köpke, 2009). This finding has been linked to an increased load of inhibiting L2 competitors (Schmid & Jarvis, 2014; Yılmaz & Schmid, 2018), consistent with previous studies on bilingual VF performance (Gollan et al., 2011; Luo et al., 2010; Sandoval et al., 2010). However, highly automatic language skills are considered to be more resistant to language attrition (Goral, 2004; Segalowitz, 1991).
The present study
In the present study, we examine the processes that affect, and potentially hinder, optimal performance in L1 and L2 PVF and SVF tasks among a group of L1 attriters who self-report as balanced bilinguals with a slight preference for L2. We conduct a systematic analysis of total scores, errors, and temporal parameters guided by the methodology proposed by Lehtinen et al. (2021). We also investigate how LoR and frequency of L1 use affect task performance in both L1 and L2. To detect differences between attriters and monolingual speakers, we compare the L1 performance of attriters with that of a matched L1 monolingual group previously studied by Lehtinen et al. (2021).
Research questions
The research questions and hypotheses for the current study are as follows:
How do language attriters perform in PVF and SVF tasks for their first (L1) and second (L2) languages, and to what extent do LoR and frequency of L1 use affect their performance on the measures listed below?
How does the performance of language attriters in PVF and SVF tasks in their first language (L1) compare with that of a matched monolingual group across the measures listed below?
Measures:
a. Total scores
b. Errors: number, frequency, and distribution of error types
c. Temporal parameters: performance change during the task (measured as the number of words generated in four 15-second segments)
d. Ability to generate words rapidly in the early stages of the task (measured as the proportion of words generated within the initial 15-second interval of a task relative to the overall word count)
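Measure d can be computed directly from the segment counts of measure c. The following is a minimal sketch, assuming the four per-segment counts of correct words for one trial are already available. The function name is hypothetical. Returning None for a trial with no correct words is a design choice, because the quotient is undefined in that case.

```python
from typing import Optional, Sequence


def first_segment_quotient(segment_counts: Sequence[int]) -> Optional[float]:
    """Proportion of a trial's correct words produced in the first 15-second segment."""
    if len(segment_counts) != 4:
        raise ValueError("expected counts for four 15-second segments")
    total = sum(segment_counts)
    if total == 0:
        return None  # quotient is undefined when no correct words were produced
    return segment_counts[0] / total


# Hypothetical trial: 9 + 5 + 4 + 2 = 20 correct words, 9 of them in the first segment.
print(first_segment_quotient([9, 5, 4, 2]))  # 0.45
```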
We refer to the different aspects of these research questions by their numbers (RQ1 and RQ2) and letters (a, b, c, d) to specify which aspects of the data analysis are relevant.
Hypotheses
As stated above, both PVF and SVF tasks engage a wide array of cognitive skills, but PVF tasks emphasize executive skills, whereas SVF tasks rely more on semantic categorization, the hierarchical mental lexicon, and memory organization. Based on the literature, we expect more variation in the processes required for SVF than for PVF in our population; that is, we expect differences between languages and groups to be more pronounced in SVF than in PVF.

For RQ1, we anticipate that attriters will show comparable L1 and L2 total scores or higher total scores in their self-reported dominant language (L2). We predict that a shorter LoR and more frequent L1 use will positively affect L1 total scores in PVF and SVF, but that these variables will not be strong independent predictors of performance. As participants are healthy, neurotypical adults, we anticipate minimal errors across languages. We hypothesize that attriters will produce more cross-language intrusion errors in their self-reported less dominant language (L1). We further expect that a longer LoR and less frequent L1 use will result in more errors in L1 but not in L2. Regarding temporal parameters, we hypothesize a similar performance profile across languages. We also expect attriters to employ rapid retrieval strategies more successfully in their self-reported stronger language (L2), particularly in SVF. Finally, we predict that a shorter LoR and more frequent L1 use will facilitate rapid retrieval in L1, but that these variables will not be strong independent predictors of performance.
As for RQ2 on performance between attriters and monolinguals, we predict that attriters will achieve lower total scores in SVF than monolinguals but demonstrate comparable performance in PVF. We expect minimal errors overall, with a lower number of error-free trials in the attriter group. We anticipate that attriters will retrieve words more slowly than monolinguals in the early stages of the task, especially in SVF.
Method
Participants
Data from two healthy participant groups, attriters and monolingual controls, were analyzed. The University of Turku Ethics Committee approved all experimental procedures, and all participants provided informed consent before participating in the study. Exclusion criteria for both groups included current or past cardiovascular, neurological, psychiatric, or developmental language or speech disorders, toxic substance abuse, severe hearing loss, and age over 80 years.
Attriters
The attriter group (
The group’s age was
Participants had lived in an L2 environment for at least 20 years, and the majority had used L2 as their primary professional language during this time (
At the time of the interview, all attriters identified themselves as bilinguals (
When asked about language dominance, attriters reported the following distribution on a 5-point Likert scale (
Monolingual group
The monolingual group (
Matched demographics
Monolingual participants were recruited to correspond to the age, education, and gender distribution of the attriter group to minimize demographic variables’ potential effect. Statistical analyses verified no significant differences between the groups for age (attriters
Data
Data were extracted from a larger dataset consisting of five verbal fluency tasks (L1, L2), a WUG-task (Crystal, 2015) (L1), speech samples elicited via a film retelling task (L1, L2), and free speech samples (L1). In this study, we focused on the participants’ performance in four VF tasks in L1 and L2: three phonemic categories in L1 (Finnish, /k/, /a/, /p/) and L2 (English, /f/, /a/, /s/) and one concrete semantic category, “animals”.
Background information was collected via the Language Attrition Test Battery Sociolinguistic Questionnaire (SQ), introduced and discussed in detail in Schmid (2011b), Schmid and Cherciov (2019, Chapter 23, pp. 267–276), and Schmid and Dusseldorp (2010), and available in multiple languages on www.languageattrition.org. The first author translated the SQ into Finnish (Supplemental Appendix A). An abridged questionnaire for the control group was used to verify language use in daily life (Supplemental Appendix B). Participants were given the option to fill out the questionnaire before or during the session.
All tasks were completed in one sitting in a quiet environment (such as the participant’s home or a clinical setting): in Northern California, USA, for the attriters and in Finland for the control group. Tasks were completed in both L1 and L2 within a single session, first all tasks in one language and then, after a short break, all tasks in the other. The order of languages was randomized, and participants were aware that the interlocutor was bilingual. Tasks were presented in randomized order within each language, but the VF tasks were always presented in the following order: concrete category “animals,” (abstract category “emotions”), phonemes “/k/, /a/, /p/” for L1 and “/f/, /a/, /s/” for L2.
This study focuses on three phonemic VF tasks and one semantic VF task. For the phonemic task in L1, we followed the high-frequency dictionary approach (Lehtinen et al., 2021; Mardani et al., 2020; Oberg & Ramírez, 2006; Schmid, 2011a). We selected the two most frequent word-initial consonants of Finnish, /k/ (15242 words) and /p/ (10640 words), and the most frequent word-initial vowel, /a/ (4361 words) (Kielitoimiston sanakirja, 2021; Leskinen, 1989). For L2 (English), we selected the phonemes /f/ (6939 words), /a/ (10360 words), and /s/ (19236 words) (Merriam-Webster, n.d.) based on their frequent use in the literature (e.g., Strauss et al., 2006). For the SVF task, we included one semantic category, the culturally and linguistically relatively neutral concrete category “animals” (e.g., Pekkala et al., 2009), to limit the complexity of the semantic analysis. The administration of the five VF tasks typically lasted 5–10 minutes; the whole session was completed within 2 hours.
As extralinguistic variables, we examined the LoR in the L2 dominant environment in years (
Procedure
All VF tasks were administered and scored following the procedure outlined in Lehtinen et al. (2021). Briefly, participants were asked to produce as many different words as possible that met the given criterion within 60 seconds; the only additional restriction was that proper names were not allowed. Responses were transcribed, and acceptable words were counted toward the total score, with semantically distinct words counted as separate items. Errors were excluded from the total score and classified into four categories: repetitions, categorical errors, nonwords, and language intrusions. Utterances in Finglish, a macaronic mixture of Finnish and English (e.g.,
We used R (R Core Team, 2019) for statistical modeling and data visualization, with the packages dplyr (Wickham et al., 2019), lme4 (Bates et al., 2015), sjPlot (Lüdecke, 2018), ggplot2 (Wickham, 2016), arsenal, and ggeffects (Lüdecke, 2018). The analysis script is available at https://osf.io/fue3k/?view_only=6b6762f07e2243d6b8548c0992dce9f1. Continuous predictors were scaled and centered on the sample means in all models.
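Scaling and centering can be read as converting each continuous predictor (for example, LoR in years) to z-scores. The sketch below assumes that "scaled" means division by the sample standard deviation (n − 1 denominator), which is what R's `scale()` does by default after centering. The LoR values and the function name are hypothetical.

```python
from statistics import mean, stdev


def scale_center(values):
    """Center on the sample mean and divide by the sample SD (n - 1 denominator)."""
    if len(values) < 2:
        raise ValueError("need at least two values to estimate an SD")
    m = mean(values)
    s = stdev(values)
    if s == 0:
        raise ValueError("a constant predictor cannot be scaled")
    return [(v - m) / s for v in values]


lor_years = [20, 25, 30]  # hypothetical LoR values in years
print(scale_center(lor_years))  # [-1.0, 0.0, 1.0]
```

After this transformation, a model coefficient for the predictor is interpreted per one standard deviation of change, and the intercept refers to a participant with an average value.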
To address RQ1a on overall performance in the attriter group, we used a linear mixed-effects model to examine total scores as a function of task language (L1/L2), task type (PVF/SVF), LoR, frequency of L1 use, and their interactions. We included by-participant random intercepts and random slopes for task language to account for individual differences in proficiency between Finnish and English. Predictors for the final model were selected by model comparison using Bayesian information criterion (BIC) values; when two models did not differ significantly in BIC, the simpler model was chosen. To address RQ1b on the number of errors in L1 and L2, we used the Wilcoxon rank-sum test. Because the dataset contained few errors, we did not analyze the impact of extralinguistic variables on errors. To answer RQ1c on temporal patterns, we fitted a model of the attriters’ performance changes over the 60 seconds. The outcome was the number of correct words generated in each 15-second time interval, and the predictors were task language, time sequence (0–15, 16–30, 31–45, or 46–60 seconds), task type, LoR, and L1 use. By-participant random intercepts with random slopes for task type were included to account for individual-level differences. To analyze the ability to generate words rapidly in the early stages of the task (RQ1d), we modeled the ratio of correct words generated in the first segment to the total score. Predictors were task language (L1/L2), LoR, and L1 use, and by-participant random intercepts with random slopes for task language were included.
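The BIC-based model selection step can be sketched using BIC = k ln(n) − 2 ln L̂, where k is the number of estimated parameters, n the number of observations, and L̂ the maximized likelihood. The paper does not state what counts as a significant BIC difference, so the sketch assumes a threshold of 2 BIC units, a common rule of thumb. The log-likelihoods in the example are invented for illustration and are not taken from this study. The `Fit` class and `select_model` are hypothetical helpers. Mixed models that differ in their fixed effects are typically compared using maximum likelihood rather than REML fits.

```python
import math
from dataclasses import dataclass


@dataclass
class Fit:
    name: str
    n_params: int   # all estimated parameters (fixed effects and variance components)
    log_lik: float  # maximized log-likelihood (ML fit)


def bic(fit: Fit, n_obs: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return fit.n_params * math.log(n_obs) - 2.0 * fit.log_lik


def select_model(fits, n_obs, threshold=2.0):
    """Return the simplest model whose BIC lies within `threshold` of the lowest BIC."""
    scores = {f.name: bic(f, n_obs) for f in fits}
    best = min(scores.values())
    candidates = [f for f in fits if scores[f.name] - best < threshold]
    return min(candidates, key=lambda f: f.n_params)


# Invented log-likelihoods for three candidate predictor sets.
fits = [Fit("L1 use", 8, -600.0), Fit("LoR", 8, -603.0), Fit("L1 use + LoR", 9, -599.5)]
print(select_model(fits, n_obs=200).name)  # L1 use
```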
For RQ2a on overall performance in the attriter and monolingual groups, we used a linear mixed-effects model of total scores as a function of task type (PVF/SVF) and participant group. By-participant random intercepts with random slopes for task type were included to account for individual-level variation. The number of errors in L1 between groups (RQ2b) was analyzed using the Wilcoxon rank-sum test. To compare temporal performance profiles between attriters and monolinguals (RQ2c), we fitted a model of performance changes during the task. The outcome variable was the number of correct words generated during each 15-second interval, and the predictors were participant group, time sequence, and task type. We used the same model selection procedure as described above. To compare the ability to generate words rapidly in the early stages of the task between attriters and monolinguals (RQ2d), we modeled the ratio of correct words generated in the first segment to the total score as the outcome variable. Predictors were task type and participant group, and by-participant random intercepts with random slopes for task type were included. Model selection again followed the procedure described above.
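For the group comparisons of error counts, the Wilcoxon rank-sum test compares the rank sum of one group with its expectation under the null hypothesis. The sketch below uses the large-sample normal approximation with a correction for ties, since error counts contain many tied values such as zeros, and applies no continuity correction. Its p-values can therefore differ from R's `wilcox.test()`, which by default uses an exact test for small samples without ties and applies a continuity correction otherwise. The example data and function names are hypothetical.

```python
import math
from collections import Counter


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Two-sided rank-sum test (normal approximation, tie-corrected variance).

    Returns (W, z, p), where W is the rank sum of the first sample `x`.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one observation")
    pooled = list(x) + list(y)
    n = n1 + n2
    ranks = rank_with_ties(pooled)
    w = sum(ranks[:n1])
    expected_w = n1 * (n + 1) / 2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w <= 0:  # all values tied: no evidence of a difference
        return w, 0.0, 1.0
    z = (w - expected_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the standard normal
    return w, z, p


# Hypothetical per-participant error counts.
attriter_errors = [0, 1, 2, 0, 3, 1]
monolingual_errors = [0, 0, 1, 0, 0, 1]
print(wilcoxon_rank_sum(attriter_errors, monolingual_errors))
```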
Results
For brevity, descriptive statistics and complete model summaries are presented in Supplemental Appendix C, and statistics for key findings are summarized below.
Research question 1
Our first research question was: How do language attriters perform in PVF and SVF tasks for their first (L1) and second (L2) languages, and to what extent do LoR and frequency of L1 use affect performance?
Total scores
To analyze total scores in VF tasks, the model selection procedure based on BIC values (L1 use, LoR, or both, combined with task language and fluency type) suggested the model with L1 use but no LoR as the best fit for the data (Supplemental Appendix C, Table 3 and Model Summary 1). Our analysis showed statistically significant main effects of task language, fluency type, and L1 use, indicating that more words were generated in L1 than in L2 and in semantic compared with phonemic tasks. Frequent L1 use was associated with better overall performance in the tasks across fluency types and task languages (1.46, 95% CI = [0.36, 2.55],

Predicted values of the number of correct words in the attriter group in one semantic and three phonemic tasks (combined) in L1 and L2, with frequency of L1 use as a predictor.
The number of errors, frequency, and distribution of error types
Table 1 presents the descriptive statistics for errors in L1 and L2 for the attriter group. Overall, the number of errors was minimal, which led us to exclude the impact of extralinguistic variables on errors from our analysis. As expected, the most common error type was repetition. There was no significant difference in the number of errors between L1 and L2 in PVF (
Number of attriters who generate errors, total number of errors, and distribution of error types in one semantic and three phonemic verbal fluency tasks in L1 and L2.
Performance change during the task
To examine the temporal profile and its association with task attributes (SVF/PVF and L1/L2), we used a model to predict the number of acceptable words produced in each of four 15-second time windows, with L1 use and LoR as predictors. Using the model comparison procedure described earlier, we determined that a model with L1 use but without LoR was the most parsimonious (Supplemental Appendix C, Table 3 and Model Summary 2). The selected model showed that more words were generated at the beginning of the task than in the later segments in both L1 and L2, with no significant difference between languages, as illustrated in Figure 2. As in RQ1a (Model Summary 1), a main effect of L1 use was detected. However, there were no significant interactions between L1 use and the temporal parameters, suggesting that the temporal pattern was similar across languages regardless of the frequency of L1 use.

Predicted values of correct words in the four 15-second time segments in one semantic and three phonemic verbal fluency tasks in the attriter group in L1 and L2. Error bars represent 95% confidence intervals.
Ability to generate words rapidly in the early stages of the task
To further examine performance variation in the attriter group, we examined the quotient of acceptable words produced during the first time segment in all tasks as a function of task language and L1 use (Supplemental Appendix C, Model Summary 3). Attriters had a smaller quotient in L1 (42%) than in L2 (44%) during the first time segment, although the difference between languages was not statistically significant. Moreover, attriters who reported using L1 more frequently in their everyday life had a smaller quotient in both languages than those who reported using L1 less frequently (−0.02, 95% CI = [−0.04, −0.01],

Predicted values of the quotient of correct words in the first 15-second time segment in the attriter group, with L1 and L2 and all task types combined, using frequency of L1 use as a predictor.
Research question 2
Our second research question was: How is the performance of language attriters in PVF and SVF tasks in their first language (L1) compared with that of a matched monolingual group?
Total scores
For the group comparison of overall performance measured in total scores, the results indicated that monolinguals outperformed attriters in the L1 SVF task (3.65, 95% CI = [1.40, 5.91],

Comparison of predicted values of number of correct words in one semantic and three phonemic verbal fluency tasks (Combined) in L1 between the attriter and monolingual groups using participant group as a predictor.
The number of errors, frequency, and distribution of error types
For the monolingual group, results published previously in Lehtinen et al. (2021) are referenced here. In the PVF task, approximately half of the monolingual participants generated errors in all three trials (/k/ 44% [
Performance change during the task
To investigate the temporal patterns of task performance and their relationship to task type in the attriter and monolingual groups, we modeled the number of acceptable words generated in each of the four 15-second time segments as a function of participant group, time segment, and task type (Supplemental Appendix C, Tables 3 and 4, Model Summary 5). Our analysis showed no significant differences between the two groups in the distribution of words across the four time segments, as shown in Figure 5.

Predicted values of words produced in the four 15-second time segments in one semantic and three phonemic verbal fluency tasks in the attriter and monolingual groups. Error bars represent 95% confidence intervals.
Ability to generate words rapidly in the early stages of the task
We also investigated the performance of attriters and monolinguals during the initial 15 seconds of all tasks by modeling the quotient of correct words produced as a function of group (Supplemental Appendix C, Model Summary 6). The analysis revealed that the quotient was higher in attriters (43%) than in monolinguals (41%) (−0.02, 95% CI = [−0.04, 0.00],
Discussion
We set out to explore processes that underlie performance in verbal fluency tasks in Finnish-English mature immersed bilinguals (i.e., language attriters), who self-identify as balanced bilinguals with a slight preference for L2. We analyzed PVF and SVF data in both their first (L1) and second (L2) languages, focusing on total scores, errors, and temporal parameters, and assessed the impact of immersion duration (LoR), frequency of L1 use, and their interactions on task performance. In addition, we contrasted the attriters’ performance in L1 verbal fluency tasks with that of monolinguals to identify potential language attrition markers.
Our key findings show that attriters generated more acceptable words in their first language than in their second language across fluency types and that frequent L1 use supported performance in both languages (RQ1a, Model Summary 1, Figure 1). Attriters made very few errors in either language, with the most typical error type being repetitions (RQ1b). The temporal performance profile was similar for both languages (RQ1c, Model Summary 2, Figure 2), and those attriters who used L1 more frequently generated a smaller quotient of words in the initial stage of VF tasks in both languages than those who used L1 less frequently (RQ1d, Model Summary 3, Figure 3).
In L1, attriters generated fewer acceptable words than monolinguals in the semantic task, and the groups performed similarly in the phonemic task (RQ2a, Model Summary 4, Figure 4). Attriters made more errors than monolinguals in the phonemic task, but the number of errors in the semantic task was comparable between groups. Qualitatively, fewer attriters than monolinguals generated error-free trials in both tasks, with repetitions being the most typical error in both groups (RQ2b). The temporal performance profile in L1 did not differ between attriters and monolinguals (RQ2c, Model Summary 5, Figure 5). However, attriters generated a larger proportion of their correct words in the initial stage of the task than monolinguals did (RQ2d, Model Summary 6).
In the following, we discuss our findings in relation to earlier literature. We first focus on the comparison of L1 and L2 performance within the attriter group before moving on to the comparison between the attriter and monolingual groups in L1. Finally, we outline suggestions for future research and address some limitations of this study.
RQ1. Performance between L1 and L2 within the attriter group
Attriters demonstrated strong L1 proficiency by generating higher total scores in L1 than in L2 across tasks. This suggests that L1 is the more dominant language, contrary to the attriters’ self-reported language dominance (Roberts & Le Dorze, 1997; Rosselli et al., 2002). The small number of errors across tasks did not allow a robust statistical analysis of errors, but some observations can be made. While the number of errors was similar across languages, attriters qualitatively generated more language intrusions in L1 than in L2, suggesting transfer from L2 to L1 but not vice versa. Because attriters performed better overall in L1, this pattern contradicts our hypothesis that intrusion errors would be more frequent in the non-dominant language (Sandoval et al., 2010). Overall, by generating more correct words while also producing more language intrusions in L1 than in L2, attriters demonstrate robust language proficiency but greater flexibility in L1 performance than in L2, potentially a marker of cross-linguistic influence from L2 (Schmid & Cherciov, 2019, Chapter 23, p. 267; Schmid & Dusseldorp, 2010; Sharwood Smith, 2019, Chapter 8, p. 85).
For temporal performance profiles, we anticipated either similar performance across languages or faster initiation in the more dominant language in the attriter group (Fernaeus & Almkvist, 1998; Fernaeus et al., 2008; Gollan et al., 2011; Luo et al., 2010; Sandoval et al., 2010; Schmid & Jarvis, 2014; Yilmaz & Schmid, 2018). We found comparable temporal word distributions across languages and equally efficient lexical retrieval in the initial stage in both languages, indicating balanced language dominance.
Contrary to our hypothesis, there was no interaction effect between LoR and frequency of L1 use, but frequent L1 use supported overall performance and proportionally slowed down rapid retrieval in both languages. Previous research (e.g., Gurunandan et al., 2023) has shown that similar neural activation patterns occur during L1 and L2 VF performance, suggesting a common bilingual effect for both languages. Our finding that frequency of L1 use had a similar effect on both languages, rather than on one language specifically, may reflect the influence of bilingualism on cognitive and neural language processing overall. Thus, our findings highlight the importance of including both L1 and L2 in language attrition studies to account for the general effect of bilingualism.
In addition to the impact of frequency of L1 use, we aimed to investigate to what extent LoR affects VF performance in L1 and L2. However, our model selection procedure identified a model without LoR as the most parsimonious fit to our data, so we were not able to investigate the effect of LoR on task performance directly. As with the similar impact of frequency of L1 use on both languages, this omission might suggest that LoR was not a meaningful predictor of VF task performance in either the first or the second language.
RQ2. Performance between the attriter and the monolingual group
Our comparison of attriters and monolinguals mirrors earlier literature: monolinguals outperformed attriters in SVF, and the groups performed similarly in PVF (e.g., Badstübner, 2011; Dostert, 2009; Opitz, 2011; Schmid, 2011b, 2019; Schmid & Dusseldorp, 2010; Schmid & Jarvis, 2014; Schmid & Keijzer, 2009; Schmid & Köpke, 2009). As expected, there were fewer error-free trials in the attriter group, especially in trials /k/ and /p/, with repetitions and nonword errors predominating. Numerically, attriters generated more errors in PVF than monolinguals. Thus, attriters experienced some difficulty in rapid lexical retrieval compared with monolinguals, potentially due to language interference or contact-induced language change manifesting as nonword errors.
Contrary to our hypothesis, performance in the attriter group was not slower in the early stages of the task than in the monolingual group. Thus, attriters did not show markers of language interference in the temporal analysis that would explain the difference in overall SVF task performance, as suggested by earlier studies (Rosselli et al., 2000, 2002; Sandoval et al., 2010; Schmid & Jarvis, 2014). Moreover, attriters generated a higher percentage of their total words in the first 15-second segment than monolinguals. As word-finding in the early segments of the task is facilitated by semi-automatic rapid retrieval strategies (Fernaeus et al., 2008; Lehtinen et al., 2021; Sandoval et al., 2010; Venegas & Mansur, 2011), our results suggest that attriters relied more on these strategies in L1 than monolinguals did. This implies not only that these strategies are resistant to language attrition (Gollan et al., 2002; Segalowitz, 1991), but also that attriters may use rapid retrieval more efficiently than monolinguals in the first 15-second segment. Interestingly, this is highlighted by the finding that although frequent L1 use proportionally slowed attriters down in the first 15-second segment, they still outperformed monolinguals in that segment, even though their overall performance resulted in lower total scores in SVF. Investigating the processes underlying these observations is beyond the scope of this study. In the following, we discuss suggestions for future studies and address the limitations of the present study.
Limitations of the present study and suggestions for future research
Because our sample was small, our interpretations of the data should be regarded as preliminary explorations. Future studies with larger datasets are needed to investigate the trends found in our data reliably. This is particularly relevant for error analysis. Due to the small amount of data, we were not able to conduct a robust statistical analysis or investigate the distribution profile of errors within tasks. In future studies, examining the distribution of errors over time may provide valuable insight into the differences in initial-stage performance between attriters and monolinguals.
Regarding error analysis, interpreting fine-grained data on error types from studies using different categorization systems can be challenging. For instance, our findings contrast with those of Badstübner (2011), who found a significant difference between monolinguals and attriters only in an SVF task. However, our analysis cannot be directly compared with Badstübner’s. Badstübner examined combined data from three semantic categories and found that most incorrect lexical items in SVF resulted from L2 transfer in the “things in the kitchen” category (i.e., words related to everyday life). Beyond the complexities of combining data from different semantic categories, words related to everyday life are highly activated in L2 in environments where L1 is used infrequently (or not at all). Consequently, Badstübner’s (2011) category of everyday objects might have prompted more language interference from L2 than the less frequently used category of “animals” in our study.
In addition, we categorized non-standard utterances as nonword errors. These included terms that could be classified as Finglish (a macaronic blend of Finnish and English, as described in Virtaranta, 1992), which might also be considered L2 language intrusions, indicators of substantial language interference, or signs of language evolution in L1 within the attriter group. As such, a distinct category for this type of error could have been informative, especially in future studies conducted with larger datasets. We conducted our analysis following Lehtinen et al. (2021) and remain optimistic that this qualitative data analysis can serve as a foundation for future research.
When interpreting our results, it is important to consider the potential impact of shared cognates on language interference. The languages in this study, L1 (Finnish) and L2 (English), share very few cognates. As Schmid and Jarvis (2014) demonstrate, there may be less competition between high-frequency words in Finnish and English than in language pairs with a larger shared vocabulary. Thus, it is possible that in our dataset the effect of language interference on rapid lexical retrieval was more subtle, and our analysis detected it only as a within-group effect linked to the frequency of L1 use and not at the group level.
Our analysis consisted of two groups: attriters and monolinguals. A comprehensive analysis of the monolingual group’s VF task performance was previously reported by Lehtinen et al. (2021). They found no effect of age but detected a positive effect of education and gender on VF task total scores. For the present study, the monolingual and attriter groups were matched for age, gender, and education. Although we are confident that closely matched groups effectively control for age, gender, and education in group comparisons, it is possible that these variables influenced performance within the attriter group in ways not accounted for in our analyses. Controlling for education and gender within the attriter group, especially when examining the impact of L1 use on rapid retrieval, could have strengthened our data interpretation. We recommend controlling for age and education in future studies to ensure a more comprehensive interpretation of the results.
Limitations of this study include measuring L1 use only as an overall measure of frequency of use in everyday life. Building on earlier research (e.g., Schmid & Dusseldorp, 2010), we also aimed to explore the impact of L1 use for professional purposes on VF performance. Unfortunately, our dataset did not allow for such an analysis. A more detailed approach to L1 use might help explain variation in task performance within the attriter group. However, assessing the influence of extralinguistic variables on observed phenomena is complex. Language history reports on L1 use typically involve subjective self-reports spanning several years and varying external circumstances that dictate the use of L1 in everyday life. These reports should be considered the best available approximations, but their comparability between individuals is not straightforward (Bylund & Ramírez-Galan, 2016; Köpke & Schmid, 2004). Furthermore, LoR and L1 use are interconnected in many ways. While the time elapsed since leaving the L1 country is measurable, assessing the degree of L1 deprivation during that time is more challenging, especially in datasets dating to modern times (Schmid, 2019). Despite these limitations, exploring the impact of shared language history in language attrition populations is valuable beyond research interests. Gaining insight into how these factors affect L1 performance can motivate and support individuals in preserving their native language and cultural identity.
Conclusions
The present study showed that attriters, who self-report as balanced bilinguals with a slight preference for L2, demonstrate strong proficiency in L1 and similar lexical retrieval strategies in L1 and L2. Our analysis suggests balanced bilingualism with a subtle emphasis on L1 at the group level after 20 years of immersive exposure to L2, partly contrary to participant self-reports. We showed that frequent L1 use supports overall VF task performance but proportionally slows performance in the initial stage of the task in both languages without compromising overall performance. Compared with monolinguals, attriters demonstrate an overall disadvantage in SVF, but this disadvantage is not due to a slower initiation profile, as hypothesized. Instead, attriters rely on rapid retrieval in L1 more than monolinguals do. These findings add to our understanding of how attriters and monolinguals approach verbal fluency tasks and highlight the potential importance of early task performance in VF task analysis.
Our findings support the notion of two-way interaction in cognitive language processing, acquisition, attrition, and dominance shifts in a bilingual environment. In the future, in-depth analysis of the processes underlying VF performance in L1 and L2, using techniques such as clustering and switching analysis (Lehtinen et al., 2021; Troyer, 2000; Troyer et al., 1997), could increase our understanding of lexical retrieval strategies in language attrition populations and the role of frequent L1 use in bilingual language processing.
Supplemental Material
sj-pdf-1-ijb-10.1177_13670069231193727 – Supplemental material for Frequent native language use supports phonemic and semantic verbal fluency in L1 and L2: An extended analysis of verbal fluency task performance in an L1 language attrition population
Supplemental material, sj-pdf-1-ijb-10.1177_13670069231193727 for Frequent native language use supports phonemic and semantic verbal fluency in L1 and L2: An extended analysis of verbal fluency task performance in an L1 language attrition population by Nana Lehtinen, Anna Kautto and Kati Renvall in International Journal of Bilingualism
Supplemental Material
sj-pdf-2-ijb-10.1177_13670069231193727 – Supplemental material for Frequent native language use supports phonemic and semantic verbal fluency in L1 and L2: An extended analysis of verbal fluency task performance in an L1 language attrition population
Supplemental material, sj-pdf-2-ijb-10.1177_13670069231193727 for Frequent native language use supports phonemic and semantic verbal fluency in L1 and L2: An extended analysis of verbal fluency task performance in an L1 language attrition population by Nana Lehtinen, Anna Kautto and Kati Renvall in International Journal of Bilingualism
Supplemental Material
sj-pdf-3-ijb-10.1177_13670069231193727 – Supplemental material for Frequent native language use supports phonemic and semantic verbal fluency in L1 and L2: An extended analysis of verbal fluency task performance in an L1 language attrition population
Supplemental material, sj-pdf-3-ijb-10.1177_13670069231193727 for Frequent native language use supports phonemic and semantic verbal fluency in L1 and L2: An extended analysis of verbal fluency task performance in an L1 language attrition population by Nana Lehtinen, Anna Kautto and Kati Renvall in International Journal of Bilingualism
Footnotes
Declaration of conflicting interests
Funding
Supplemental material
Author biographies
References
