Abstract
Introduction
In second language (L2) assessment, both human- and machine-scored high-stakes speaking tests, such as the Test of English as a Foreign Language (TOEFL) and the International English Language Testing System (IELTS), assess fluency as a key construct. This is in line with research findings demonstrating that L2 fluency is a robust indicator of L2 oral proficiency (Tavakoli et al., 2020). In addition, recent advances in acoustic analysis and natural language processing have contributed significantly to increasing the potential for automated scoring of L2 speaking performance (De Jong, 2018; Ginther et al., 2010). In this line of research, objective measures of fluency, such as speech rate and mean number of silent pauses, have been found to play a vital role in the estimation of test-takers’ speaking scores. It is thus essential to understand and continuously monitor the validity of measurements of oral fluency.
L2 researchers have attempted to identify a set of valid objective measures of fluency in light of the underlying linguistic knowledge and competence (De Jong et al., 2013; Kahng, 2020). Following Segalowitz’s (2010) framework, the former is commonly termed
Although prior research has shown that the L1-L2 UF link is cross-linguistically robust (Bradlow et al., 2017), previous studies have been largely limited to closed tasks (see Pallotti, 2009), such as picture and personal narrative tasks, as well as to L1-L2 combinations of stress-timed or syllable-timed languages (e.g., Dutch, English, French, Spanish). In addition, although only a few studies are available to date, they suggest that the strength of the association between L1 and L2 fluency can be moderated by L2 proficiency (Derwing et al., 2009). Therefore, to further explore the generalisability of previous research findings, the current study examines the L1-L2 UF link in the context of open-ended speech produced by L1 Japanese (mora-timed language) learners of English (stress-timed language), as well as the moderator effects of L2 proficiency on the L1-L2 UF link.
Background
L1-L2 utterance fluency link
From the perspective of speech production mechanisms, both L1 and L2 speech production commonly proceed in the following order: conceptualisation, formulation, and articulation (Kormos, 2006; Levelt, 1989). Conceptualisation involves planning the speech content and organising it for subsequent linguistic processing (i.e., formulation) and articulation. Among these speech production processes, language-general processing is primarily limited to conceptualisation, while formulation and articulation are considered to be language-specific (Kormos, 2006; Suzuki & Révész, 2023). Accordingly, it can be argued that the processes and resources for conceptualisation are shared between L1 and L2 speech production. Furthermore, even L1-specific linguistic knowledge can be transferred to L2 speech production when the mechanisms for L2-specific processing (e.g., syntactic encoding, phonological encoding) are not yet fully developed (Kormos, 2006). Therefore, the L1-L2 UF link can reflect the speaker’s efficiency in language-general processes and the transfer of L1 knowledge to L2 speech production. The former could presumably be independent of the speaker’s L1 and L2 background and L2 proficiency, while the latter might be contingent upon their L2 proficiency.
The L1-L2 UF association has been found to be cross-linguistically robust in speaking rate measures (Bradlow et al., 2017). Nonetheless, as summarised in Table 1, the effect sizes in previous studies have not always been consistent, possibly because of the three distinctive dimensions of UF, namely the speed (speed of delivery), breakdown (pausing behaviours), and repair (disfluency features) aspects of fluency (Suzuki & Kormos, 2023; Tavakoli & Skehan, 2005). First, the strength of association between L1 and L2 speed fluency measures has been found to be either moderate (De Jong et al., 2015; Huensch & Tracy-Ventura, 2017) or strong (De Jong & Mora, 2019; Peltonen, 2018; Peltonen & Lintunen, 2022). Although most studies employed picture narrative tasks to elicit learners’ speech and commonly reported moderate-to-strong correlations, a strong correlation was also found in role-play tasks (
Table 1. Summary of 10 key studies on the L1-L2 utterance fluency link.
Regarding breakdown fluency, both pause frequency and pause duration measures tend to show moderate to strong correlations between L1 and L2 speech (De Jong et al., 2015; Peltonen, 2018). However, the effects of pause type (silent vs. filled pauses) on the predictive power of L1 measures in the corresponding L2 measures have varied across studies. The correlation coefficients between L1 and L2 silent pause frequency tend to be moderate to strong, whereas the strengths of the L1-L2 association of filled pause frequency vary considerably across studies. Given that previous studies commonly involved participants with a combination of stress-timed and syllable-timed L1 background and target L2, the varied strength of association between L1 and L2 filled pause frequency may not be attributed to cross-linguistic differences (e.g., rhythm, tempo). An alternative possible reason for the variation in the strength of association between L1-L2 filled pause frequency is cultural differences, because even phonologically similar languages can have different norms for temporal features (Tian et al., 2017). Similarly, the correlation coefficients between L1 and L2 pause duration measures tend to be moderate to strong (De Jong et al., 2015; De Jong & Mora, 2019; Huensch & Tracy-Ventura, 2017; Peltonen, 2018). As regards pause location, L1 and L2 end-clause pause duration are more strongly associated than are L1 and L2 mid-clause pause duration (De Jong et al., 2015; Huensch & Tracy-Ventura, 2017; Peltonen, 2018). End-clause pauses are reflective of conceptualisation processes, which are shared across different language systems within individuals (De Jong, 2016). Accordingly, the covariance of end-clause pause duration between L1 and L2 may derive from the language-general processes underlying end-clause pauses.
The effect sizes of the L1-L2 UF link in repair fluency measures can vary considerably. One plausible reason for such varying effect sizes is the incompatibility of the focus of repair fluency measures across studies. Some studies count different kinds of disfluency phenomena altogether for the sake of the validity of classification (Duran-Karaoz & Tavakoli, 2020; see also Suzuki et al., 2021), while other studies have targeted certain disfluency features and have employed fine-grained measures, such as self-repetition frequency and self-correction frequency (De Jong et al., 2015; Huensch & Tracy-Ventura, 2017). However, even for the same target repair features, different effect sizes for the L1-L2 association have been observed. For instance, in the case of self-repetition frequency, a strong effect size was reported in De Jong et al.’s (2015) study, while a non-significant correlation was found in Peltonen’s (2018) research. These inconsistent findings may indicate that the association between L1 and L2 UF measures is affected by moderator variables. Two important candidate moderators of the L1-L2 UF link are cross-linguistic differences between L1 and L2, and learners’ L2 proficiency (Huensch & Tracy-Ventura, 2017).
Cross-linguistic variability
Cross-linguistic similarities and differences are assumed to moderate the strength of the L1-L2 UF association. Bradlow et al.’s (2017) cross-linguistic study confirmed that in speaking rate measures (speech rate and articulation rate), the predictive power of L1 UF on L2 UF was robust across different L1 backgrounds, while the strength of the L1-L2 association tended to vary according to speakers’ L1. Two linguistic features are theoretically relevant to temporal aspects of speech: rhythmic patterns (e.g., stress-timed vs. syllable-timed language) and syllable complexity (Pellegrino et al., 2011). Based on previous studies examining the L1-L2 UF link, one can argue that while the most researched target L2 is English, participants’ L1s were mainly either stress-timed languages, such as Slavic languages (Derwing et al., 2009), or syllable-timed languages, such as Finnish (Peltonen, 2018), Mandarin (Derwing et al., 2009), Spanish (De Jong & Mora, 2019), and Turkish (Duran-Karaoz & Tavakoli, 2020). However, to the best of our knowledge, no studies have examined the L1-L2 UF link in the context of learners of L2 English with mora-timed L1 backgrounds. One representative mora-timed language is Japanese (Vance, 2008). According to Pellegrino et al. (2011), the Japanese language has relatively low syllable complexity (indexed by the mean number of constituents per syllable; 2.65 in Japanese vs. 3.70 in English and 3.87 in Mandarin) and information density (indexed by the average semantic information per syllable; 0.49 in Japanese vs. 0.91 in English and 0.94 in Mandarin). Therefore, to better understand the cross-linguistic effects on the L1-L2 UF link, the association between L1 and L2 UF behaviours should be examined with Japanese-speaking learners of English.
Moderating role of L2 proficiency
In examining the L1-L2 UF link, previous studies have operationalised L2 proficiency in different ways, such as through examining longitudinal changes (Derwing et al., 2009; Huensch & Tracy-Ventura, 2017), using standardised test scores (Duran-Karaoz & Tavakoli, 2020; Gao & Sun, 2024) and through obtaining vocabulary size estimates (De Jong et al., 2015; De Jong & Mora, 2019; Peltonen, 2018). Among these different operationalisations, the potential moderator effects of L2 proficiency on the L1-L2 UF link can be observed based on the longitudinal changes in correlation coefficients between L1 and L2 fluency measures. Derwing et al.’s (2009) study tracked the relationship between L1 and L2 fluency longitudinally in a cross-linguistically different L1-L2 group (Mandarin learners of English) and a cross-linguistically similar L1-L2 group (Slavic learners of English) in Canada where English is both an official and the dominant language. They found that L1-L2 correlations of speech rate and silent pause frequency decreased as a function of time in the former group, while in the latter group, the correlation coefficients were relatively stable across three time points with moderate-to-strong effect sizes. Similarly, Huensch and Tracy-Ventura’s (2017) longitudinal study included two groups of English-speaking students who had learned either French or Spanish as L2 and had resided in an L2-speaking environment. Their study found that in both groups, L1-L2 correlations in some fluency measures, such as articulation rate and silent pause duration (especially within the analysis of speech units; cf. Foster et al., 2000), increased after 5 months’ residence in the L2-speaking environment. However, for English-speaking learners of French, the non-significant L1-L2 correlation of self-repetition frequency before residence abroad (
The aforementioned studies claimed that changes in the L1-L2 relationship can be derived from gains in L2 proficiency. However, longitudinal residence in the L2-speaking environment may entail not only a gain in underlying linguistic competence, but also changes in affective status (e.g., willingness to communicate; Lee, 2018), which can also contribute to fluency in L2 speech production (Segalowitz, 2010). Meanwhile, Gao and Sun’s (2024) study directly examined the moderator effects of L2 proficiency on the L1-L2 fluency link specifically in breakdown fluency measures. Notably, their participants were school-age Chinese students rather than adult learners as in the abovementioned studies. They adopted the picture narrative task of the sample test of the TOEFL Junior speaking test, and the speech samples were assessed by trained raters based on the scoring criteria of the TOEFL Speaking test. Their results showed that there was no significant interaction effect between TOEFL Junior speaking test scores, as the measure of L2 proficiency, and L1 breakdown fluency measures. Although they concluded that the L1-L2 UF link is independent of L2 proficiency, their L2 proficiency scores and L2 fluency measures were obtained from the same speech samples of the TOEFL Junior speaking test. The lack of independence of the observations of these two variables might have obscured the moderator effects of L2 proficiency on the L1-L2 UF link.
From a methodological perspective, to better understand the mechanisms of the L1-L2 fluency link, a fine-grained measurement of L2 proficiency is needed that can specifically tap into L2-specific competence or fluency-related processing skills (Huensch & Tracy-Ventura, 2017; Pérez Castillejo & Urzua-Parra, 2023). In the domain of L2 fluency research, such competence is conceptualised as L2-specific cognitive fluency, which refers to the efficiency of the speaker’s L2-specific cognitive and linguistic processes that underlie L2 utterance fluency (Segalowitz, 2016). More specifically, cognitive fluency is assumed to have two interrelated dimensions: linguistic resource (i.e., breadth and depth of linguistic knowledge) and processing speed (i.e., automaticity in accessing and manipulating linguistic knowledge; Suzuki & Kormos, 2023). Previous studies have also demonstrated the strong predictive power of cognitive fluency for UF performance (De Jong et al., 2013; Kahng, 2020; Suzuki & Kormos, 2023). Therefore, the current study operationalised L2 proficiency as L2-specific cognitive fluency.
The current study
To explore the generalisability of the L1-L2 UF link, the current study examines the L1-L2 UF link in the context of argumentative speech produced by L1 Japanese (mora-timed language) learners of English (stress-timed language). In addition, a synthesis of previous studies suggests that the strength of the association between L1 and L2 fluency can vary as a function of L2 proficiency. To examine the moderator effects of L2 proficiency on the L1-L2 UF link, our study used factor scores based on the latent variables of CF as a proxy for L2 proficiency from the results of our precursor study (Suzuki & Kormos, 2023). The following research questions (RQs) were formulated:
RQ1. To what extent are L2 English utterance fluency measures predicted by the corresponding L1 Japanese measures in an argumentative task?
RQ2. To what extent are the relationships between L1 and L2 utterance fluency measures moderated by cognitive fluency scores?
Our precursor study (Suzuki & Kormos, 2023) examined what linguistic knowledge (i.e., CF) contributes to three dimensions of UF (i.e., speed, breakdown, and repair fluency) using L2 speech samples elicited via four different speaking tasks including one argumentative speech task, which was also part of the current study. In contrast, the study reported in this paper examines the extent to which L2 UF measures are explained by their L1 counterparts. To this end, participants’ L1 speech data are exclusively used in the current study, and the sample of this research includes only the participants who completed two argumentative speaking tasks in L2 English and another argumentative task in L1 Japanese (for more details, see Supplementary Appendix A; Suzuki & Kormos, 2024).
Method
Participants
As part of a larger project, the current study included 104 Japanese learners of English (female = 61, male = 43;

Figure 1. Histogram of 61 out of 104 students’ self-reported English scores on in-house university placement tests, equated to CEFR level equivalents based on the test provider’s concordance tables.
Materials
Speaking tasks
As discussed earlier, the association between L1 and L2 speech production may be stronger in open-ended speaking tasks than in closed ones. In open-ended tasks, speakers have considerable flexibility in adjusting their message to their own L2 linguistic resources. The effect of personal speaking style might thus be apparent in their L2 performance in this type of task. Accordingly, the current study elicited L1 and L2 monologue speech samples, using different argumentative speech tasks with an identical task format and procedure. All prompts are available on the Open Science Framework (OSF; Suzuki & Kormos, 2024). In the L1 argumentative speech task, participants were provided with a statement—
Cognitive fluency scores
To operationalise L2 proficiency in a way that is relevant to oral fluency, the current study adopted the L2-based measurements of CF and computed two separate factor scores, that is,
L2 utterance fluency measures
Following prior research into L2 UF, the current study targeted three major aspects of UF—speed fluency, breakdown fluency, and repair fluency (Tavakoli & Skehan, 2005). As for speed fluency, there is one measure that solely taps into the construct of speed fluency, that is,
Speed fluency
Breakdown fluency
Repair fluency
Composite measures
The speech data were transcribed and then annotated for the boundaries of clauses. The measures of breakdown and repair fluency were standardised by the number of syllables produced in pruned transcripts (i.e., excluding disfluency words) rather than by speech duration to reduce the possible collinearity with other UF measures (see Bosker et al., 2013; Suzuki & Révész, 2023). For the measure of speed fluency and the composite measures, the number of syllables produced in pruned transcripts was used in the calculation. To annotate temporal features,
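The standardisation described above can be illustrated with a minimal Python sketch. It assumes the conventional definitions of the measures (articulation rate as pruned syllables per second of phonation time, speech rate as pruned syllables per second of total time, mean length of run as the mean number of syllables between silent pauses); the `Sample` container, its field names, and the example values are hypothetical and are not taken from the study's materials.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    """Hypothetical container for one annotated speech sample."""
    run_syllables: List[int]      # pruned syllables in each run between silent pauses
    pause_durations: List[float]  # silent pause durations in seconds
    total_duration: float         # total speaking time in seconds, pauses included
    filled_pauses: int            # number of filled pauses
    self_repetitions: int         # number of self-repetitions


def fluency_measures(s: Sample) -> Dict[str, float]:
    """Compute a few UF measures under the assumed conventional definitions."""
    syllables = sum(s.run_syllables)
    phonation_time = s.total_duration - sum(s.pause_durations)
    return {
        # speed and composite measures: syllables per unit of time or per run
        "articulation_rate": syllables / phonation_time,
        "speech_rate": syllables / s.total_duration,
        "mean_length_of_run": syllables / len(s.run_syllables),
        # breakdown and repair measures: standardised by pruned syllables
        "silent_pause_ratio": len(s.pause_durations) / syllables,
        "filled_pause_ratio": s.filled_pauses / syllables,
        "self_repetition_ratio": s.self_repetitions / syllables,
    }


# Invented example: three runs, two silent pauses, 20 seconds in total.
example = Sample(
    run_syllables=[10, 20, 10],
    pause_durations=[2.0, 2.0],
    total_duration=20.0,
    filled_pauses=4,
    self_repetitions=2,
)
measures = fluency_measures(example)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Dividing the breakdown and repair counts by syllables rather than by time keeps them from sharing a denominator with the rate measures, which is the collinearity concern raised above.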
L1 utterance fluency measures
This study calculated the same set of L1 UF measures as the L2 measures listed in the previous section. Considering the syllable structure and phonological properties of the Japanese language, we employed the mora as the standardised unit for the calculation of L1 Japanese UF measures. Note that a mora is fundamentally shorter than an English syllable because the basic structure of a mora allows only one consonant in onset position (Vance, 2008), which should be carefully considered when comparing UF measures between different languages.
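As an illustration of how a mora count differs from a syllable count, the following Python sketch counts morae in a kana transcription. It is a simplified assumption rather than the procedure used in the study: it treats every full-size kana, the moraic nasal (ん), the geminate marker (っ), and the long-vowel mark (ー) as one mora each, and merges small kana such as ゃ, ゅ, and ょ with the preceding character. The function name `count_morae` is hypothetical.

```python
# Small kana that combine with the preceding kana into a single mora.
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def count_morae(kana: str) -> int:
    """Count morae in a hiragana/katakana string; other characters are ignored."""
    count = 0
    for ch in kana:
        if ch in SMALL_KANA:
            continue  # part of the previous mora, e.g. きょ is one mora
        is_hiragana = "ぁ" <= ch <= "ゖ"
        is_katakana = "ァ" <= ch <= "ヺ"
        if is_hiragana or is_katakana or ch == "ー":
            count += 1  # includes ん, っ, and the long-vowel mark
    return count


for word in ["にほん", "がっこう", "とうきょう", "コーヒー"]:
    print(word, count_morae(word))
```

For example, とうきょう (Tokyo) has four morae but is usually analysed as two syllables, which is why rate measures per mora and per syllable are not directly comparable.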
Procedures
All the CF tasks were administered in one session except for the controlled speaking task for measuring articulatory speed. Approximately 1 week later, all participants performed a total of five spontaneous speaking tasks, including the current argumentative tasks and the controlled speaking task in L2 English. They then completed the L1 argumentative speaking task in another session. The order of the two L2 argumentative tasks was counterbalanced across participants. In both L1 and L2 argumentative tasks, 3 minutes were provided for pre-task planning. During the planning time, note-taking was not allowed for any of the tasks (for details, see Suzuki & Kormos, 2023).
Statistical analysis
As a preliminary analysis, descriptive statistics and correlational analyses were performed to examine the distributions of all the variables and the interrelationships among them (for all statistics, see Supplementary Appendix B; Suzuki & Kormos, 2024). The descriptive statistics and Shapiro–Wilk tests suggested that most of the UF measures were not normally distributed, whereas the density plots indicated that articulation rate can be regarded as being normally distributed. In the subsequent generalised linear mixed-effects models (GLMMs), the Gaussian distribution (i.e., normal distribution) was thus applied to the models of articulation rate, whereas the gamma distribution—one of the continuous probability distributions where a possible range of values is from zero to +∞ (Coupé, 2018)—was applied to the models of the other UF measures with the log link function.
Correlational analyses were also conducted between L1 UF measures and L2 counterparts in the two prompts. Considering the non-normal distributions of most of the UF measures, we employed Spearman’s rank-order correlation coefficients. The correlational pattern between L1 and L2 UF performance did not substantially differ between the two topics of the L2 argumentative tasks (see Supplementary Appendix B; Suzuki & Kormos, 2024). The subsequent GLMMs predicting L2 UF measures from the corresponding L1 UF measures, therefore, included both of the L2 argumentative tasks and handled the topics as the random-effects variable to control for the variability of the L1-L2 UF link across topics.
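For readers unfamiliar with the rank-order coefficient, the sketch below shows one standard way to compute it: convert both variables to ranks (averaging ranks over ties) and take the Pearson correlation of the ranks. This is a plain-Python illustration with invented data, not the analysis script of the study; the function names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of the block of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented L1 and L2 values for five speakers.
l1 = [1.0, 2.0, 3.0, 4.0, 5.0]
l2 = [2.0, 1.0, 4.0, 3.0, 5.0]
rho = spearman(l1, l2)
print(round(rho, 3))
```

Because only ranks enter the calculation, the coefficient is unaffected by the skewed distributions reported for most UF measures.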
For GLMMs with the gamma distribution, non-positive values (basically zero values in the current dataset) may prevent the estimation of statistical models. The zero values were thus replaced with the −3
To examine the overall associations between L1 and L2 UF measures (RQ1), GLMMs were constructed to predict L2 UF measures from their L1 counterparts with random intercepts of individual participants and topics of the L2 argumentative tasks. To investigate the moderator effects of L2 proficiency on the L1-L2 UF link (RQ2), two interaction terms by L1 UF and each of the CF factor scores (LR and PS) were added to the GLMMs constructed for RQ1. To control for the simple effects of the predictor variables (i.e., L1, LR, and PS) on L2 UF, the GLMMs for RQ2 included these predictors as well. From a statistical perspective, RQ2 addresses whether those two interaction terms (i.e., by L1 and LR factor score; and by L1 and PS factor score) are significant in a confirmatory manner. Accordingly, to avoid overly complex models with many predictor variables, the GLMMs for RQ2 did not include the two-way interaction between LR and PS and the random-slopes of the individual participants and topics for the target interaction effects (for details of model building procedures, see Supplementary Appendix B; Suzuki & Kormos, 2024). The planned models for RQ1 and RQ2 are as follows.
RQ1
RQ2
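The model structure described in the preceding paragraph can be written out as follows. This is a reconstruction from the prose rather than the authors' own notation: the coefficient symbols, the indices, and the identity link for the Gaussian model of articulation rate are assumptions, while the log link for the gamma models and the two random intercepts follow the description in the text.

```latex
% i indexes participants, j indexes the topics of the L2 argumentative tasks.
% g is the link function: log for the gamma models; identity is assumed here
% for the Gaussian model of articulation rate.
\begin{align}
\text{RQ1:}\quad g\big(\mathbb{E}[\mathrm{L2}_{ij}]\big)
  &= \beta_0 + \beta_1\,\mathrm{L1}_i + u_i + v_j \\
\text{RQ2:}\quad g\big(\mathbb{E}[\mathrm{L2}_{ij}]\big)
  &= \beta_0 + \beta_1\,\mathrm{L1}_i + \beta_2\,\mathrm{LR}_i + \beta_3\,\mathrm{PS}_i \notag \\
  &\quad + \beta_4\,(\mathrm{L1}_i \times \mathrm{LR}_i)
         + \beta_5\,(\mathrm{L1}_i \times \mathrm{PS}_i) + u_i + v_j \\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad v_j \sim \mathcal{N}(0, \sigma_v^2) \notag
\end{align}
```

Under this notation, RQ2 amounts to asking whether $\beta_4$ and $\beta_5$ differ from zero; the model includes neither an $\mathrm{LR} \times \mathrm{PS}$ term nor random slopes, as stated above.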
Results
Predictive power of L1 utterance fluency in L2 utterance fluency
To examine the predictive power of L1 UF performance in L2 counterparts (RQ1), a set of GLMMs was constructed. For all GLMMs, the outcome variable was L2 UF measures, and the fixed-effects predictor variable was the L1 counterparts, with the random intercepts of Participants and Topics of the L2 argumentative tasks. As summarised in Table 2, the GLMMs suggest that all L2 UF measures were significantly predicted from the corresponding L1 UF measures. However, the amount of variance of the L2 UF measures explained by the corresponding L1 UF measures varied considerably across the UF measures. The marginal
Table 2. Summary of the effects of L1 utterance fluency measures on the corresponding L2 utterance fluency measures.
Moderator effects of L2 proficiency on L1-L2 utterance fluency link
RQ2 addresses whether the predictive power of L1 UF for L2 UF is moderated by the speaker’s L2 proficiency. To operationalise L2 proficiency in the context of L2 fluency research, the factor scores of CF were estimated based on a series of CF measurements. Two interaction terms by L1 UF and each of the CF factor scores (LR and PS) were then added to the GLMMs constructed for RQ1. For all GLMMs for RQ2, the structure of random-effects variables was identical to the one constructed for RQ1. As summarised in Table 3, significant moderator effects of L2 CF on the L1-L2 UF link were found only in speed fluency and composite measures, namely articulation rate, speech rate, and mean length of run (for full statistical estimates, see Supplementary Appendix B; Suzuki & Kormos, 2024). More specifically, the L1-L2 association in articulation rate was weakened by the score of LR but enhanced by the score of PS. In other words, for those who had acquired a wider range of L2 LRs, L2 articulation rate tended to be relatively independent of L1 articulation rate (see Figure 2). In contrast, the L1-L2 association in articulation rate, speech rate, and mean length of run was stronger for those with faster L2 processing skills (see Figures 3–5). This consistent pattern in speed fluency measures suggests that the more efficiently learners can process L2 knowledge, the closer to L1 their L2 speed fluency is.
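One way to read such interaction effects is through conditional (simple) slopes: in a model with L1 × LR and L1 × PS terms, the slope of L1 UF on the linear predictor at given factor scores is the L1 coefficient plus each interaction coefficient multiplied by the corresponding score. The Python sketch below illustrates this arithmetic. The coefficient values are invented for illustration and are not the study's estimates, and the factor scores are assumed to be standardised so that ±1 corresponds to roughly one standard deviation.

```python
def conditional_l1_slope(b_l1: float, b_l1_lr: float, b_l1_ps: float,
                         lr: float = 0.0, ps: float = 0.0) -> float:
    """Slope of L1 UF on the linear predictor at given LR and PS scores."""
    return b_l1 + b_l1_lr * lr + b_l1_ps * ps


# Invented coefficients (NOT estimates from the study): a negative L1 x LR term
# and a positive L1 x PS term, mirroring the direction of the reported effects.
B_L1, B_L1_LR, B_L1_PS = 0.30, -0.10, 0.15

conditions = [
    ("LR at -1", -1.0, 0.0),
    ("LR at +1", 1.0, 0.0),
    ("PS at -1", 0.0, -1.0),
    ("PS at +1", 0.0, 1.0),
]
for label, lr, ps in conditions:
    slope = conditional_l1_slope(B_L1, B_L1_LR, B_L1_PS, lr=lr, ps=ps)
    print(f"{label}: L1 slope = {slope:.2f}")
```

With these illustrative numbers the L1 slope shrinks as LR increases and grows as PS increases, which is the pattern the interaction plots are meant to convey.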
Table 3. Summary of the interaction effects by L1 utterance fluency measures and linguistic resource and processing speed on the corresponding L2 utterance fluency measures.

Figure 2. Interaction plot of the relationship between L1 and L2 articulation rate measures, separated by L2 linguistic resource scores.

Figure 3. Interaction plot of the relationship between L1 and L2 articulation rate measures, separated by L2 processing speed scores.

Figure 4. Interaction plot of the relationship between L1 and L2 speech rate measures, separated by L2 processing speed scores.

Figure 5. Interaction plot of the relationship between L1 and L2 mean length of run measures, separated by L2 processing speed scores.
Discussion
Motivated by the lack of studies on the L1-L2 UF link in the pair of mora-timed and stress-timed languages as well as those using open-ended speaking tasks, the current study investigated the extent to which L2 UF measures can be predicted from the corresponding L1 measures (RQ1), using L1 and L2 argumentative speech data produced by Japanese-speaking learners of English. Using the factor scores of CF as the measures of L2 proficiency, the study also examined whether L2 CF significantly moderates the predictive power of L1 UF measures in L2 UF measures (RQ2).
Predicting L2 utterance fluency from L1 utterance fluency
Our analysis detected significant associations between L1 and L2 UF for all of the investigated UF variables. In addition, the results showed small-to-medium effect sizes for the overall predictive power of L1 UF measures in their L2 counterparts while negligible effect sizes were found in several measures (marginal
Small-to-medium effect sizes of the L1-L2 UF association were found for mean length of run, filled pause ratio, and self-repetition ratio. As regards mean length of run, each run is segmented by pauses. Previous studies have shown that L1 speakers produce pauses at clause boundaries more frequently than in the middle of utterances (De Jong, 2016), which suggests that breakdowns in L1 speech production are mainly caused by content-related processing that end-clause pauses are supposed to reflect. Therefore, especially for those who have attained highly automatised linguistic knowledge (e.g., L1 speakers and advanced L2 learners), the length of run might be reflective of the unit of conceptual planning (Kormos, 2006; Levelt, 1989). Considering the language-general nature of conceptualisation processes, the L1-L2 link in mean length of run may indicate the extent to which the speakers’ capacity and efficiency in content planning are shared across L1 and L2 speech.
Previous studies have commonly reported moderate-to-strong effect sizes of the L1-L2 UF link in filled pause frequency (De Jong et al., 2015; Duran-Karaoz & Tavakoli, 2020; Peltonen, 2018) and self-repetition frequency (De Jong et al., 2015; Huensch & Tracy-Ventura, 2017). The current results also confirmed the small-to-medium effect sizes for these measures, despite the cross-linguistically divergent pair of L1 and L2 (i.e., Japanese-speaking learners of English). It may thus be argued that these disfluency features are reflective of individual speaking style in a relatively robust manner. This argument may also be supported by the relatively large amount of variance explained by the random-intercepts of individual participants (see Table 2). In addition, filled pauses can be associated with the demands on content planning (Fraundorf & Watson, 2014). Provided that the L1-L2 UF link can reflect speakers’ language-general processing or idiosyncratic factors, the current finding may strengthen the evidence that speakers tend to elaborate speech content similarly in L1 and L2 speech production.
The current study also found a small effect size of the L1-L2 link in the measures of silent pause frequency and duration, regardless of pause location. Our findings were surprising because conceptualisation-related processes, which end-clause pauses are supposed to reflect (De Jong, 2016), are theoretically assumed to be shared across L1 and L2 speech production. Therefore, one would have anticipated that end-clause pause measures would show a strong relationship in L1 and L2 production. The relatively small effect sizes of end-clause pause measures in the current study might be explained by participants’ proficiency range, including variability in the effective use of planning before starting to speak (i.e., scope of planning; Gilbert et al., 2020). As pauses at clausal boundaries can be regarded as the starting point of planning for the subsequent unit of ideas (cf. Foster et al., 2000), end-clause pauses may be reflective of the scope of planning. A variety of individual difference factors associated with L2 proficiency (e.g., self-perceived proficiency, cumulative exposure to L2) were found to affect how far ahead L2 learners plan their utterances before speaking (see Gilbert et al., 2020). Some students may have prepared not only for content planning but also for some linguistic planning (e.g., vocabulary and pronunciation) at clausal boundaries. Consequently, such individual variability in L2-specific speech planning at clausal boundaries might have reduced the covariance of end-clause pause measures between L1 and L2 speech in the current study.
In contrast, the L1-L2 link in mid-clause pauses was expected to have a small or even negligible effect size because of their association with language-specific linguistic processing (for a similar argument, see Peltonen, 2018). However, the current results showed that mid-clause pause frequency and duration in L1 speech were weakly but significantly associated with their counterparts in L2 speech. This is possibly due to the relatively difficult and abstract topics of the argumentative tasks for both L1 and L2 speech production. Even in the L1 argumentative task, infrequent and sophisticated vocabulary items may have been needed to complete the task. It can thus be assumed that such high demands on lexical retrieval in L1 speech might have contributed to the similarity between L1 and L2 speech production, that is, the covariance between L1 and L2 mid-clause pause measures.
Furthermore, the current results revealed that there were no meaningful L1-L2 UF associations in articulation rate, self-correction ratio, and false start ratio. Regarding articulation rate, previous studies commonly reported moderate-to-strong effect sizes of correlation coefficients between the measures in L1 and L2 speech (De Jong et al., 2015; De Jong & Mora, 2019; Huensch & Tracy-Ventura, 2017) as well as their cross-linguistic robustness (Bradlow et al., 2017). However, the current study found negligible predictive power of L1 articulation rate for its L2 counterpart. One possible reason for this may lie in the cross-linguistic difference between the mora-timed L1 (Japanese) and stress-timed target L2 (English) of the participants. The maintenance of isochronous rhythm in Japanese is achieved by every single mora rather than stressed syllables, as opposed to the stress-timed feature of English. The cross-linguistic differences in rhythmic aspects are thus relatively divergent between Japanese and English (e.g., vowel reduction ratio, varying syllable length; cf. Pellegrino et al., 2011; Vance, 2008), compared with the pair of syllable-based languages in previous studies. It can thus be assumed that Japanese-speaking learners of English may have a limited range of rhythmic features that can be transferred from L1 Japanese to L2 English speech.
Although the L1-L2 UF link in false starts has rarely been examined, previous studies have reported a significant predictive power of L1 self-correction behaviour on its L2 counterpart (De Jong et al., 2015; Huensch & Tracy-Ventura, 2017). Albeit speculative, one possible explanation for the inconsistent results between previous studies and the current study may lie in the cross-linguistic and/or cross-cultural differences in the norms for self-repairs in spontaneous speech (see Tavakoli & Wright, 2020). The descriptive statistics of self-correction ratio and false start ratio indicated that the participants in the current study produced a limited number of self-corrections and false starts, compared with previous studies (see De Jong et al., 2015; Huensch & Tracy-Ventura, 2017). It can thus be hypothesised that L1 Japanese speakers may aim to avoid self-repair in monologues regardless of the language for production (for the descriptive statistics of the current data, see Supplementary Appendix B; Suzuki & Kormos, 2024). However, only a few studies have investigated the cross-linguistic and cross-cultural differences in the norm of temporal features in speech (e.g., Tian et al., 2017, for filled pauses). Therefore, future studies are needed to examine the cross-linguistic and cross-cultural influences on the L1-L2 UF link, especially in repair fluency features.
The moderating role of L2 cognitive fluency in the L1-L2 fluency link
In addition to cross-linguistic effects, L2 proficiency has been regarded as another important characteristic that can influence the relationship between L1 and L2 UF. Given their high predictive power for UF performance (De Jong et al., 2013; Kahng, 2020; Suzuki & Kormos, 2023), the current study adopted two factor scores of CF, LR and PS, as proxies for L2 proficiency. A set of GLMMs predicting L2 UF from the L1 counterparts, these two CF scores, and their interactions with L1 UF measures suggested that the L1-L2 UF link was moderated by the CF scores only in articulation rate, speech rate, and mean length of run. More specifically, the L1-L2 link in articulation rate weakened as LR scores increased. In other words, for those who had acquired a wider range of L2 knowledge, L2 articulation rate tended to be relatively independent of L1 articulation rate. Note that despite the marginally significant level, the score of LR positively contributed to articulation rate (β = .122,
In contrast, the L1-L2 link in articulation rate, speech rate, and mean length of run was enhanced as a function of L2 PS scores. Note that both speech rate and mean length of run tap into the aspects of speed fluency to a large extent. This consistent pattern in these measures thus suggests that the more efficiently learners can process L2 knowledge, the closer their L2 speed fluency is to that of their L1. This finding is in line with Huensch and Tracy-Ventura’s (2017) and Peltonen’s (2018) results. In the current study, students’ speech was more fluent in L1 than in L2 (for the results of statistical testing, see Supplementary Appendix B; Suzuki & Kormos, 2024). The results suggest that efficient L2 processing skills may help learners to reduce the gap between their L1 and L2 fluency. Note that the direction of moderator effects of L2 proficiency scores on the L1-L2 UF link was opposite between the scores of LR and PS. These opposite patterns of the moderating role of L2 proficiency may expand the understanding of the interplay between cross-linguistic differences and L2 proficiency in the L1-L2 UF link (Derwing et al., 2009; Huensch & Tracy-Ventura, 2017).
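The moderation pattern described above can be summarised schematically. The following is a sketch only, assuming a linear predictor with a link function g; the symbols β0 to β5 and the term u are notation introduced here for illustration and do not reproduce the reported model specification. The term u stands in for the random-effects and control terms, whose exact form is not assumed here.

```latex
\begin{align*}
\eta_i = g\big(\mathbb{E}[\mathrm{L2UF}_{i}]\big)
  &= \beta_0 + \beta_1\,\mathrm{L1UF}_{i} + \beta_2\,\mathrm{LR}_{i} + \beta_3\,\mathrm{PS}_{i} \\
  &\quad + \beta_4\,(\mathrm{L1UF}_{i}\times\mathrm{LR}_{i})
         + \beta_5\,(\mathrm{L1UF}_{i}\times\mathrm{PS}_{i}) + u_{i}, \\
\frac{\partial \eta_i}{\partial\,\mathrm{L1UF}_{i}}
  &= \beta_1 + \beta_4\,\mathrm{LR}_{i} + \beta_5\,\mathrm{PS}_{i}.
\end{align*}
```

Under this notation, and given a positive β1, a weaker L1-L2 link at higher LR scores corresponds to β4 < 0 (articulation rate), whereas a stronger link at higher PS scores corresponds to β5 > 0 (articulation rate, speech rate, and mean length of run).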
In contrast to the measures related to speed fluency, the GLMMs did not show significant moderator effects of L2 CF on the L1-L2 UF link in any of the breakdown and repair fluency measures. Hence, the current results indicate that L2 breakdown and repair fluency measures are associated with their L1 counterparts, and those associations tend to be independent of L2 speech-specific competence (cf. Gao & Sun, 2024). This finding regarding the L1-L2 UF link may support the use of L1-corrected L2 UF measures of breakdown and repair fluency for a more valid assessment of L2 oral proficiency (De Jong et al., 2015). However, it is worth emphasising that different weights of correction might be needed for speed fluency measures and composite measures across proficiency levels, as suggested by the interaction effects between L1 UF measures and L2 proficiency scores (for a detailed discussion, see De Jong, 2018).
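One simple way to operationalise an L1-corrected L2 measure is to take the residual of the L2 measure after regressing it on its L1 counterpart. The sketch below assumes ordinary least squares with a single predictor; the function name `l1_corrected` and the toy values are hypothetical and are not taken from the data of the current study.

```python
import numpy as np


def l1_corrected(l1, l2):
    """Return L2 scores with the linear contribution of L1 scores removed.

    Hypothetical helper: fits l2 = a + b * l1 by ordinary least squares
    and returns the residuals, one per speaker.
    """
    l1 = np.asarray(l1, dtype=float)
    l2 = np.asarray(l2, dtype=float)
    if l1.shape != l2.shape or l1.ndim != 1:
        raise ValueError("l1 and l2 must be 1-D arrays of equal length")
    # Design matrix with an intercept column and the L1 measure.
    design = np.column_stack([np.ones_like(l1), l1])
    coef, *_ = np.linalg.lstsq(design, l2, rcond=None)
    # Residuals: the part of the L2 measure not linearly predicted by L1.
    return l2 - design @ coef


# Toy, made-up values for five speakers (illustration only).
toy_l1 = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_l2 = [2.0, 1.0, 4.0, 3.0, 6.0]
toy_corrected = l1_corrected(toy_l1, toy_l2)
```

By construction, the residuals are uncorrelated with the L1 measure. Where the L1-L2 link varies with proficiency, as for the speed fluency measures, a single regression weight of this kind would likely be insufficient.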
Conclusion
Motivated by the scarcity of studies on the L1-L2 UF link in the pair of mora-timed and stress-timed languages as well as the predominant use of closed speaking tasks, the current study examined the association between L1 and L2 UF performance, using L1 and L2 argumentative speech data elicited from Japanese-speaking learners of English. The study also aimed to shed light on the moderating role of L2 proficiency in the L1-L2 UF link. The GLMMs revealed that all L2 UF measures were predicted by their L1 counterparts, generally with small-to-medium effect sizes in terms of the variance explained (Marginal
Although the current study aimed to provide a better understanding of the construct of fluency, which is a key component of high-stakes L2 speaking assessment, the results also have implications for diagnostic and formative assessment of speaking skills. In most assessment contexts, the contributions of L1 speaking style to L2 fluency can be regarded as construct-irrelevant variance (De Jong, 2018). To diagnose learners’ challenges regarding L2 fluency effectively, for instance, classroom teachers may consider students’ L1 oral performance as another resource for individualised feedback. To offer more practical suggestions for the summative assessment of L2 speaking performance, follow-up studies could collect listener-based judgements of fluency (i.e., perceived fluency; Segalowitz, 2010) and examine whether, and how, raters detect and are influenced by temporal features that are primarily related to L1 fluency. If such rater biases are observed, sensitivity to features of L1 fluency can be included in rater training.
Several methodological limitations of the current study need to be acknowledged. Although the current study statistically controlled for the effects of the topic of L2 argumentative tasks on the prediction of L2 UF measures from the L1 counterparts, the results are limited to a single task type, an argumentative task. Similarly, the topic of the argumentative tasks was not counterbalanced between L1 and L2 speech. Accordingly, the differences in UF performance between L1 and L2, at least to some extent, might have been subsumed under the topic effects. In addition, following previous studies (De Jong et al., 2015; Duran-Karaoz & Tavakoli, 2020; Peltonen, 2018) and L2 speech production models (de Bot, 1992; Kormos, 2006; Segalowitz, 2010), the current study assumed that the covariance between L1 and L2 UF measures reflects language-general processes and idiosyncratic factors shared across L1 and L2 speech production. For a better understanding of the L1-L2 UF link, however, this assumption should be further validated by relating the covariance between L1 and L2 UF measures to language-general individual difference factors, such as working memory capacity and personality-related variables (e.g., Gagné et al., 2022), in order to clarify what underlies it.
