Abstract
Introduction
The globalization of communication has intensified the demand for skilled translators capable of producing culturally nuanced work (Dung, 2024). In response, translator education has increasingly embraced data-driven learning (DDL), an inductive approach where learners derive linguistic patterns from analyzing authentic language corpora. Despite its pedagogical value, the implementation of DDL faces significant constraints in many contexts, including China. These constraints include teacher-centered pedagogies, a scarcity of high-quality bilingual corpora, and an assessment culture favoring rote memorization over exploratory learning (Chung et al., 2024; Lusta et al., 2023). Consequently, students often rely on outdated textbook examples and static online dictionaries rather than engaging with the dynamic, real-world language data that defines modern translation work (Abou-Khalil et al., 2021).
To overcome these logistical barriers, generative artificial intelligence (AI) presents a paradigm-shifting opportunity. Tools like DeepSeek can dynamically generate tailored, domain-specific parallel texts, offering a potential solution to the corpus scarcity that has long hampered DDL (Crosthwaite & Baisa, 2023). Theoretically, this aligns with two critical frameworks. The PACTE Group’s (2003) Translatorial Competence Model highlights the need to develop instrumental sub-competence and strategic sub-competence. Concurrently, Cognitive Load Theory (CLT; Sweller, 2010) provides a lens for understanding how AI could optimize learning. By automating the manual processes of traditional DDL, AI may reduce extraneous cognitive load, freeing mental resources for the germane load required to build strategic translation competence.
However, the integration of generative AI into pedagogy is not inherently beneficial; its efficacy is highly conditional and raises critical concerns. First, AI models are prone to generating errors and are often trained on Western-centric data, risking the perpetuation of cultural biases and the production of inaccurate terminology (Hwang & Chang, 2023; Yang & Wang, 2023). Beyond these questions of pure efficacy, a second and central concern is that the benefits of AI may be inequitably distributed. Pre-existing socioeconomic disparities, particularly the pronounced rural-urban digital divide, threaten to mediate AI’s effectiveness, potentially exacerbating educational inequalities rather than alleviating them (Zhang & Li, 2021). From a CLT perspective, for disadvantaged students, the tool intended to reduce cognitive load could instead become a new source of it. Third, an over-reliance on automated outputs could potentially undermine the development of the critical, strategic decision-making that defines expert translation (PACTE Group, 2003).
These challenges are acutely visible within China’s unique educational landscape, which is characterized by top-down “AI + Education” initiatives (Knox, 2020) yet also shaped by Confucian-heritage pedagogical traditions (Tan, 2015) that may conflict with the learner-centered ethos of AI-enhanced DDL. Consequently, while the potential is significant, empirical research on AI’s role in translator education remains dominated by Western contexts (Bo et al., 2025), leaving a critical gap in understanding its efficacy, equity implications, and impact on learner engagement in non-Western settings.
As such, this study argues that the efficacy of generative AI in DDL is not automatic but is fundamentally mediated by socioeconomic factors and pedagogical design. Therefore, our primary research question is not merely if AI-enhanced DDL works, but for whom and under what conditions it works. Grounded in the PACTE and CLT frameworks, this mixed-methods study investigates the integration of generative AI into a Chinese undergraduate translation program to address the following questions:
1. What is the comparative impact of AI-enhanced DDL versus traditional methods on developing translation competence?
2. How do socioeconomic factors, specifically the rural-urban divide, mediate the efficacy of AI-enhanced DDL?
3. How do learners perceive and engage with AI-enhanced DDL in terms of motivation and critical autonomy?
Literature Review
The integration of generative AI into DDL for translator education represents a convergence of technological innovation, pedagogical theory, and sociocultural dynamics. To contextualize this study’s focus on efficacy, equity, and engagement, this literature review examines three interrelated domains: (1) the theoretical foundations of DDL and translatorial competence, (2) the emerging role of AI in translator education, and (3) equity challenges in technology-enhanced language learning, with a focus on China’s educational landscape. By synthesizing these strands, this section establishes a framework for evaluating how generative AI can reshape DDL practices while addressing the unique demands of translator training in non-Western contexts.
DDL and Translatorial Competence
DDL, rooted in corpus linguistics, emphasizes inductive learning through direct engagement with authentic language data (cf. Lusta et al., 2023). In translator education, DDL enables learners to analyze parallel corpora (McGuire, 2019)—collections of source texts and their translated counterparts—to identify patterns in terminology, collocations, and discourse conventions. This approach aligns with the Translatorial Competence Model proposed by the PACTE Group (2003), which defines translation competence as a multicomponential construct comprising bilingual, extralinguistic, instrumental, strategic, and cultural sub-competences. Of particular relevance to DDL is the instrumental sub-competence, which involves the ability to use tools and resources—such as corpora, terminology databases, and translation memory systems—to solve translation problems. By engaging with corpora, students learn to identify appropriate equivalents and to adhere to genre-specific conventions (Farroni, 2024).
However, traditional DDL faces limitations in translation classrooms. Static corpora often lack domain-specific or culturally nuanced texts (Zhang et al., 2024), particularly for language pairs involving less-resourced languages. Moreover, the manual compilation and annotation of corpora demand significant time and expertise, rendering them inaccessible to many educators. Generative AI tools like DeepSeek offer a potential solution by automating the creation of tailored bilingual texts (Woo et al., 2024). For instance, an instructor could prompt DeepSeek to generate parallel texts on specialized topics or simulate client briefs with specific stylistic requirements. This capability aligns with the strategic sub-competence outlined in the PACTE model, which emphasizes problem-solving and decision-making in real-world scenarios. Yet, the pedagogical efficacy of AI-generated corpora remains under-researched (Pack & Maloney, 2024), particularly in relation to their impact on translation accuracy and cultural appropriateness.
Generative AI in Translator Education
Recent advances in generative AI have spurred interest in its applications for translator training. Unlike rule-based machine translation systems, generative AI models like DeepSeek produce fluid, context-aware outputs that mimic human language use, making them valuable for simulating authentic translation tasks (Hu et al., 2024). Studies in Western contexts suggest that AI can enhance DDL by providing instant feedback, generating practice materials, and fostering learner autonomy (e.g., Mizumoto, 2023). For instance, students might use DeepSeek to extract collocations from AI-generated texts or compare multiple translation variants to identify optimal solutions.
Such activities must be examined through the lens of CLT (Sweller, 2010), which provides a crucial framework for understanding the mental architecture involved in learning complex skills like translation. CLT distinguishes between three types of cognitive load: intrinsic (the inherent difficulty of the material, e.g., translating a complex legal text), extraneous (load imposed by the instructional design or tool interface), and germane (load devoted to schema construction and automation). Effective learning occurs when instructional design minimizes extraneous load and optimizes germane load (Sweller, 2010) to manage intrinsic load.
The integration of AI into DDL presents a dual potential from a CLT perspective. On one hand, by automating the laborious process of corpus compilation and providing immediate, contextualized examples, AI can significantly reduce extraneous cognitive load (Li, 2024). Students are freed from the technical hurdles of building and querying complex databases, allowing them to dedicate more cognitive resources to the intrinsic challenge of comparative text analysis and strategic decision-making (Al-Obaydi & Pikhart, 2020). This aligns with studies suggesting that well-designed tools can offload routine tasks (Moreno & Park, 2010), facilitating deeper engagement with core concepts.
On the other hand, poorly integrated AI can increase extraneous load (Lampou, 2023). For novice learners or those with low digital literacy, the process of formulating effective prompts, interpreting potentially flawed or biased AI outputs, and troubleshooting the technology itself can create a new source of cognitive overhead (Walter, 2024). This risk is particularly acute in contexts with a pronounced digital divide, where unfamiliarity with the tool could overwhelm the cognitive benefits it offers. Therefore, the central pedagogical challenge is not merely to introduce AI, but to design scaffolding that ensures it functions as a load-reducing facilitator of germane processing rather than a load-inducing obstacle.
However, the existing literature, while identifying potential, suffers from a lack of critical empirical investigation into these concerns. Many studies champion AI’s potential but fail to adequately test its limitations under real-world pedagogical conditions (cf. Díaz & Nussbaum, 2024). First, the probabilistic nature of generative AI means its outputs can contain errors—including factual inaccuracies, terminological inconsistencies, and stylistic infelicities—that require critical evaluation by the user. This is compounded by the models’ training on vast datasets dominated by English and other high-resource languages, which embeds Western-centric biases and can increase error frequency for certain language pairs or domains (Singh, 2024). For Chinese translator students, this raises the risk of internalizing inaccurate or culturally inappropriate equivalents, particularly when translating culturally specific concepts (e.g., traditional Chinese medicine terms). Second, over-reliance on AI-generated content could undermine the development of strategic sub-competence, as students may prioritize algorithmic suggestions over critical analysis (Zábojník & Hromada, 2024; Zhai et al., 2024). Third, the “black box” nature of AI systems obscures the decision-making processes behind their outputs, complicating efforts to teach students how to evaluate and justify their translation choices (Franzoni, 2023). These challenges underscore the need for pedagogical frameworks that leverage AI’s strengths while mitigating its limitations.
Equity in AI-Enhanced Translator Education in China
The digital divide is a global challenge, but its manifestations are locally specific (Afzal et al., 2023; Clark & Gorski, 2001). China presents a compelling case study due to the scale and policy context of its rural-urban disparities. The integration of AI into education is often framed as a democratizing force (Prinsloo & Khalil, 2024), yet its benefits are unevenly distributed. In China, where socioeconomic disparities between urban and rural regions persist, access to technology and digital literacy skills vary significantly (Song, 2023). Urban universities in cities like Beijing or Shanghai often boast advanced infrastructure and partnerships with tech firms, enabling students to engage with cutting-edge AI tools. In contrast, rural institutions may lack reliable internet access or funding for AI training programs, exacerbating existing inequities in translator education. This divide aligns with Afzal et al.’s (2023) concept of the digital divide, which encompasses not only physical access to technology but also the skills and cultural capital required to use it effectively—a challenge common to many education systems.
Within translation classrooms, equity issues manifest in multiple ways. Students from rural backgrounds may struggle with the technical demands of AI tools, such as formulating effective prompts for DeepSeek or interpreting its outputs (Vandenberg et al., 2023; Walter, 2024). Additionally, generative AI’s reliance on English-centric training data may disadvantage students with weaker English proficiency (Kshetri, 2024), as they might misinterpret AI-generated suggestions or fail to recognize cultural biases. These challenges are compounded by China’s Confucian-heritage educational culture (You, 2018), which traditionally prioritizes teacher authority and standardized assessments over exploratory, learner-centered approaches. While AI-enhanced DDL has the potential to shift this dynamic by fostering autonomy, it also risks marginalizing students who lack the confidence or resources to navigate self-directed learning environments.
To address these inequities, emerging studies highlight strategies (e.g., Crosthwaite & Steeples, 2022; Woo et al., 2024). Structured training programs that scaffold AI literacy—such as workshops on prompt engineering or critical evaluation of AI outputs—have been shown to reduce performance gaps between students with varying levels of prior experience (Walter, 2024). Furthermore, institutional policies that mandate equitable access to technology, such as subsidized software licenses or device-sharing initiatives, can mitigate hardware-related barriers (Gonzales et al., 2018). However, few studies have examined how these strategies apply to translator education, particularly in contexts where linguistic and cultural specificity are paramount.
Gap Synthesis and Present Study
This review synthesizes three critical and interconnected lacunae that this study is designed to address, moving the discourse from mere application to critical expansion.
First, a pronounced contextual gap exists. Empirical research on AI-enhanced DDL remains predominantly situated in Western, high-resource contexts, which assume norms of digital access and learner autonomy that are not universal. The efficacy of these approaches in non-Western settings, particularly within China’s socio-educational landscape, remains severely underexplored.
Second, and most critically, there is a significant equity-mediated efficacy gap. While the digital divide is often acknowledged abstractly, there is a scarcity of empirical research investigating how specific socioeconomic factors (e.g., the rural-urban divide) directly mediate learning outcomes. The literature frequently champions AI’s potential without critically examining its power to exacerbate existing inequities.
Third, a theoretical-pedagogical gap persists. While CLT is occasionally referenced, its specific application to the cognitive architecture of AI-augmented translation tasks is underdeveloped. Furthermore, discussions of engagement often overlook the critical competence required to navigate AI’s inherent cultural biases—a skill paramount to translation.
Consequently, this study does not merely transpose a Western model but seeks to critically expand the discourse. It investigates the integration of generative AI into DDL in China to provide a nuanced framework for evaluating how AI can be harnessed to bridge—rather than widen—existing gaps.
Methodology
Participants
The study involved 80 undergraduate translation majors (Female = 62; Male = 18; Mean age = 20.4).
Participants were recruited from second- and third-year cohorts.
Stratified random sampling was employed to divide participants into experimental and control groups (40 students each). Stratification criteria included urban/rural residency (Urban = 47; Rural = 33), prior exposure to AI tools (assessed via a pre-intervention survey: Novice = 32, Intermediate = 35, Advanced = 13), and academic performance (GPA from the previous semester, divided into three tiers: top 30% [GPA ≥ 3.62], middle 40% [3.20 ≤ GPA < 3.62], and bottom 30% [GPA < 3.20]).
Table 1. Participant Demographics and Baseline Characteristics.
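The stratified assignment described above can be sketched in a few lines of code. The procedure, field names, and miniature cohort below are illustrative assumptions for exposition, not the study’s actual sampling script or records:

```python
import random

def stratified_assign(participants, strata_keys, seed=42):
    """Randomly assign participants to two groups while balancing
    every combination of the stratification variables."""
    rng = random.Random(seed)
    # Bucket participants by their combined stratum label
    strata = {}
    for p in participants:
        key = tuple(p[k] for k in strata_keys)
        strata.setdefault(key, []).append(p)
    groups = {"experimental": [], "control": []}
    for members in strata.values():
        rng.shuffle(members)
        for p in members:
            # Always top up the currently smaller group so that
            # odd-sized strata cannot accumulate an imbalance
            target = min(groups, key=lambda g: len(groups[g]))
            groups[target].append(p)
    return groups

# Illustrative cohort of 8 with residency and prior-AI-experience strata
cohort = [
    {"id": i,
     "residency": "urban" if i < 5 else "rural",
     "ai_exp": ["novice", "intermediate"][i % 2]}
    for i in range(8)
]
result = stratified_assign(cohort, ["residency", "ai_exp"])
print(len(result["experimental"]), len(result["control"]))  # 4 4
```

Shuffling within each stratum randomizes who is assigned where, while the smaller-group rule guarantees equal group sizes and near-equal stratum composition.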
Intervention
The 12-week intervention integrated the generative AI tool DeepSeek (DeepSeek-Coder 1.0 via API) into DDL activities for the experimental group, while the control group utilized traditional static corpora. To ensure standardization, a fixed prompt template was used for all AI-generated materials, of the form “Generate a parallel English-Chinese text of approximately [word count] words on the topic of [topic] …”, with slots specifying length, topic, and required terminology (an instantiated example is given below).
Both groups attended weekly 90-min translation workshops focused on specialized domains (legal, technical, and literary translation), aligned with curriculum requirements. All instructors (
For the experimental group, AI-enhanced DDL tasks were structured around three pillars, with extensive pedagogical scaffolding: (1) Guided Corpus Generation and Critical Evaluation: Students were trained from the outset that AI outputs are generative suggestions, not authoritative answers. They learned to generate tailored corpora using structured prompts and were explicitly trained to critically evaluate outputs for factual errors, terminological imprecision, and cultural biases. For example, a core task required students to generate a parallel text on commercial liability using the prompt: “Act as a legal translator. Generate a parallel English-Chinese text of approximately 250 words on the topic of ‘liability in commercial partnerships’. Ensure the text includes the term ‘joint and several liability’ and 7 to 9 other relevant legal terms.” (2) Pattern Identification and Analysis: Learners extracted terminology, collocations, and discourse patterns from these AI-generated texts using concordance lines. (3) Translation and Iterative Refinement: Students translated AI-generated client briefs, using the AI as an analytical aid for brainstorming and verification rather than for direct translation, followed by peer feedback loops.
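The fixed template in pillar (1) lends itself to programmatic parameterization. The following is a hypothetical sketch of how such standardized prompts could be assembled; the function name and its slots are ours for illustration, not part of the study’s materials:

```python
def build_corpus_prompt(topic, length=250, required_term=None, n_extra_terms=(7, 9)):
    """Assemble a standardized corpus-generation prompt from fixed slots.

    The template wording mirrors the example given in the text;
    slot names and defaults are illustrative assumptions.
    """
    parts = [
        "Act as a legal translator.",
        f"Generate a parallel English-Chinese text of approximately {length} words "
        f"on the topic of '{topic}'.",
    ]
    if required_term:
        lo, hi = n_extra_terms
        parts.append(
            f"Ensure the text includes the term '{required_term}' "
            f"and {lo} to {hi} other relevant legal terms."
        )
    return " ".join(parts)

prompt = build_corpus_prompt("liability in commercial partnerships",
                             required_term="joint and several liability")
print(prompt)
```

Fixing the template while varying only the slot values is what makes AI-generated corpora comparable across students and weeks.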
The control group employed static corpora from established resources such as the Bilingual Corpus of Chinese Classics (BCCC). Tasks mirrored those of the experimental group in domain and difficulty but relied on pre-existing, pre-compiled corpora.
Both groups participated in identical pre-/post-tests. All assessments were scored by two independent raters using a validated rubric. Inter-rater reliability was high (Cohen’s κ = .89); discrepancies were resolved through discussion to reach a consensus.
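Cohen’s κ, reported above for inter-rater reliability, has a short closed form: κ = (observed agreement − chance agreement) / (1 − chance agreement). A minimal sketch of the computation follows; the rating labels and data are invented for illustration, not the study’s scores:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters over the same items (nominal labels).

    Undefined when chance agreement equals 1 (both raters use one label).
    """
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed proportion of agreement
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

# Illustrative rubric bands from two hypothetical raters
a = ["high", "high", "mid", "low", "mid", "high", "low", "mid"]
b = ["high", "mid", "mid", "low", "mid", "high", "low", "low"]
print(round(cohens_kappa(a, b), 2))  # 0.63
```

Correcting for chance is what distinguishes κ from raw percent agreement, which is why κ = .89 is a stronger claim than “the raters agreed 89% of the time.”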
Data Collection
Data were collected through a triangulated approach, integrating quantitative proficiency metrics, self-reported engagement surveys, and qualitative reflections on equity and learning experiences.
Quantitative data included pre-/post-tests and engagement surveys. Participants completed translation tasks in legal, technical, and literary domains, which were scored using a rubric evaluating accuracy, terminology consistency, and cultural appropriateness. Rubric validity was established through pilot testing and inter-rater reliability (Cohen’s κ = .85). Engagement surveys consisted of a 20-item Likert-scale survey adapted from the Motivated Strategies for Learning Questionnaire (Rotgans & Schmidt, 2010), measuring motivation, self-efficacy, and autonomy. Items included statements such as, “I felt confident using AI tools to solve translation problems,” rated from 1 (strongly disagree) to 5 (strongly agree).
Qualitative data were gathered through reflective journals, focus groups, and classroom observations. Students maintained weekly journals documenting challenges and successes. Four semi-structured focus groups (two per cohort) explored equity themes, with prompts such as, “Did your background affect your ability to use AI tools?” Sessions were audio-recorded and transcribed. Researchers observed eight classroom sessions (four per group), noting patterns in tool usage, peer collaboration, and instructor-student interactions.
Data Analysis
Quantitative and qualitative data were analyzed separately before integration to address the research questions.
Quantitative analysis involved ANCOVA to compare post-test scores between groups, controlling for pre-test performance, urban/rural residency, and prior AI experience. Multiple regression analysis identified variables influencing outcomes, such as access to high-speed internet or participation in AI training workshops. Descriptive statistics and t-tests compared survey responses between groups.
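The ANCOVA described here is algebraically a linear regression of post-test scores on a group indicator plus covariates; the coefficient on the group indicator is the covariate-adjusted group difference. A sketch on simulated data illustrates this logic (all numbers below are invented for the example, not the study’s data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated scores: post-test driven by pre-test, group membership, and noise
n = 40
pre = rng.normal(70, 8, size=2 * n)
group = np.array([1] * n + [0] * n)   # 1 = experimental, 0 = control
post = 0.8 * pre + 6.0 * group + rng.normal(0, 4, size=2 * n)

# ANCOVA as regression: post ~ intercept + group + pre-test covariate
X = np.column_stack([np.ones(2 * n), group, pre])
beta, *_ = np.linalg.lstsq(X, post, rcond=None)
intercept, group_effect, pre_slope = beta
print(group_effect)  # covariate-adjusted group difference (true effect = 6)
```

Adding further covariates (e.g., dummies for residency or prior AI experience) as extra columns of `X` yields the multiple-regression model used to identify mediating variables.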
Qualitative data underwent thematic analysis following Braun and Clarke’s (2006) six-phase approach using NVivo 12 software. This process involved a thorough familiarization with the data from reflective journals and focus group transcripts. Initial codes (e.g., “AI dependency,” “cultural bias”) were generated and subsequently clustered into broader themes (e.g., “tool reliability,” “equity barriers”). To ensure consistency, two researchers independently coded 20% of the transcripts (initial Cohen’s κ = .78); discrepancies were resolved through discussion until a consensus was reached (final κ = 1.0), and the refined codebook was applied to the remaining data. Finally, observational notes were analyzed to identify behavioral patterns (e.g., “frequent tool troubleshooting,” “peer mentoring”); these patterns were used to triangulate and contextualize the findings from the surveys and journals.
Ethical Considerations
The study adhered to ethical guidelines for educational research. Participants provided informed consent, with explicit clarification that AI interactions would be anonymized and data stored securely. To address algorithmic bias, students participated in a workshop critiquing DeepSeek’s cultural limitations, such as its tendency to prioritize Western legal terminology over Chinese equivalents. Compliance with China’s Personal Information Protection Law (PIPL) ensured data privacy, particularly for rural students with limited digital literacy.
Results
Efficacy: AI-Enhanced DDL Fosters Critical Triangulation and Reveals Domain-Specific Limits
Quantitative analysis revealed significant differences in translation proficiency between the experimental (AI-enhanced DDL) and control (traditional DDL) groups (Table 2). ANCOVA results, controlling for pre-test scores, demonstrated that the experimental group outperformed the control group in post-test assessments across all three domains. Table 2 presents the pre-/post-test differences in translation competence by group, showing moderate to large effect sizes, particularly in the legal and technical domains.
Table 2. Pre-/Post-Test Differences in Translation Competence by Group.
Breakdowns of rubric criteria further clarified these gains (Figure 1). In accuracy, the experimental group showed a 23% improvement, compared to 14% in the control group. For terminology consistency, AI-supported students achieved a 28% increase, versus 17% for the control group. The most notable divergence emerged in cultural appropriateness: experimental group translations exhibited a 19% enhancement, while the control group improved by only 9%. (Outlier handling: prior to analysis, two outliers were identified in the control group’s post-test literary translation scores.)

Figure 1. Rubric breakdown of pre- and post-test scores.
Qualitative data revealed that the efficacy of AI-enhanced DDL was not merely about faster access to information, but the fostering of a more sophisticated research methodology. Students developed a practice we term “critical triangulation”—using AI as a powerful starting point for a rigorous validation process against authoritative human sources. For example, Student #22 (Urban, High-Proficiency) documented a detailed process for translating the term “fiduciary duty” that exemplifies this new skill: DeepSeek’s first suggestion was “
This quote demonstrates a paradigm shift in instrumental sub-competence. Students were no longer passive recipients of information from static tools but active managers of an AI-human verification workflow, leading to deeper conceptual understanding rather than just terminological acquisition.
Interestingly, a counterintuitive yet significant finding was that AI’s limitations in literary translation became powerful pedagogical moments. Students developed a keen metacognitive awareness of the AI’s weaknesses, which in turn sharpened their own strategic decision-making, pointing to a distinctive pedagogical use of AI failures. One student (Student #57) provided a detailed critique that highlights this: The AI translated ‘Hope is the thing with feathers’ as “
This case illustrates the development of strategic sub-competence—the decision to override the tool—which was sparked by the AI’s specific failures. The learning occurred not despite these failures, but because of them; they forced students to articulate the nuanced, cultural, and aesthetic criteria that define quality in literary translation.
Equity: The Rural-Urban Divide is a Critical Mediator of Efficacy
Crucially, this overall efficacy was not uniform. Regression models identified urban residency (β = .32) as a significant predictor of post-test performance.
The qualitative data uncovered that the equity gap was, initially, a cognitive load gap. For rural students, the AI was not a tool but a significant source of extraneous cognitive load and anxiety. This was quantitatively supported by time-on-task analysis, which showed rural students spent 37% more time troubleshooting technical issues (mean weekly troubleshooting time: rural = 12.5 min; urban = 9.1 min).
The mental effort required to understand the tool’s interface and basic logic competed for the finite cognitive resources needed for the translation task itself. Student #08’s (Rural, Novice) description of her first week is a powerful testament to this: “I felt a real sense of panic. My urban classmates were instantly asking DeepSeek complex questions. I didn’t even know what to ask. My first prompt was just ‘Translate this sentence’. I got an answer, but it was no better than Google Translate. I felt behind and stupid. The tool felt like a barrier, not a bridge.”
This narrative vividly exemplifies the CLT principle in action: the very tool designed to reduce extraneous load became its primary source for novices, creating an initial chasm in experience and self-efficacy that directly impacted learning capacity.
Yet, the study’s most significant finding was the demonstrable power of structured pedagogical intervention to bridge this gap. The journals documented a transformation that was not merely about skill acquisition but about identity and empowerment. The same Student #08 described this turning point after scaffolded training: “The workshop on ‘prompt engineering for translation’ was a turning point. We learned formulas like: [Role] + [Task] + [Context] + [Format]. For my legal text, I wrote: ‘Act as a legal translator. Analyze the following English clause for ambiguous terms. Suggest 3 Chinese translation options for the term “joint and several liability” and recommend the best one for a PRC contract.’ The difference was night and day. I wasn’t just using AI; I was commanding it. I finally felt like my classmates’ equal.”
This evolution from anxiety to mastery is critical. It shows that AI literacy—specifically, the ability to formulate strategic prompts—functions as a form of cultural capital that is not innate but can be explicitly taught. This finding moves beyond identifying the problem to providing a scalable pedagogical solution, explaining the quantitative result where the rural-urban proficiency gap narrowed from 14% to 5%.
Despite the overall progress, a more nuanced and pernicious inequity persisted, revealing a challenge for AI in translator education. Students with lower English proficiency faced a double disadvantage: they lacked the prior experience with the tool and the linguistic ability to detect its subtle errors. Student #15 (Rural, Lower English Proficiency) illustrated this with a critical incident: The AI translated “the court upheld the verdict” as “
This case highlights a profound equity issue specific to translator learning with AI: the tool’s utility and reliability are contingent on the user’s pre-existing language skills. Without the ability to critically evaluate the source text, students are vulnerable to the AI’s “good-enough” outputs, potentially cementing the disadvantages of the least proficient students and creating a new, hidden layer of inequity that is harder to remediate.
Engagement: Learners Report High Motivation Amidst Critical Concerns
Quantitative surveys revealed heightened engagement in the experimental group (Figure 2). On a 5-point Likert scale, AI-supported students reported significantly higher motivation than the control group.

Figure 2. Students’ engagement with AI support.
Qualitative data revealed the complex nature of this engagement, characterized by a dynamic of “productive paranoia.” Students were highly motivated but channeled that energy into rigorous verification practices. Classroom observations detailed a common ritual: a student would generate a term from DeepSeek, then immediately open two other tabs—a specialized terminology database (e.g., Tmxmall) and a search engine to check usage in recent Chinese news articles. This wasn’t distrust, but a professionally relevant critical engagement.
A focus group with high-achieving students revealed a strategic approach: “We don’t see DeepSeek as an answer key,” one explained. “It’s the most advanced member of our team, but it’s also an intern who sometimes hallucinates. It’s our job, as the project manager, to fact-check its work. That’s the real skill we’re learning.”
This reframing of the translator’s role from a passive recipient of information to an active manager and auditor of AI output is a significant and novel finding that moves beyond simple metrics of motivation or anxiety.
However, engagement was tempered by critical concerns that extended beyond the tool to the institutional system. Students expressed acute anxiety about the misalignment between this new pedagogy and traditional assessment methods. Student #41 articulated a common fear: “I love using this for practice, but what about the final exam? If we can’t use AI there, are we being set up to fail? Or will the exams just test our ability to use AI, which feels like a different skill altogether?”
This points to a crucial, often-overlooked institutional challenge: a “pedagogy-assessment gap” that can undermine the perceived value of new skills and create learner anxiety, a finding that highlights systemic barriers to AI integration.
Discussion
Transformed Competence Tempered by Instrumental Dependence
First, the results demonstrate a significant yet contingent alignment with the PACTE model. The integration of AI did not merely augment but fundamentally reconfigured the development of instrumental sub-competence (PACTE Group, 2003), transforming it from a mechanical skill of resource location into “a dynamic process of critical validation” (Zhao et al., 2024). This stands in stark contrast to traditional DDL, where significant effort is expended on the manual compilation of corpora. Conversely, AI-enhanced DDL shifted the cognitive focus toward interrogation and evaluation, as empirically illustrated by Student #22’s meticulous translation of “fiduciary duty.” Here, the AI-generated corpora facilitated meta-cognitive validation strategies, catalyzing a critical shift from passive information consumption to active, critical investigation.
Furthermore, the observed development of strategic sub-competence emerged not from seamless AI performance, but conversely, from its failures. This presents a crucial dichotomy: the tool’s usability inherently risked promoting uncritical adoption, yet when its limitations were exposed within a pedagogical framework that explicitly framed the AI as fallible, these moments became potent catalysts for learning (Cheng, 2024; Kim, 2019). The conscious decision to override AI output, therefore, was not an inherent feature of the technology but was directly mediated by pedagogical intervention. This contrast underscores that the AI’s value lies not in its infallibility, but in its capacity to create opportunities for exercising human judgment (Spaulding, 2020), thereby reinforcing the indispensability of nuanced problem-solving in translation.
Second, the study offers robust empirical support for CLT as the principal mechanism behind this conditional efficacy. Generative AI tools like DeepSeek drastically reduce the extraneous cognitive load associated with manual corpus linguistics, as quantified by an approximately 80% decrease in search time. This resource reallocation is the critical link explaining the superior performance in higher-order tasks; the experimental group’s marked improvement in cultural appropriateness scores (19% vs. 9%) provides quantitative evidence that freed cognitive resources were directed toward germane load (Sweller, 2010), such as cultural adaptation and strategic decision-making.
However, the most compelling argument derived from CLT is that this efficacy is not universal but is critically moderated by access and scaffolding. This reveals a fundamental contrast in user experience: for urban students with prior exposure, the AI immediately functioned as a load-reducing tool, amplifying their advantages. In direct contrast, for their rural peers, the novel interface initially functioned as a load-inducing barrier, potentially exacerbating existing inequities. Therefore, the ultimate measure of this method’s success is not its peak performance for advanced users, but its capacity to engineer “equity through scalable pedagogy” (Zacamy & Roschelle, 2022). The empirical finding that structured scaffolding reduced the rural-urban proficiency gap from 14% to 5% is paramount. This convergence powerfully argues for a paradigm shift away from a deficit model (which attributes outcomes to student background) and toward a model of institutional responsibility (which actively constructs equity through design). Qualitatively, this shift is embodied in the rural students’ trajectory from initial disorientation (“lost”) to proficient command, signifying the acquisition of AI literacy that levels the epistemological playing field (cf. Tenório et al., 2023).
Democratization Tempered by Digital Divides and Cultural Bias
The most critical finding, however, is that this efficacy was not a given for all students. The study starkly reveals that generative AI’s promise to democratize access to specialized translation resources is initially constrained by pre-existing socioeconomic and digital divides. For rural students, the initial encounter with AI tools often had the opposite effect on cognitive load: it dramatically increased extraneous load. The cognitive effort required to understand the tool’s interface, learn effective prompt engineering from scratch, and troubleshoot basic technical issues competed for the same finite cognitive resources needed for the translation task itself. This initial disparity, driven by gaps in internet access, device availability, and prior digital literacy (Song, 2023), vividly illustrates Afzal et al.’s (2023) concept of the multifaceted digital divide. This finding challenges narratives of AI as an inherently democratizing force (Prinsloo & Khalil, 2024) by showing how it can initially exacerbate cognitive inequities. These regional inequities make China a critical case study in how generative AI can, without targeted intervention, perpetuate and even amplify existing educational disparities.
However, the crucial finding that structured pedagogical scaffolding—such as providing prompt engineering templates and step-by-step technical guides—significantly narrowed this gap offers a powerful blueprint for equitable implementation. This demonstrates that access alone is insufficient; meaningful access requires developing specific AI literacy skills (Walter, 2024). The convergence of performance trajectories by Week 12 suggests that targeted interventions can effectively build the necessary cultural capital for students from underserved backgrounds to leverage AI effectively. Nevertheless, residual inequities persisted, manifested in the disproportionate time rural students spent troubleshooting technical issues and the disadvantage faced by those with weaker English proficiency when navigating the AI’s Western-centric biases.
The latter point is critical: DeepSeek’s training data, dominated by English and Western perspectives (Singh, 2024), introduced culturally inappropriate suggestions that were harder for less proficient students to detect. This highlights a fundamental tension—while AI can generate Chinese translations, its underlying logic and biases often reflect its Western training origins (Yang & Wang, 2023). Thus, achieving true equity requires not only bridging the digital access and skills gap but also actively developing students’ critical AI literacy to identify, challenge, and compensate for these biases, potentially through cross-referencing with local resources as students began doing spontaneously.
Heightened Motivation Juxtaposed with Critical Ambivalence
The quantitative and qualitative data consistently pointed towards significantly heightened learner engagement within the AI-enhanced DDL environment. Increased motivation, self-efficacy, and autonomy reported by the experimental group align with findings on AI’s potential to foster learner-centered exploration (Mizumoto, 2023; Zhou & Hou, 2024). Students perceived DeepSeek as an “always-available tutor,” enabling iterative practice, experimentation with complex tasks earlier in their training, and fostering a sense of ownership over their learning (Lewis, 2014)—core tenets of DDL. This shift toward greater autonomy represents a potential disruption to traditional teacher-centered Confucian-heritage pedagogies (Tan, 2015), aligning with broader “AI + Education” goals in China (Knox, 2020).
However, this enthusiasm was tempered by a significant undercurrent of critical ambivalence. Learners readily acknowledged the efficiency gains but expressed profound anxieties that over-reliance might erode their critical thinking and strategic decision-making skills—competencies central to professional translatorial competence (PACTE Group, 2003). This mirrors global concerns about AI deskilling learners (Zábojník & Hromada, 2024; Zhai et al., 2024).
Furthermore, engagement was actively shaped by encounters with the AI’s cultural limitations. Frustration with Western-centric outputs, such as politically loaded translations or culturally insensitive metaphors, prompted students to develop compensatory critical evaluation strategies. This active negotiation, where learners neither blindly accepted nor wholly rejected AI suggestions but engaged in critical scrutiny, represents a sophisticated form of engagement crucial for future translators operating in an AI-mediated landscape. It underscores that engagement with AI in education must encompass critical awareness and evaluative judgment (Bearman et al., 2024), not just enthusiastic usage.
Toward Pedagogically Scaffolded and Equitable Integration
This study underscores that generative AI is not merely a tool but a transformative agent in translator education, demanding careful pedagogical orchestration. Its power lies in generating dynamic, personalized learning resources that enhance instrumental and strategic competences (Talgatov et al., 2024), particularly in specialized domains, while fostering learner autonomy. However, this transformation is not automatic or universally beneficial. Its success hinges on proactively addressing the equity gaps it can initially exacerbate and fostering the critical literacy needed to navigate its inherent biases. The findings therefore advocate a pedagogically scaffolded approach to integration.
In sum, generative AI like DeepSeek offers a compelling pathway to revitalize DDL within Chinese translator education, aligning it with the demands of the digital age and national “AI + Education” initiatives. It demonstrably enhances specific competencies and engagement when implemented thoughtfully. However, its integration must be guided by a commitment to pedagogical equity and critical awareness.
Conclusion
This study demonstrates that the question of AI’s efficacy in education is inseparable from the question of equity. Our findings show that generative AI’s potential is highly conditional: its benefits hinge on pedagogical scaffolding designed to mitigate pre-existing socioeconomic disparities and on institutional support to navigate its promises and pitfalls. While AI-enhanced DDL facilitated measurable gains in terminology accuracy and cultural adaptation—aligning with the development of instrumental and strategic sub-competences—its benefits were not uniformly distributed. A significant rural-urban divide initially mediated outcomes, underscoring that without intervention, technology can exacerbate existing inequities.
Our primary contribution is a dual theoretical extension. First, we empirically apply CLT to AI-mediated learning, revealing the digital divide not just as an access issue but fundamentally as a disparity in managing extraneous cognitive load. This extends CLT by demonstrating that a tool’s cognitive impact is not intrinsic but is mediated by user familiarity, positioning digital literacy as a key factor in instructional design for complex cognitive tasks. Second, we advance the PACTE model by testing it in an AI-augmented environment, showing its sub-competences are developable but contingent on scaffolded pedagogy that teaches critical tool use and counters AI’s cultural biases. This contributes to the model by explicitly integrating “critical AI literacy” as a core component of modern instrumental and strategic sub-competence, essential for translators in an AI-saturated profession. Thus, efficacy is not inherent to the technology but is a function of equitable instructional design. While this study was conducted in China, the mediating role of the rural-urban divide and the critical importance of pedagogical scaffolding are likely to be relevant in any context where socioeconomic and digital inequities exist. China’s experience offers a framework for understanding and addressing these universal challenges in technology integration.
This study has several limitations that caution against overgeneralization. First, the single-institution sample limits generalizability beyond eastern Chinese universities; future research should include central-western institutions to account for regional disparities in digital infrastructure. This geographic limitation means the observed power of our pedagogical scaffolding must be tested in regions with potentially more severe infrastructural constraints, where its efficacy might be diminished. Second, DeepSeek’s Western-centric training data may not represent other AI tools (e.g., ChatGPT or ERNIE); replication with diverse, and especially multilingual, models is needed. Consequently, our findings regarding cultural bias and the need for critical evaluation may be more pronounced with Western-trained models than with those trained on more balanced or Sinocentric datasets. Finally, the focus on written translation in technical and legal domains leaves efficacy in literary or interpreting specializations an open question. The observed gains in cultural adaptation may not translate directly to these more creative and real-time domains, where AI’s current limitations are more acute.
These limitations define clear pathways for future work. Crucially, longitudinal studies (1+ years) could explore AI’s impact on professional translation outcomes, such as employer ratings of graduates’ competence, moving beyond academic metrics to assess real-world efficacy.
Ultimately, this research argues for a critically optimistic approach: generative AI can revitalize translator education, but only if its integration is guided by a steadfast commitment to scalable equity, positioning pedagogical scaffolding as the non-negotiable prerequisite for ethical and effective implementation.
