Abstract
Google Translate (GT) has become a popular machine translation (MT) tool among language learners, received by instructors with excitement over its pedagogical potential and concerns about its possible misuse in the classroom, particularly when this misuse goes undetected. This study investigated the suitability of natural language processing (NLP) software for the automated detection of MT use in second language (L2) writing, examining a dataset composed of written samples generated by GT and direct L2 writing produced by intermediate-level postsecondary learners of Spanish. NLP-powered analyses found significant lexical and sentential-level differences, as well as estimated proficiency-level differences across text types. Automated judgments based on lexical diversity and amount of coordination yielded detection accuracy rates of 73.08% each, whereas proficiency estimates informed correct automated judgments with an overall accuracy rate of 86.54%. An automated reverse-translation protocol using probability estimates was capable of differentiating between direct L2 writing and MT-assisted texts 98% of the time, far surpassing human detection rates (73%) found in a previous study for the same dataset. These findings argue strongly for the potential of NLP-driven textual analysis as a reliable tool to assist instructors in detecting unauthorized uses of MT in L2 writing.
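As a rough illustration of how an automated judgment based on lexical diversity could work, the sketch below computes a simple type-token ratio and flags texts that cross a cutoff. It is a minimal sketch under stated assumptions, not the study's procedure: the type-token ratio as the diversity measure, the function names, the threshold value, and the direction of the comparison are all hypothetical, since the abstract does not specify them.

```python
import re


def type_token_ratio(text: str) -> float:
    """Share of distinct word forms among all word tokens (0.0 for empty text).

    Assumption: type-token ratio stands in for whatever lexical diversity
    index the study used.
    """
    # [^\W\d_]+ matches runs of Unicode letters, so accented Spanish
    # characters such as "ñ" or "á" stay inside their words.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def flag_as_mt(text: str, threshold: float) -> bool:
    """Hypothetical rule: flag a text whose diversity exceeds the threshold.

    Both the cutoff and the direction (higher diversity suggests MT) are
    assumptions made only for this example.
    """
    return type_token_ratio(text) > threshold


def accuracy(predictions: list[bool], labels: list[bool]) -> float:
    """Proportion of automated judgments that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

In practice the cutoff would have to be calibrated on labeled samples, and a length-robust diversity index may be preferable because the raw type-token ratio falls as texts get longer.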
