Abstract

Despite a large number of studies on the adoption of automated writing evaluation (AWE) systems, the effects of automated written corrective feedback (AWCF) on English as a Foreign Language (EFL) students’ writing have been insufficiently documented. Because of the significance of AWCF in EFL writing, this study employed a mixed-methods approach to examine such effects. Using a quasi-experimental design, this study explored how AWCF through Grammarly affected EFL students’ writing quality. A total of 67 EFL students from two intact university English classes participated in this study, with a treatment group receiving two rounds of Grammarly feedback and teacher feedback and a comparison group receiving teacher feedback only. The results of the posttest writing task revealed that the students from the treatment group did not significantly outperform the students from the comparison group in syntactic and lexical complexity, accuracy, and fluency. A follow-up questionnaire consisting of fixed-response and open-ended questions was administered to the students from the treatment group after the posttest to elicit the students’ perceptions of the effects of Grammarly feedback on their writing. The qualitative findings supported and provided deeper insights into the quantitative results. The study concludes with a discussion of its limitations and implications.

Keywords

EFL writing, AWCF, mixed-methods approach, students’ perceptions

Introduction

Among different language learning skills, writing skill occupies a crucial part in various levels of learning and tests (J. Zhang, 2019). Students’ writing ability can accurately reflect their language proficiency level (Jin & Yang, 2006). However, in the meantime, most English as a Second Language (ESL) students stated that “expressing ideas in correct English” (Evans & Green, 2007, p. 8) would be the biggest obstacle for their English learning. Under this background, Bitchener and Ferris (2012) have contended that written corrective feedback (WCF) plays a key role in second language (L2) writing because it might serve as a useful tool for L2 learners to improve their writing performance. Because of the importance of English writing skill and the difficulties facing English learners, it could be meaningful to continue to examine the potential effects of WCF on students’ writing quality.

As researchers continue to investigate the effects of various types of feedback on student writing, feedback sources, such as automated written corrective feedback (AWCF) offered by automated writing evaluation (AWE) systems, also gain much attention from researchers. The availability of large corpus of students’ writing samples and the development of natural language processing have allowed AWE systems to provide AWCF on student writing. Along with the line of research on the use of AWCF in student writing, researchers have also noted some merits and drawbacks of the feedback. For example, AWCF can help students improve language-related issues in their writing (J. Li et al., 2015), and AWE system can also provide immediate feedback for student writing (Fang, 2010). The immediate feedback and the scoring feature of AWE systems might serve as an incentive for student revision (P. L. Wang, 2013) because it is likely that students might revise their writing multiple times to gain a satisfying score. In fact, one advantage of AWE systems is that students can revise their drafts as many times as they want after submission (Warschauer & Ware, 2006), which might in turn, help to enhance students’ writing quality through cultivating their autonomy in the process of writing, assessing, and revising (Chen & Cheng, 2008).

However, despite the advantages of AWCF, researchers also pointed out several disadvantages. For example, Aluthman (2016) claimed that sometimes AWCF could be too complex for ESL learners at lower proficiency level to follow. In addition to the feedback complexity, Lai (2010) contended that some AWE systems could provide formulaic or repetitive feedback, which could result in student confusion. In such a case, it would be hard for students to demand for clarification because AWCF might not consider the social aspect of feedback (M. J. Wang & Goodman, 2012). Moreover, Cheville (2004) noted that AWCF might only draw students’ attention to surface levels of their writing (e.g., grammar and mechanics), neglecting some deeper levels, such as content or organization. This type of feedback could not lead to the overall development of student writing, and the types of essays on which AWCF could be provided are limited (Ware, 2011). Because of these potential advantages and disadvantages concerning AWCF, it seems necessary for researchers to take into account student perceptions of the feedback when investigating its effectiveness. One reason for this necessity is that considering student perceptions could allow L2 writing practitioners to stay informed of what their students regard as advantages and disadvantages of AWCF, and subsequently to maintain the advantages and find ways to deal with the disadvantages. Through this way, practitioners might be able to help students benefit most from AWE feedback.

Many studies have shown the effectiveness of AWCF in improving students’ writing quality (e.g., Dikli & Bleyle, 2014; El Ebyary & Windeatt, 2010; J. Li et al., 2015). For example, J. Li et al.’s (2015) study revealed that Criterion feedback could lead to the improvement of student writing both from one draft to the next and from the first draft to the final draft. Moreover, research has shown that AWCF could be effective for students at different proficiency levels. For example, Kim’s (2014) study demonstrated that AWCF helped both high- and low-level students to significantly improve their writing quality from the first draft to the revised draft. Despite the effectiveness of AWCF on student writing, there has been a lack of research on how a combination of AWCF with teacher WCF can impact students’ writing quality, particularly for Chinese English as a Foreign Language (EFL) students. Investigating the effects of this combination is necessary because such a combination could be more in line with the ecological validity of the L2 writing classroom (Stevenson & Phakiti, 2014), and AWCF should be employed as a complement to teacher WCF, not a replacement (Ware, 2011).

However, to the best of our knowledge, little research has been conducted to examine how the combination affects student writing in comparison to teacher WCF only. One of the few exceptions is Wilson and Czik’s (2016) study, in which the authors investigated whether there was any difference in students’ writing quality between those who received AWCF with teacher WCF and those who received teacher WCF only. The results showed that there was no significant difference between these two feedback conditions in students’ writing quality of final drafts. In terms of writing motivation, students who received both AWCF and teacher WCF exhibited greater writing persistence than their counterparts who received teacher WCF only. Despite these pedagogically informed findings, one limitation of the study was that it did not take into consideration students’ perceptions of the AWE feedback they had received. Taken together, the purposes of the present study are to examine the effects of AWCF on students’ writing quality, how students perceive it, and the relationship between their perceptions and writing quality. Writing quality in this study was operationalized as complexity (i.e., syntactic and lexical complexity), accuracy, and fluency (CAF). It is worthwhile to conduct the present study because of the unprecedentedly increasing application of AWE systems to L2 writing, and the crucial role of L2 writing in the overall development of students’ language proficiency.

The research questions of the present study are:

Is there any difference between Grammarly feedback with teacher WCF and teacher WCF only in the writing quality of Chinese EFL students with lower proficiency levels?

What is the relationship between the degree of satisfaction about Grammarly feedback and writing quality for Chinese EFL students with lower proficiency levels?

How do Chinese EFL students with lower proficiency levels perceive the Grammarly feedback?

Literature Review

AWCF and L2 Writing

Modest evidence was reported that AWCF has a positive effect on students’ writing quality, and two lines of studies can be found as to the effectiveness of AWCF on students’ writing quality (Stevenson & Phakiti, 2014). The first line of studies was within-group studies that presented the evidence on the increase of students’ writing scores and the decrease of students’ errors across multiple drafts produced by the same student writers (El Ebyary & Windeatt, 2010; Liao, 2016; A. Lu & Li, 2016; Parra & Calero, 2019; Thi & Nikolov, 2022). For example, Liao’s (2016) study showed that AWCF was effective in helping reduce students’ linguistic errors for both revisions and new pieces of writing. The reasons for the effectiveness of AWCF might be that AWE tools are able to provide immediate feedback on student writing (Dikli, 2006), develop student awareness of writing process (Matsumoto & Akahori, 2008), and cultivate students’ writing autonomy (Y. J. Wang et al., 2013). However, despite these potential merits of AWCF, researchers have noted some of its drawbacks, such as its excessive focus on linguistic issues in writing (Warschauer & Ware, 2008) and its failure to reflect social, multimodal, and contextual aspects of writing (Vojak et al., 2011). Therefore, to mitigate these drawbacks of AWCF, further research is needed on combining AWCF with other sources of feedback, such as teacher feedback, and examining how these combinations can affect students’ writing quality.

The second line of studies were between-group studies that investigated either the effects of AWCF versus no feedback or the effects of AWCF versus teacher WCF on students’ writing quality. First, studies that compared the effects of AWCF with no feedback have reported the effectiveness of AWCF (e.g., Barrot, 2023; Franzke et al., 2005; Grimes, 2008). Franzke et al. (2005) examined the effect of Summary Street feedback on students’ writing quality, and they found that students receiving the feedback performed significantly better than students receiving no feedback in text quality, content, organization, and stylistic quality. Second, when it comes to comparing AWCF with teacher WCF, studies also have revealed the superiority of the former over the latter (e.g., S. Wang & Li, 2019; Y. J. Wang et al., 2013; Warden, 2000). For example, S. Wang and Li (2019) examined the effects of Writing Roadmap (WRM), one type of AWE system, on student writing. The results of the study indicated that students receiving WRM feedback outperformed students receiving teacher WCF in the aspects of language form, contextual structure, and writing quality when WRM was used to assess student essay. Moreover, when student essay was assessed by the teacher, the WRM feedback helped students produce significantly better essays in writing quality than the teacher WCF did.

Although these between-group studies demonstrated the effectiveness of AWCF on students’ writing quality, several design deficiencies should also be noted. Specifically, Stevenson and Phakiti (2014) contended that studies comparing the effects of AWCF with teacher WCF did not clarify how teacher feedback was provided, which made it hard to attribute any observed effects to the provision of one type of feedback. Moreover, they further claimed that designing this type of between-group studies should integrate AWCF with classroom setting because AWCF should be used to complement teacher WCF instead of replacing it. In other words, it seemed that more research is needed to explore the effects of the combination of AWCF with teacher WCF on students’ writing quality, and whether this combination is more useful than teacher WCF alone. This type of research is needed because, as the prevalence of the use of AWE system in the current digital era, L2 writing teachers may want to stay informed of how effective AWCF is and what role it should play in L2 writing instruction. This is particularly necessary for EFL writing domain where many students are writing their essays through AWE systems. Based on the empirical studies discussed so far, it seems that AWCF may have a positive impact on students’ writing quality under certain conditions. However, the empirical studies can provide a general picture as to the potential effectiveness of AWCF at the cost of offering insights into how students actually perceive of the feedback or how the feedback leads to the improvement of their writing quality. For example, although students who receive AWCF could outperform their counterparts who do not receive such feedback, it might be likely that some students might benefit much more from the feedback than others (see D. R. Ferris, 2006). Multiple factors resulting from individual differences may contribute to this type of likelihood. 
Among the factors, students’ perception of feedback is the one that needs to be explored because perception could mediate the extent to which students use feedback to improve their writing (Wilson & Czik, 2016). Therefore, to obtain a deeper understanding about the effectiveness of AWCF, researchers may want to take into consideration student perception regarding the feedback.

Student Perception of AWCF

Studies about the effect of AWCF on student writing should not only deal with how effectively an AWE system works, but also should attach importance to how a student perceives or internalizes the feedback (Jiang & Yu, 2022). Although studies about student perception of teacher WCF have indicated that students in general have a positive attitude toward the feedback (Sinha & Nassaji, 2022), studies about student perception of AWCF have observed mixed findings because students usually hold both positive and negative attitudes toward the feedback. Specifically, students in some studies stated that AWCF was helpful for their writing (e.g., Bai & Hu, 2017; S. Huang & Renandya, 2020; Z. Li et al., 2014). For example, many students in Bai and Hu’s study made it clear that Pigai feedback, which is one type of AWCF widely used by Chinese EFL learners, was able to enhance the mechanics and grammar of their writing. In contrast, there are also some studies that have demonstrated students’ negative attitudes toward AWCF (e.g., Chen & Cheng, 2008; Cheng, 2017; Lai, 2010). Specifically, because little social learning was involved in the provision of AWCF compared with other sources of feedback, such as peer or teacher feedback, students might experience “dehumanizing instruction” (Lai, 2010, p. 442), which might cause “frustration to students and limited their learning of writing” (Chen & Cheng, 2008, p. 94). In addition to the social learning aspect, students also reported that AWCF could be confusing and could fail to provide specific and meaningful information as to their written errors (Lai, 2010). Employing questionnaire and focused group interviews, Cheng (2017) investigated how students in his study perceive of the AWCF they received. The students in that study claimed that the AWE system might not accurately assess their writing and consider their feelings when providing feedback, and low scores given by the AWE system could demotivate them to write (Cheng, 2017). 
In a similar vein, Scharber et al. (2008) revealed a more complex picture about how students perceive of AWCF, in which students might change their positive attitudes toward the feedback to negative attitudes. Specifically, students might first feel engaged in processing the feedback because they wanted to improve their writing scores given by AWE systems through applying the feedback to revisions. However, when they encountered the inaccuracy of the AWCF, they became frustrated with it and decreased the use of it. It seemed that students’“subjective experience” (Scharber et al., 2008, p. 27) with AWCF played an indispensable role in whether they might continue to use the feedback and how much they might benefit from it. Taken together, because of the mixed findings about students’ perceptions of AWCF and the importance of the perceptions on student use of the feedback, a study on the effectiveness of AWCF might want to include an investigation of student perception concerning the feedback, and the relationship between the perception and students’ writing quality.

Methodology

Research Design

This study adopted an embedded mixed-method design (Creswell & Creswell, 2018) to examine the impact of AWCF on university EFL students’ writing quality and explore students’ perceptions of the feedback. Both quantitative and qualitative questions were answered by this embedded design. According to Creswell and Creswell, the qualitative data in this study were collected after the quantitative data, so it was also an “explanatory sequential core design” (p. 381). In other words, the embedded design used in this study was to further explore why certain results were generated, to help elaborate on differences in outcome measures, and to elicit participant views on the AWCF so that possible changes might be made for students to benefit most from it.

For the first research question, the independent variable was the feedback condition. Specifically, students in the experimental group received teacher WCF and AWCF, while students in the control group received teacher WCF only. The dependent variable was students’ writing quality in the post-test writing task. For the second research question, eight variables associated with CAF measures and students’ perceptions of AWE feedback were used to address the relationship between EFL students’ degree of satisfaction with AWCF and writing quality. Quantitative and qualitative methods were used to analyze students’ post-test writing task and the questionnaire. The qualitative data in this study were used to triangulate the quantitative data in order to gain further insights into the potential effects of AWCF on students’ writing quality. To ensure the inter-rater reliability of the error ratio calculation, 15% of the students’ essays were scored by another EFL writing instructor in addition to the researcher of the present study. The Pearson correlation coefficient between the two sets of scores was .87. The remaining student essays were then randomly distributed between the two scorers.

Context and Participants

This study was conducted in an institute located in southwest China. The participants in this study were non-English majors with a lower level of English proficiency. They were sophomores with an average age of 19, including 32 boys and 35 girls. During the study, they were enrolled in a College English course that was compulsory for all non-English majors in the institute. The course lasted the whole semester of 16 weeks, and the student participants met for 1.5-hr sessions twice a week. An instructor who had been teaching the course for 19 years and held a Ph.D. degree in applied linguistics taught both classes based on similar teaching materials. The participants had been learning English for about 9 years, and none of them had any experience in living or studying abroad. Their majors were electronic business and food science and engineering, and their main goals of studying English were to complete their undergraduate study or to pass the College English Test-Band 4 (CET-4), which is a nationally standardized English test for non-English majors. CET-4 scores have been used by researchers to gauge EFL students’ English proficiency because of the well-documented validity and reliability of the CET-4 (e.g., Gao & Min, 2021; S. Huang & Renandya, 2020). As such, the CET-4 scores of the students in the present study were collected and examined, and the results showed that their scores were all below 425 (roughly 50 in TOEFL iBT). In addition, both the classroom observation and the scores of their English final exam in the previous semester revealed that the students in this study were at a lower level of English proficiency. None of the students had previous experience in using Grammarly feedback to revise essays.

Research Procedure

In this study, a sample of 67 EFL students were divided into an experimental group and a control group, with the first group including 30 students, and the second group including 37 students. The experimental group received WCF from both the teacher and Grammarly while the control group received WCF only from the teacher. The principal investigator in this study also served as the teacher who taught both groups of students to ensure the students received the same instruction on English learning. During a span of 12 weeks, the students in both groups were asked to complete four writing tasks (i.e., a pre-test writing task, writing tasks 1 and 2, and a post-test writing task) and two revision tasks (i.e., revisions of writing tasks 1 and 2) in total.

At week 1, the principal investigator, who is also the teacher of the two classes, provided a 40-min training session for students in the experimental group who received AWCF from Grammarly. The session was for students to be familiar with how Grammarly provided feedback for their essays. Specifically, the session was conducted in a language lab where each student had access to a computer. The teacher-researcher used three students’ essays as examples to explain to the students about the use of Grammarly feedback. Students in the control group were not given the training session since they did not receive Grammarly feedback. After the training session, students in both groups were asked to complete the pre-test writing task to examine whether they started the experiment at a similar level of writing quality. The treatment of this study was from week 3 to week 9. Specifically, the teacher assigned writing task 1 to the students in both groups at week 3. All students completed the task on paper in class. Then the students in the experimental group submitted their writings to the teacher for higher-level WCF on content and organization before they submitted to Grammarly for lower-level WCF on spelling, grammar, and punctuation. In contrast, the students in the control group only submitted their writings to the teacher for both higher- and lower-level WCF. The lower-level WCF provided by the teacher was comprehensive and direct because of the potential advantages of direct WCF (see L. J. Zhang & Cheng, 2021) and the ecological validity of comprehensive WCF in writing class (Q. Liu & Brown, 2015). The other reason for using direct WCF was that both direct WCF and Grammarly feedback are explicit, which helps to validate the results concerning the two types of feedback. Moreover, according to Lee (2008) study, lower proficiency students generally preferred explicit feedback, such as direct WCF. All students revised their writings based on the given WCF at week 5. 
This procedure of completing writing task 1 was repeated in writing task 2, which was assigned at week 7 and revised at week 9. At week 11, all students were asked to produce their post-test writings, which were typed into computers by a research assistant and stored as Word files for later analyses. To check the accuracy of the research assistant’s typing, the researcher went through all the typed writings and addressed any typing issues. Then a questionnaire was administered to the students in the experimental group at week 12 (see Table 1).

Table 1.

The Procedure Used in This Study.

	Experimental group	Control group
Week 1	Training on Grammarly	Normal class instruction
Week 1	Pre-test writing task	Pre-test writing task
Week 3	Writing task 1	Writing task 1
Week 5	Teacher WCF + Grammarly feedback; revision of writing task 1	Teacher WCF; revision of writing task 1
Week 7	Writing task 2	Writing task 2
Week 9	Teacher WCF + Grammarly feedback; revision of writing task 2	Teacher WCF; revision of writing task 2
Week 11	Post-test writing task	Post-test writing task
Week 12	Administration of questionnaire

Instruments

Writing Tasks

Students in this study were required to complete four writing tasks in total: a pre-test writing task, writing tasks 1 and 2, and a post-test writing task (see Appendix A). For each of the four writing tasks, students were required to write 120 to 180 words in 30 min. The directions and the topics of the writing tasks were adapted from the writing tasks of the past CET-4 tests to ensure the tasks’ validity and reliability (Teng & Zhang, 2020). The genre of all the four tasks were argumentative essays. According to Y. Huang and Jun Zhang (2020), argumentative essays are widely used to assess students’ writing proficiency in large-scale English proficiency tests in China, and CET-4 test is one of such tests for Chinese non-English majors. In fact, non-English majors usually spend a large amount of time practicing argumentative essays to get a good grade in CET-4 because of an exam-driven educational system in China (L. Zhang, 2016). In addition, an argumentative essay was chosen as the writing task because research has shown that such a task might elicit relatively long or complex sentences from students and that students might tend to generate more language-related issues when they produce such sentences (Q. D. Liu, 2016). In a similar vein, relatively complex structure of an argumentative essay may pose a challenge to language learners (Connor, 1990; Schiffrin, 1985). Therefore, this study asked students to produce argumentative essays under the assumption that providing more opportunities for students to practice may be helpful for them to improve their writing skills in this type of essays.

Questionnaire

A questionnaire was administered to collect student perceptions about the AWE system. The questionnaire consisted of two parts, with the first part including 10 five-point Likert scale questions (see Table 2), and the second part including two open-ended questions: 1. What do you like most about Grammarly? Why? 2. What do you like least about Grammarly? Why? The first part was adapted from S. Huang and Renandya’s (2020) study, and the second part was adapted from Cheng’s (2017) study. The reasons for adapting Huang and Renandya’s questionnaire were two-fold. First, their questionnaire was administered to EFL students who might have the similar profile to the students of this study. Second, the design of their questionnaire was based on several AWCF studies, and the questionnaire was appropriately implemented in their study to elicit EFL students’ perception about AWCF. The 10 Likert scale questions were composed of three constructs: perceived comprehensibility (three items), perceived usefulness of the feedback for revision (three items), and perceived usefulness of the feedback for English writing performance (four items). The internal consistency of the question items was above the standardized benchmark: calculated as Cronbach Alpha, the reliability coefficients of the first construct was .75, of the second construct was .81, and of the third construct was .72. The Likert scale questions were translated to Chinese and were measured through 5 (strongly agree) to 1 (strongly disagree). For the second part, students were allowed to answer the two open-ended questions using English, Chinese, or a mixture of them to better gather their perceptions about Grammarly feedback. Then the students’ Chinese answers were translated into English. Based on Strauss and Corbin’s (1998) study, the two open-ended questions were analyzed through open coding and axial coding. 
For open coding, different concepts were identified, coded, and summarized after the students’ answers to the questions were read multiple times. Then for axial coding, the concepts were analyzed and categorized as different themes.
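
The internal consistency check reported above can be illustrated with a minimal sketch of Cronbach’s alpha for one construct. The response matrix below is hypothetical and is not the study’s data; the function name `cronbach_alpha` is likewise an illustrative assumption, and the study itself ran its analyses in SPSS.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one construct.

    responses: list of rows, one row per respondent, one Likert score per item.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Transpose rows so that each tuple holds all responses to one item.
    items = list(zip(*responses))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical answers from five students to a three-item construct (1-5 scale).
hypothetical_responses = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 3],
]
print(round(cronbach_alpha(hypothetical_responses), 2))
```

Sample variances are used for both the items and the totals, so the ratio inside the formula does not depend on the choice of denominator.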

Table 2.

Likert-scale Questions Used in This Study.

I can understand feedback by Grammarly.	1	2	3	4	5
I know how to revise the composition based on feedback I receive from Grammarly.	1	2	3	4	5
I think the feedback by Grammarly is clear.	1	2	3	4	5
The feedback can help me correct grammar mistakes in this composition.	1	2	3	4	5
It can help me get higher score for this composition.	1	2	3	4	5
I think it can help me improve the quality of this composition.	1	2	3	4	5
The feedback can help me realize my writing problems.	1	2	3	4	5
It can help me improve my grammar.	1	2	3	4	5
It can help me enlarge my vocabulary.	1	2	3	4	5
I think it can help me enhance my writing performance.	1	2	3	4	5

In order to ensure the accuracy of the translation, a professor teaching translation translated the Chinese version questionnaire back to the English version. A high accuracy was found after closely examining these two versions. Then five EFL non-English majors were asked to complete a pilot test to the questionnaire to check the face validity, after which the wordings of several items were adjusted. For the final Chinese version of the questionnaire, a satisfactory reliability was generated with the three constructs producing Cronbach’s alpha coefficients ranging from .80 to .89.

CAF Measures

A variety of measures were used to investigate the students’ English writing quality because of the multi-componential nature of CAF constructs (Housen et al., 2012). Specifically, in line with Norris and Ortega (2009), syntactic complexity was measured through four indices: 1. the mean length of T-units (MLT), 2. dependent clause per T-unit (DC/T), 3. the coordinate phrases per clause (CP/C), and 4. complex nominal per clause (CN/C). These four indices were selected due to the characteristic of multiple components of syntactic complexity, and the necessity of incorporating measures of subordination, coordination, and phrasal complexity when assessing syntactic complexity (Housen et al., 2012; Johnson, 2017). L2 Syntactic Complexity Analyzer (L2SCA) was used to analyze the four indices in syntactic complexity (X. Lu, 2010).

Following Link et al. (2022) and Vasylets and Marín (2021), lexical complexity was assessed along the dimensions of lexical diversity and sophistication, and was computed with Coh-Metrix 3. This study employed the measure of textual lexical diversity (MTLD) to address the former and the log frequency of content words (LCW) to address the latter. MTLD was chosen because it is a valid measure of L2 proficiency (Yoon & Polio, 2017) and is least affected by essay length (Mazgutova & Kormos, 2015; McCarthy & Jarvis, 2010). LCW indicates the mean log frequency of content words in the CELEX database (McNamara et al., 2014). LCW was chosen because it is a more reliable indicator of lexical sophistication than the raw frequency of content words (Kormos, 2011).

In line with a number of studies (e.g., Chandler, 2003; Karim & Nassaji, 2020), this study employed an error ratio to examine students’ writing accuracy. An error ratio is calculated by dividing the number of errors in an essay by the total number of words and multiplying the result by 100. Multiple types of errors were considered in this study, including grammar, vocabulary, spelling, and punctuation errors. One advantage of an error ratio is that it takes differences in essay length into account. Finally, fluency was assessed by the total number of words composed by the students within the 30-min time limit.
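
As a minimal sketch, the accuracy and fluency measures described above can be expressed as follows. The function names and the example counts are hypothetical, and counting whitespace-separated tokens is an assumed approximation of how words were counted.

```python
def error_ratio(error_count: int, word_count: int) -> float:
    """Errors per 100 words: (number of errors / total words) * 100."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return error_count * 100 / word_count


def fluency(essay_text: str) -> int:
    """Fluency as the total number of whitespace-separated words (assumption)."""
    return len(essay_text.split())


# Hypothetical essay: 150 words containing 12 marked errors.
print(error_ratio(12, 150))  # 8.0
print(fluency("Students should learn to write well ."))  # 7
```

Because the ratio is normalized by essay length, a 150-word essay with 12 errors and a 300-word essay with 24 errors receive the same accuracy score.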

Treatment

This study used Grammarly to provide WCF for students in the experimental group. Powered by artificial intelligence (AI), Grammarly provides free help with spelling, grammar, and punctuation in students’ writings. Although Grammarly can be accessed through MS or as an app on smartphones, this study was conducted through the Grammarly website (https://app.grammarly.com). This study used the free online version of Grammarly rather than the premium version. The free version has several features that should be noted. First, it marks students’ writings with scores ranging from 1 to 100 based on writing quality. Second, students can set their writing goals in terms of audience and formality. Third, lines of different colors are used to underline different flaws in student writing, with red lines representing correctness, blue lines representing clarity, green lines representing engagement, and purple lines representing delivery. Fourth, Grammarly provides explicit WCF in terms of correctness. This type of explicit WCF might make it easier for students to correct errors by themselves. It is also worth noting that the explicit WCF is highly accurate when correcting the errors it detects (Paul & Woll, 2020), particularly for error types commonly made by EFL students, such as determiner, preposition, and spelling errors (Ranalli & Yamashita, in press).

Data Analysis

SPSS Version 25 was employed to perform the statistical analyses in this study. I utilized independent samples t-tests to answer the first research question, that is, to examine whether there was a difference in writing quality between the two groups in the post-test writing task. Pearson product-moment correlation analyses were used to answer the second research question, which explored the relationship between EFL students’ degree of satisfaction with AWCF and their writing quality. The degree of satisfaction was derived from students’ answers to the ten Likert-scale questions that made up the first part of the questionnaire. To answer the third research question, the two open-ended questions in the second part of the questionnaire were analyzed qualitatively by categorizing the data and identifying emerging themes (L. J. Zhang & Cheng, 2021).

Results

Compare Writing Quality Between the Experimental Group and the Control Group

To ensure the comparability of the two groups’ writing quality at the beginning of the intervention, an independent samples t-test was conducted on the students’ pre-test writing task. The results indicated no significant difference between the experimental group (M = 71.700, SD = 11.55) and the control group (M = 73.676, SD = 7.83) in writing quality (t(65) = −0.832, p > .05). Then a series of independent samples t-tests was computed to address the first research question, which asked whether Grammarly feedback combined with teacher WCF could lead to better writing quality than teacher WCF only. Tables 3 and 4 show the descriptive statistics and Levene’s test results, respectively. Levene’s test results revealed that the equal variances assumption held between the two groups for the variables of MLT, DC/T, and CP/C in the syntactic complexity measure, the variables of MTLD and LCM in the lexical complexity measure, and the variables of error rates and word count in the measures of accuracy and fluency. However, the assumption was violated for the variable of CN/C in the syntactic complexity measure. To examine whether there were significant differences in the eight variables between the two groups, independent samples t-tests were conducted for MLT, DC/T, CP/C, MTLD, LCM, error rates, and word count, while a Mann-Whitney U test was applied for CN/C. The results are shown in Table 5. Both the independent samples t-tests and the Mann-Whitney U test used an adjusted p value of .006 (.05/8) to account for all eight comparisons.
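As a rough check on the procedure just described, the sketch below recomputes an equal-variance (pooled) t statistic from the reported pre-test summary statistics and derives the Bonferroni-adjusted threshold. It is not the SPSS procedure used in the study; the helper name `pooled_t` is hypothetical, and only means, standard deviations, and group sizes quoted in the text are used.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t for two independent groups, assuming equal variances.

    Returns (t, df), where t = (mean1 - mean2) / pooled standard error.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Pre-test summary statistics as reported in the text
# (experimental group first, control group second).
t, df = pooled_t(71.700, 11.55, 30, 73.676, 7.83, 37)
print(f"t({df}) = {t:.3f}")

# Bonferroni adjustment: family-wise alpha divided by the number of tests.
alpha, n_tests = 0.05, 8
adjusted_alpha = alpha / n_tests
print(f"adjusted alpha = {adjusted_alpha}")  # 0.00625, reported as .006
```

Because the inputs are rounded summary statistics, the recomputed t values for other variables may differ slightly in the third decimal from those in Table 5.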

Table 3.

Descriptive Statistics of the CAF Measures From the Experimental and Control Groups.

Measures	Variables	Experimental group (N = 30)	Control group (N = 37)
Syntactic complexity	MLT (Mean, S.D.)	13.53 (3.96)	12.61 (3.13)
	DC/T (Mean, S.D.)	0.553 (0.472)	0.581 (0.385)
	CP/C (Mean, S.D.)	0.232 (0.174)	0.276 (0.229)
	CN/C (Mean, S.D.)	1.020 (0.431)	0.790 (0.266)
Lexical complexity	MTLD (Mean, S.D.)	59.859 (15.446)	66.301 (21.154)
	LCM (Mean, S.D.)	1.245 (0.449)	1.292 (0.350)
Accuracy	Error rates	5.935 (3.685)	7.105 (3.934)
Fluency	Word count	131.833 (28.446)	132.162 (20.911)

Table 4.

Levene’s Test Results of the CAF Measures Between the Two Groups.

Measures	Variables	F	p	Equal variances assumed (p > .05)
Syntactic complexity	MLT	0.419	.520	Yes
	DC/T	1.148	.288	Yes
	CP/C	1.170	.283	Yes
	CN/C	5.044	.028	No
Lexical complexity	MTLD	1.986	.164	Yes
	LCM	2.528	.117	Yes
Accuracy	Error rates	1.139	.290	Yes
Fluency	Word count	1.014	.318	Yes

Table 5.

Independent Samples t-Test Results of the CAF Measures Between the Two Groups.

Measures	Variables	t	df	p
Syntactic complexity	MLT	−1.065	65	.291
	DC/T	0.263	65	.793
	CP/C	0.854	65	.396
Lexical complexity	MTLD	1.393	65	.168
	LCM	0.487	65	.628
Accuracy	Error rates	1.245	65	.217
Fluency	Word count	0.054	65	.957

For the syntactic complexity measure, the results indicated no significant differences between the experimental group (MLT: M = 13.53, SD = 3.96; DC/T: M = 0.553, SD = 0.472; CP/C: M = 0.232, SD = 0.174; CN/C: M = 1.020, SD = 0.431) and the control group (MLT: M = 12.61, SD = 3.13; DC/T: M = 0.581, SD = 0.385; CP/C: M = 0.276, SD = 0.229; CN/C: M = 0.790, SD = 0.266) in MLT (t(65) = −1.065, p > .006), DC/T (t(65) = 0.263, p > .006), CP/C (t(65) = 0.854, p > .006), or CN/C (U = 750.500, p > .006). For the lexical complexity measure, the results showed no significant differences between the experimental group (MTLD: M = 59.859, SD = 15.446; LCM: M = 1.245, SD = 0.449) and the control group (MTLD: M = 66.301, SD = 21.154; LCM: M = 1.292, SD = 0.350) in MTLD (t(65) = 1.393, p > .006) or LCM (t(65) = 0.487, p > .006). For the accuracy and fluency measures, the results revealed no significant differences between the experimental group (error rate: M = 5.935, SD = 3.685; word count: M = 131.833, SD = 28.446) and the control group (error rate: M = 7.105, SD = 3.934; word count: M = 132.162, SD = 20.911) in accuracy (t(65) = 1.245, p > .006) or fluency (t(65) = 0.054, p > .006).

Examine the Relationship Between Chinese EFL Students’ Degree of Satisfaction About AWCF and Writing Quality

Pearson correlations were calculated to examine the relationship between the CAF variables and students’ perceptions of Grammarly feedback (see Table 6). Weak, nonsignificant correlations were found for DC/T (r(28) = .208, p > .05), CP/C (r(28) = .247, p > .05), CN/C (r(28) = .158, p > .05), MTLD (r(28) = .155, p > .05), LCM (r(28) = .228, p > .05), errors (r(28) = .099, p > .05), and fluency (r(28) = .159, p > .05). Students’ perceptions of Grammarly feedback were thus not related to these seven CAF variables. In contrast, a moderate positive correlation was found for MLT (r(28) = .436, p < .05) under the measure of syntactic complexity, indicating a significant linear relationship between MLT and students’ perceptions of Grammarly feedback. Students who had more positive attitudes toward Grammarly feedback tended to have higher MLT values in their writing.

Table 6.

Summary of Correlation Analyses.

	Statistic	Student perceptions	MLT	DC/T	CP/C	CN/C	MTLD	LCM	Errors	Fluency
Student perceptions	r	1	0.436*	0.208	0.247	0.158	0.155	0.228	0.099	0.159
	p		0.016	0.270	0.118	0.404	0.414	0.225	0.604	0.402

*p < .05.
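The significance decisions for the correlations above can be approximated by converting each r to a t statistic with df = n − 2. The sketch below is illustrative only: the helper `r_to_t` is hypothetical, n = 30 is an assumption inferred from the reported df of 28, and 2.048 is the standard two-tailed critical t value for df = 28 at α = .05.

```python
import math


def r_to_t(r: float, n: int) -> tuple[float, int]:
    """Convert a Pearson r based on n paired observations to (t, df)."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df


# Two-tailed critical t at alpha = .05 for df = 28 (standard table value).
T_CRIT_DF28 = 2.048

# Three of the coefficients reported in Table 6; n = 30 is an assumption.
for name, r in [("MLT", 0.436), ("DC/T", 0.208), ("Errors", 0.099)]:
    t, df = r_to_t(r, n=30)
    verdict = "significant" if abs(t) > T_CRIT_DF28 else "not significant"
    print(f"{name}: r({df}) = {r:.3f}, t = {t:.2f}, {verdict}")
```

Under these assumptions only the MLT coefficient exceeds the critical value, which agrees with the pattern reported in the text.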

Investigate Students’ Perceptions About Grammarly Feedback

The third research question asked how students perceived Grammarly feedback on their writing. To answer this question, a questionnaire of fixed-response and open-ended questions was administered to the students who received both teacher and Grammarly feedback. Twenty-nine students submitted responses to the questionnaire; one student did not respond. Their responses to the fixed-response questions are summarized in Tables 7 to 9, while their responses to the open-ended questions are summarized in Table 10. As Tables 7 to 9 show, more than half of the students strongly agreed or agreed that they could understand Grammarly feedback (58.6%) and that they knew how to revise based on it (62.1%). A similar proportion of students also strongly agreed or agreed that Grammarly feedback could help them correct grammar mistakes (65.5%), get a higher score for their compositions (62%), improve the quality of their compositions (65.5%), realize their writing problems (65.5%), and improve their grammar (51.7%). In contrast, fewer than half of the students strongly agreed or agreed that Grammarly feedback was clear (48.2%), could help them enlarge their vocabulary (37.9%), or could enhance their writing performance (44.8%). The results revealed that although the majority of students noted that Grammarly feedback was beneficial to their writing, quite a few students did not hold positive attitudes toward the feedback.

Table 7.

Perceived Comprehensibility of Grammarly Feedback.

Questionnaire items	Responses	Number of respondents	Percentage of respondents
1. I can understand feedback by Grammarly.	Strongly agree	5	17.2%
	Agree	12	41.4%
	Neutral	7	24.1%
	Disagree	5	17.2%
	Strongly Disagree	0	0%
2. I know how to revise the composition based on feedback I received from Grammarly.	Strongly agree	6	20.7%
	Agree	12	41.4%
	Neutral	10	34.5%
	Disagree	1	3.4%
	Strongly Disagree	0	0%
3. I think the feedback by Grammarly is clear.	Strongly agree	5	17.2%
	Agree	9	31.0%
	Neutral	10	34.5%
	Disagree	5	17.2%
	Strongly Disagree	0	0%

Table 8.

Perceived Usefulness of Grammarly Feedback for Composition Revision.

Questionnaire items	Responses	Number of respondents	Percentage of respondents
4. The feedback can help me correct grammar mistakes in this composition.	Strongly agree	7	24.1%
	Agree	12	41.4%
	Neutral	7	24.1%
	Disagree	2	6.9%
	Strongly Disagree	1	3.4%
5. It can help me get higher score for this composition.	Strongly agree	5	17.2%
	Agree	13	44.8%
	Neutral	8	27.6%
	Disagree	3	10.3%
	Strongly Disagree	0	0%
6. I think it can help me improve the quality of this composition.	Strongly agree	5	17.2%
	Agree	14	48.3%
	Neutral	9	31.0%
	Disagree	1	3.4%
	Strongly Disagree	0	0%

Table 9.

Perceived Usefulness of Grammarly Feedback for Enhancing Writing Performance.

Questionnaire items	Responses	Number of respondents	Percentage of respondents
7. The feedback can help me realize my writing problems.	Strongly agree	5	17.2%
	Agree	14	48.3%
	Neutral	8	27.6%
	Disagree	2	6.9%
	Strongly Disagree	0	0%
8. It can help me improve my grammar.	Strongly agree	4	13.8%
	Agree	11	37.9%
	Neutral	11	37.9%
	Disagree	2	6.9%
	Strongly Disagree	1	3.4%
9. It can help me enlarge my vocabulary.	Strongly agree	4	13.8%
	Agree	7	24.1%
	Neutral	11	37.9%
	Disagree	7	24.1%
	Strongly Disagree	0	0%
10. I think it can help me enhance my writing performance.	Strongly agree	5	17.2%
	Agree	8	27.6%
	Neutral	13	44.8%
	Disagree	2	6.9%
	Strongly Disagree	0	0%
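The agreement percentages quoted for Tables 7 to 9 combine the "strongly agree" and "agree" counts for each item. A minimal sketch of that calculation follows; the function name `agreement_share` is hypothetical, and the counts are copied from three of the items in the tables.

```python
# Response counts in the order: strongly agree, agree, neutral, disagree,
# strongly disagree (copied from Tables 7 and 9 for three sample items).
COUNTS = {
    "Item 1 (understand feedback)": [5, 12, 7, 5, 0],
    "Item 2 (know how to revise)": [6, 12, 10, 1, 0],
    "Item 9 (enlarge vocabulary)": [4, 7, 11, 7, 0],
}


def agreement_share(counts: list[int]) -> float:
    """Percentage of respondents choosing 'strongly agree' or 'agree'."""
    total = sum(counts)
    return 100 * (counts[0] + counts[1]) / total


for item, counts in COUNTS.items():
    print(f"{item}: {agreement_share(counts):.1f}%")
```

Dividing by the sum of the counts, rather than by a fixed 29, keeps the calculation valid for an item where a respondent skipped the question.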

Table 10.

Results of Student Responses to the Open-ended Questions in the Questionnaire.

Question	Category	Number of responses	Example
1. What do you like most about Grammarly?	Grammar and vocabulary feedback	11	“Grammarly can help me correct errors in grammar and recommend good vocabulary.”
	Vocabulary feedback	9	“Grammarly can correct my spelling errors, which is very helpful for me because my low English proficiency level often makes me misspell words.”
	Grammar feedback	7	“Grammar feedback is clear and is useful for me to quickly locate inappropriate expressions.”
	No response	2	—
2. What do you like least about Grammarly?	Convention and punctuation feedback	6	“I do not like punctuation feedback because I think punctuation is a minor aspect in writing.”
	Vocabulary feedback	3	“I can not understand some vocabulary provided by Grammarly because of my low English proficiency level.”
	Interface	3	“I think the operational interface of Grammarly is not user-friendly.”
	Grammar feedback	2	“Sometimes it is hard for me to find grammar errors based on Grammarly feedback.”
	Premium function	2	“Sometimes I want to improve my writing performance through more advanced Grammarly feedback, but I have to pay for it.”
	Scoring function	1	“The score given by Grammarly does not reflect my writing performance.”
	No response	12	—

Table 10 summarizes student responses to the two open-ended questions in the questionnaire. Twenty-nine students answered the two questions. For the first question, 11 (38%) students claimed that they most liked the grammar and vocabulary feedback offered by Grammarly, nine (31%) students most liked vocabulary feedback, seven (24%) students most liked grammar feedback, and two (7%) gave no response. For the second question, six (21%) students stated that they least liked convention and punctuation feedback, three (10%) students least liked vocabulary feedback, three (10%) students least liked the operational interface of Grammarly, two (7%) students least liked grammar feedback, two (7%) students least liked the premium function of Grammarly, one (3%) student least liked the scoring function of Grammarly, and 12 (41%) students offered no response. The results suggested that while most students were in favor of certain type(s) of feedback provided by Grammarly, more than half of the students were still dissatisfied with some functions offered by Grammarly.

Discussion

The primary goal of the present study was to explore whether AWCF combined with teacher WCF could lead to better writing quality than teacher WCF only for EFL students with lower levels of English proficiency. The findings revealed that students who received AWCF with teacher WCF might not outperform students who received teacher WCF only in writing quality. The results were in line with previous research that reported the ineffectiveness of AWCF on student writing (S. Huang & Renandya, 2020; Ware, 2014; Wilson & Czik, 2016), and contradicted previous research that reported the effectiveness of AWCF on student writing (Barrot, 2023; Thi & Nikolov, 2022).

There might be three possible explanations for the findings of this study. First, students’ lower proficiency levels could prevent them from benefiting from Grammarly feedback (Ghufron & Rosyida, 2018; Koltovskaia, 2020; Lin & Griffith, 2014; Shang, 2022). Grammarly feedback is provided in English, which is not the students’ mother tongue, so it may be hard for them to effectively process the feedback. This assumption could be backed up by some students’ comments on the feedback. For example, S22 commented that “sometimes I have to use translator to help me understand Grammarly feedback because it is in English.” S23 said that “I cannot revise my writing based on Grammarly feedback because I cannot make right revision whatever I do.” According to sociocultural theory, in order for the feedback to be helpful for students’ writing, it should be associated with their zone of proximal development (ZPD), which is defined by Vygotsky (1978) as “the distance between the actual developmental level as determined by [students’] independent problem solving and the level of potential development as determined through problem solving [with the help of more advanced external sources]” (p. 84). Referring to the present study, students’ drafts submitted to Grammarly could be seen as drafts that reflect their current linguistic knowledge, and revisions based on Grammarly feedback could be regarded as drafts that they can produce with the help of the feedback. In other words, it is this type of feedback from Grammarly that may serve as guidance to help students accomplish something that they cannot fulfill independently at that time. Grammarly feedback might be used as a tool to bridge the gap between students’ current writing level and an ideal level. However, if students fail to understand Grammarly feedback, which is a preliminary step for ZPD to work, it is unlikely that the feedback could serve as this type of bridge.

Second, students’ unfamiliarity with Grammarly might be another reason why they did not benefit from the feedback. For example, S20 commented “it is inconvenient for me to use Grammarly because I am not familiar with it.” In fact, the system that EFL students in China use most frequently is Pigai, which provides feedback in students’ mother tongue. Similarly, S22 said “I only used Grammarly twice, so I do not even know what types of feedback it provides. Sometimes the feedback on formatting, such as space, overwhelms me since I do not pay much attention to space issues when writing on paper.” S3 commented “I do not like Grammarly because it does not provide keyboard for me to write on it with my smartphone. In addition, Grammarly is not made in our country, and sometimes I cannot logon it due to issues of internet connection.” In this regard, previous studies have demonstrated a relationship between the characteristics of web-based learning systems and students’ perceived ease of use (Ke et al., 2012; Nikou & Economides, 2017; Zhai & Ma, 2022), indicating that students may not want to use a learning system if they think it is not easy to use. Moreover, if students are unfamiliar with or overwhelmed by AWE feedback, they may feel demotivated to use it (e.g., Sommers, 2013; Wilson et al., 2021), which, in turn, can make it hard for them to benefit from the feedback. According to social cognitive theory (Bandura, 1977, 2012), this type of demotivation may be detrimental to student learning.

Third, research has shown that students may not attend to AWE feedback if they perceive it as not being useful for their writing (R. Li et al., 2019; Zhai & Ma, 2022). In my study, the questionnaire findings suggested that the majority of students may not have attended to Grammarly feedback, since more than half of the students did not strongly agree or agree that the feedback was useful in enhancing their writing performance. This lack of attention might be one reason why the feedback was not beneficial to the students’ writing. Long (1996) noted the necessity of taking learners’ attention into account when considering the relationship between the positive or negative evidence available to language learners and their language acquisition. Drawing on the interaction hypothesis, Schmidt (1995, 2001) has explained why attention is crucial for language acquisition based on the notion of awareness, which consists of two levels: noticing (i.e., a lower level of awareness) and understanding (i.e., a higher level of awareness). In the EFL writing domain, the level of noticing is for learners to be aware of any new information provided, while the level of understanding is for them to revise their language errors in writing (Bitchener, 2017). It is at the level of noticing that attention plays an indispensable role. According to Long (1996), attention is a prerequisite for noticing. In my study, the students may not have paid sufficient attention to Grammarly feedback because they perceived it as not being useful. Therefore, their perception might have made it less likely for them to learn from the feedback and improve their writing quality, since they may not even have noticed the feedback.

In addition to exploring the effect of Grammarly feedback on students’ writing, this study also examined the relationship between students’ perception of the feedback and their writing quality in eight variables of CAF measures, with no such relationship found in seven variables. This finding was partially supported by the finding of Sinha and Nassaji (2022) that no correlation was observed between students’ feedback perception and their writing accuracy, and by the finding of Shang (2022) that no correlation was observed between students’ feedback perception and the syntactic complexity and grammatical accuracy of their writing. In contrast, this finding contradicted that of Rummel and Bitchener (2015), who claimed a connection between students’ perception and their feedback retention. The reason for the contradiction might lie in how participants were assigned in the two studies. Specifically, in Rummel and Bitchener’s (2015) study, the student participants were divided into different feedback groups based on their feedback preferences, while the current study randomly divided the student participants into different feedback groups. In this case, the students in the current study were assigned to receive Grammarly feedback not because they preferred the feedback, but because they were asked to receive it. Therefore, it might be possible that no relationship was observed between students’ perception and their writing quality in seven variables simply because they did not want to receive the feedback. In addition, the current study also found a significant correlation between students’ perception and the variable of MLT in students’ writing, indicating that the majority of students favored the idea that Grammarly feedback could prompt them to produce long sentences. Indeed, several characteristics of AWE systems, such as immediate direct feedback (Dikli, 2006) and multiple revision opportunities (Warschauer & Ware, 2006), make it more likely for students to produce long sentences after revising their writing. This finding, however, is not consistent with Shintani’s (2016) finding that corrective feedback may not lead to better quality of student writing in syntactic features.

Conclusion, Limitations, and Implications

This study contributes to the research on the effect of AWCF (i.e., Grammarly feedback) on EFL students’ writing quality by employing a sequential explanatory mixed-methods design. The results mainly reveal that the students receiving both AWCF and teacher WCF may not outperform the students receiving only teacher WCF in writing quality. Moreover, students’ answers to the questionnaire consisting of fixed-response and open-ended questions provide further insights into why the addition of AWCF to teacher WCF did not result in a significant improvement in the students’ writing quality.

The present study has several limitations. First, this study did not examine how students implemented AWCF in their revisions, so it is hard to know how they made use of the feedback and what obstacles they encountered when revising. Future research can explore how AWCF affects the amounts and types of errors in student revisions to gain a deeper understanding of the feedback, and can use think-aloud protocols to identify patterns in how students process the feedback they receive. Second, research has validated the potential role of individual differences (e.g., motivation and working memory) in student writing (see Kormos, 2012). However, this study did not trace possible changes in such differences resulting from the provision of AWCF. Future research, for example, can investigate whether providing students with the feedback can enhance their writing motivation and consequently improve their writing quality. Third, the sample in this study consisted of students with a lower level of English proficiency, so the findings of this study cannot be generalized to students at other proficiency levels. Fourth, this study only examined the effects of AWCF on argumentative writing. Given the importance of considering the effects of writing genres and tasks (e.g., Graham et al., 2016; Schoonen, 2012), future studies could explore the effect of AWCF across different writing genres and tasks.

Despite the limitations, the present study could offer several implications for EFL writing teachers. First, the integration of AWCF into EFL writing instruction does not necessarily lead to the improvement of students’ writing quality. Thus, teachers should be cautious about introducing AWE systems to EFL students, particularly to students with lower proficiency levels (Xu & Zhang, 2022). If they do want to apply AWCF in their writing instruction, they may need to think of effective ways to help students benefit most from it. For example, teachers may want to introduce students with lower proficiency levels to AWE systems that operate in their mother tongue so that the feedback is easier for them to understand. Second, based on the students’ responses to the fixed-response questions in the questionnaire, it seemed that more than half of the students were in favor of a potentially positive role of AWCF in their writing. However, because of their lower level of English proficiency, they might not have been ready to effectively apply the feedback to revise their writing, which, in turn, made it unlikely for them to improve their writing quality. Thus, when asking students to receive AWCF, teachers may want to monitor their revision process and give timely support to individual students. For example, in EFL settings with relatively small class sizes, teachers can provide individualized feedback for students through one-on-one writing conferences. In addition, students may be required to revise their writing multiple times to enhance their agency (Liao, 2016) and engagement with the feedback (Z. Zhang, 2020). Third, according to the students’ responses to the questionnaire, nearly all students mentioned language-related issues (e.g., grammar and vocabulary) as their obstacles in writing, with no students mentioning the importance of organization and content in their writing. However, these aspects are of great significance in producing a good piece of writing. In this case, teachers need to make students aware of the importance of these aspects so that they are more likely to develop the ability to produce writing with appropriate organization and content.

Footnotes

Declaration of Conflicting Interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Funding of Post-doctoral Research Fellow Program of Zunyi Medical University under Grant FB-2022-2 and the Funding of Chinese Foreign Language Education under Grant ZGWYJYJJ10A055.

An Ethics Statement

The ethics issue is not applicable to the country where the study was conducted.

ORCID iD

Ning Fan

References

Aluthman

E. S.

(2016). The effect of using automated essay evaluation on ESL undergraduate students’ writing skill. International Journal of English Linguistics, 6(5), 54–67.

Bai

(2017). In the face of fallible AWE feedback: How do students respond? Educational Psychologist, 37(1), 67–81.

Bandura

(1977). Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review, 84(2), 191–215.

Bandura

(2012). On the functional properties of perceived self-efficacy revisited. Journal of Management, 38(1), 9–44. https://doi.org/10.1177/0149206311410606

Barrot

J. S.

(2023). Using automated written corrective feedback in the writing classrooms: Effects on L2 writing accuracy. Computer Assisted Language Learning, 36, 584–607. https://doi.org/10.1080/09588221.2021.1936071

Bitchener

(2017). Why some L2 learners fail to benefit from written corrective feedback. In Nassaji

Kartchava

(Eds.), Corrective feedback in second language teaching and learning (pp. 129–140). Routledge.

Bitchener

Ferris

D. R.

(2012). Written corrective feedback in second language acquisition and writing. Routledge.

Chandler

(2003). The efficacy of various kinds of error feedback for improvement in the accuracy and fluency of L2 student writing. Journal of Second Language Writing, 12, 267–296. https://doi.org/10.1016/S1060-3743(03)00038-9

Chen

C. E.

Cheng

(2008). Beyond the design of automated writing evaluation: Pedagogical practices and perceived learning effectiveness in EFL writing classes. Language Learning and Technology, 12(2), 94–112.

10.

Cheng

(2017). The impact of online automated feedback on students’ reflective journal writing in an EFL course. The Internet and Higher Education, 34, 18–27. https://doi.org/10.1016/j.iheduc.2017.04.002

11.

Cheville

(2004). Automated scoring technologies and the rising influence of error. The English Journal, 93(4), 47–52.

12.

Connor

(1990). Linguistic/rhetorical measures for international persuasive student writing. Research in the Teaching of English, 24, 67–87.

13.

Creswell

J. W.

Creswell

J. D.

(2018). Research design: Qualitative, quantitative, and mixed methods approaches (5th ed.). Sage.

14.

Dikli

(2006). An overview of automated scoring of essays. The Journal of Technology, Learning, and Assessment, 5(1), 1–35.

15.

Dikli

Bleyle

(2014). Automated essay scoring feedback for second language writers: How does it compare to instructor feedback? Assessing Writing, 22, 1–17.

16.

El Ebyary

Windeatt

. (2010). The impact of computer-based feedback on students’ written work. International Journal of English Studies, 10(2), 121–142.

17.

Evans

Green

(2007). Why EAP is necessary: A survey of Hong Kong tertiary students. Journal of English for Academic Purposes, 6(1), 3–17. https://doi.org/10.1016/j.jeap.2006.11.005

18.

Fang

Y. C.

(2010). Perceptions of the computer-assisted writing program among EFL college learners. Educational Technology & Society, 13(3), 246–256.

19.

Ferris

D. R.

(2006). Does error feedback help student writers? New evidence on the short- and long-term effects of written error correction. In Hyland

Hyland

(Eds.), Feedback in second language writing: Contexts and issues (pp. 81–104). Cambridge University Press.

20.

Franzke

Kintsch

Caccamise

Johnson

Dooley

(2005). Summary Street^®: Computer support for comprehension and writing. Journal of Educational Computing Research, 33(1), 53–80.

21.

Gao

Min

(2021). A comparative study of the effects of L1 and L2 prewriting discussions on L2 writing performance. System, 103, 102654. https://doi.org/10.1016/j.system.2021.102654

22.

Ghufron

M. A.

Rosyida

(2018). The role of Grammarly in assessing English as a Foreign Language (EFL) writing. Lingua Cultura, 12(4), 395–403. https://doi.org/10.21512/lc.v12i4.4582

23.

Graham

Hebert

Paige Sandbank

Harris

K. R.

(2016). Assessing the writing achievement of young struggling writers: Application of generalizability theory. Learning Disability Quarterly, 39, 72–82.

24.

Grimes

(2008). Middle school use of automated writing evaluation [Unpublished Ph.D Dissertation]. University of California. http://douglasgrimes.com/windocs/Grimes–Middle%20School%20Use%20of%20AWE–Final%20Dissertation%20.doc

25.

Housen

Kuiken

Vedder

(2012). Dimensions of L2 performance and proficiency: Complexity, accuracy and fluency in SLA. John Benjamins.

26.

Huang

Renandya

W. A.

(2020). Exploring the integration of automated feedback among lower-proficiency EFL learners. Innovation in Language Learning and Teaching, 14(1), 15–26. https://doi.org/10.1080/17501229.2018.1471083

27.

Huang

Jun Zhang

(2020). Does a process-genre approach help improve students’ argumentative writing in English as a foreign language? Findings from an intervention study. Reading and Writing Quarterly, 36(4), 339–364.

28.

Jiang

(2022). Appropriating automated feedback in L2 writing: Experiences of Chinese EFL student writers. Computer Assisted Language Learning, 35, 1329–1353. https://doi.org/10.1080/09588221.2020.1799824

29.

Jin

Yang

H. Z.

(2006). The English proficiency of college and university students in China: As reflected in the CET. Language, Culture and Curriculum, 19(1), 21–36. https://doi.org/10.1080/07908310608668752

30.

Johnson

M. D.

(2017). Cognitive task complexity and L2 written syntactic complexity, accuracy, lexical complexity, and fluency: A research synthesis and meta-analysis. Journal of Second Language Writing, 37, 13–38. https://doi.org/10.1016/j.jslw.2017.06.001

31.

Karim

Nassaji

(2020). The revision and transfer effects of direct and indirect comprehensive corrective feedback on ESL students’ writing. Language Teaching Research, 24, 519–539. https://doi.org/10.1177/1362168818802469

32.

C. H.

Sun

H. M.

Yang

Y. C.

(2012). Effects of user and system characteristics on perceived usefulness and perceived ease of use for the web-based classroom response system. The Turkish Online Journal of Educational Technology, 11(3), 128–143.

33.

Kim

J. E.

(2014). The effectiveness of automated essay scoring in an EFL college classroom. Multimedia-Assisted Language Learning, 17(3), 11–36. http://journal.kamall.or.kr/wp-content/uploads/2014/10/Kim_17_3_01.pdf

34. Koltovskaia (2020). Student engagement with automated written corrective feedback (AWCF) provided by Grammarly: A multiple case study. Assessing Writing, 44, 100450. https://doi.org/10.1016/j.asw.2020.100450

35. Kormos (2011). Task complexity and linguistic and discourse features of narrative writing performance. Journal of Second Language Writing, 20(2), 148–161.

36. Kormos (2012). The role of individual differences in L2 writing. Journal of Second Language Writing, 21, 390–403. https://doi.org/10.1016/j.jslw.2012.09.003

37. Lai, Y. H. (2010). Which do students prefer to evaluate their essays: Peers or computer program. British Journal of Educational Technology, 41(3), 432–454. https://doi.org/10.1111/j.1467-8535.2009.00959.x

38. Lee (2008). Student reactions to teacher feedback in two Hong Kong secondary classrooms. Journal of Second Language Writing, 17, 144–164. https://doi.org/10.1016/j.jslw.2007.12.001

39. Liao, H. C. (2016). Using automated writing evaluation to reduce grammar errors in writing. ELT Journal, 70(3), 308–319. https://doi.org/10.1093/elt/ccv058

40. Link & Hegelheimer (2015). Rethinking the role of automated writing evaluation (AWE) feedback in ESL writing instruction. Journal of Second Language Writing, 27, 1–18.

41. Link, Mehrzad, & Rahimi (2022). Impact of automated writing evaluation on teacher feedback, student revision, and writing improvement. Computer Assisted Language Learning, 35, 605–634. https://doi.org/10.1080/09588221.2020.1743323

42. Lin, S. M., & Griffith (2014). Impacts of online technology use in second language writing: A review of the literature. Reading Improvement, 51(3), 303–312.

43. Meng, Tian, Zhang, & Xiao (2019). Examining EFL learners’ individual antecedents on the adoption of automated writing evaluation in China. Computer Assisted Language Learning, 32, 784–804. https://doi.org/10.1080/09588221.2018.1540433

44. Liu & Brown (2015). Methodological synthesis of research on the effectiveness of corrective feedback in L2 writing. Journal of Second Language Writing, 30, 66–81. https://doi.org/10.1016/j.jslw.2015.08.011

45. Liu, Q. D. (2016). Effectiveness of coded corrective feedback in the development of linguistic accuracy in L2 writing: Impact of error types and learner attitudes (Doctoral dissertation). ProQuest. (10130907).

46. Link, Yang, & Hegelheimer (2014). The role of automated writing evaluation holistic scores in the ESL classroom. System, 44, 66–78. https://doi.org/10.1016/j.system.2014.02.007

47. Long, M. H. (1996). The role of the linguistic environment in second language acquisition. In Ritchie & Bhatia (Eds.), Handbook of second language acquisition (pp. 413–468). Academic Press.

48. (2016). Exploring EFL learners’ lexical application in AWE-based writing. In Papadima-Sophocleous, Bradley, & Thouësny (Eds.), CALL communities and culture – short papers from EUROCALL 2016 (pp. 295–301). Research-publishing.net. https://doi.org/10.14705/rpnet.2016.eurocall2016.578

49. (2010). Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics, 15(4), 474–496. https://doi.org/10.1075/ijcl.15.4.02lu

50. Matsumoto & Akahori (2008). Evaluation of the use of automated writing assessment software [Conference session]. C. Bonk, Lee, & Reynolds (Eds.), Proceedings of World Conference on E-learning in Corporate, Government, Healthcare, and Higher Education 2008, AACE, Chesapeake, VA (pp. 1827–1832).

51. Mazgutova & Kormos (2015). Syntactic and lexical development in an intensive English for Academic Purposes programme. Journal of Second Language Writing, 29, 3–15.

52. McCarthy, P. M., & Jarvis (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42, 381–392.

53. McNamara, D. S., Graesser, A. C., McCarthy, & Cai (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge University Press.

54. Nikou, S. A., & Economides, A. A. (2017). Mobile-based assessment: Investigating the factors that influence behavioral intention to use. Computers & Education, 109, 56–73. https://doi.org/10.1016/j.compedu.2017.02.005

55. Norris, J. M., & Ortega (2009). Towards an organic approach to investigating CAF in instructed SLA: The case of complexity. Applied Linguistics, 30(4), 555–578. https://doi.org/10.1093/applin/amp044

56. Parra, G. L., & Calero, S. X. (2019). Automated writing evaluation tools in the improvement of the writing skill. International Journal of Instruction, 12(2), 209–226. https://doi.org/10.29333/iji.2019.12214a

57. Paul & Woll (2020). Using grammar checkers in an ESL context: An investigation of automatic corrective feedback. CALICO Journal, 37(2), 169–192.

58. Ranalli & Yamashita (in press). Automated written corrective feedback: Error-correction performance and timing of delivery. Language Learning & Technology.

59. Rummel & Bitchener (2015). The effectiveness of written corrective feedback and the impact Lao learners’ beliefs have on uptake. Australian Review of Applied Linguistics, 38, 66–84. https://doi.org/10.1075/aral.38.1.04rum

60. Scharber, Dexter, & Riedel (2008). Students’ experiences with an automated essay scorer. Journal of Technology, Learning, and Assessment, 7(1), 4–45.

61. Schiffrin (1985). Everyday argument: The organization of diversity in talk. In van Dijk (Ed.), Handbook of discourse analysis (Vol. 3, pp. 35–46). Academic Press.

62. Schmidt (1995). Attention and awareness in foreign language learning. University of Hawai’i Press.

63. Schmidt (2001). Attention. In Robinson (Ed.), Cognition and second language instruction (pp. 3–32). Cambridge University Press.

64. Schoonen (2012). The validity and generalizability of writing scores. The effects of rater, task, and language. In Van Steendam, Tillema, Rijlaarsdam, & Van Den Bergh (Eds.), Measuring writing: Recent insights into theory, methodology, and practices (pp. 1–22). Brill.

65. Shang, H. F. (2022). Exploring online peer feedback and automated corrective feedback on EFL writing performance. Interactive Learning Environments, 30, 4–16. https://doi.org/10.1080/10494820.2019.1629601

66. Shintani (2016). The effects of computer-mediated synchronous and asynchronous direct corrective feedback on writing: A case study. Computer Assisted Language Learning, 29(3), 517–538.

67. Sinha, T. S., & Nassaji (2022). ESL learners’ perception and its relationship with the efficacy of written corrective feedback. International Journal of Applied Linguistics, 32, 41–56. https://doi.org/10.1111/ijal.12378

68. Sommers (2013). Responding to student writers. St. Martin’s.

69. Stevenson & Phakiti (2014). The effects of computer-generated feedback on the quality of writing. Assessing Writing, 19, 51–65. https://doi.org/10.1016/j.asw.2013.11.007

70. Strauss, A. L., & Corbin, J. M. (1998). Basics of qualitative research: Techniques and procedures for developing grounded theory (2nd ed.). Sage Publications.

71. Teng, L. S., & Zhang, L. J. (2020). Empowering learners in the second/foreign language classroom: Can self-regulated learning strategies-based writing instruction make a difference? Journal of Second Language Writing, 48, 100701. https://doi.org/10.1016/j.jslw.2019.100701

72. Thi, N. K., & Nikolov (2022). How teacher and Grammarly feedback complement one another in Myanmar EFL students’ writing. The Asia-Pacific Education Researcher, 31, 767–779. https://doi.org/10.1007/s40299-021-00625-2

73. Vasylets & Marín (2021). The effects of working memory and L2 proficiency on L2 writing. Journal of Second Language Writing, 52, 1–14. https://doi.org/10.1016/j.jslw.2020.100786

74. Vojak, Kline, Cope, McCarthey, & Kalantzis (2011). New spaces and old places: An analysis of writing assessment software. Computers and Composition, 28(2), 97–111.

75. Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Harvard University Press.

76. Wang, M. J., & Goodman (2012). Automated writing evaluation: Students’ perceptions and emotional involvement. English Teaching and Learning, 36(3), 1–37. https://doi.org/10.6330/ETL.2012.36.3.01

77. Wang, P. L. (2013). Can automated writing evaluation programs help students improve their English writing? International Journal of Applied Linguistics and English Literature, 2(1), 6–12. https://doi.org/10.7575/ijalel.v.2n.1p.6

78. Wang (2019). An empirical study on the impact of an automated writing assessment on Chinese college students’ English writing proficiency. International Journal of Language and Linguistics, 7(5), 218–237. https://doi.org/10.11648/j.ijll.20190705.16

79. Wang, Y. J., Shang, H. F., & Briody (2013). Exploring the impact of using automated writing evaluation in English as a foreign language university students’ writing. Computer Assisted Language Learning, 26(3), 234–257.

80. Warden, C. A. (2000). EFL business writing behaviors in differing feedback environments. Language Learning, 50(4), 573–616.

81. Ware (2011). Computer-generated feedback on student writing. TESOL Quarterly, 45(4), 769–774. https://doi.org/10.5054/tq.2011.272525

82. Ware (2014). Feedback for adolescent writers in the English classroom: Exploring pen-and-paper, electronic, and automated options. Writing & Pedagogy, 6(2), 223–249. https://doi.org/10.1558/wap.v6i2.223

83. Warschauer & Ware (2006). Automated writing evaluation: Defining the classroom research agenda. Language Teaching Research, 10(2), 157–180. https://doi.org/10.1191/1362168806lr190oa

84. Warschauer & Ware (2008). Learning, change, and power: Competing frames of technology and literacy. In Coiro, Knobel, Lankshear, & Leu (Eds.), Handbook of Research on New Literacies (pp. 215–240). Lawrence Erlbaum Associates.

85. Wilson, Ahrendt, Fudge, E. A., Raiche, Beard, & MacArthur (2021). Elementary teachers’ perceptions of automated feedback and automated scoring: Transforming the teaching and learning of writing using automated writing evaluation. Computers & Education, 168, 104208. https://doi.org/10.1016/j.compedu.2021.104208

86. Wilson & Czik (2016). Automated essay evaluation software in English Language Arts classrooms: Effects on teacher feedback, student motivation, and writing quality. Computers & Education, 100, 94–109. https://doi.org/10.1016/j.compedu.2016.05.004

87. Zhang (2022). Understanding AWE feedback and English writing of learners with different proficiency levels in an EFL classroom: A sociocultural perspective. The Asia-Pacific Education Researcher, 31, 357–367. https://doi.org/10.1007/s40299-021-00577-7

88. Yoon, H. J., & Polio (2017). The linguistic development of students of English as a second language in two written genres. TESOL Quarterly, 51, 275–301. https://doi.org/10.1002/tesq.296

89. Zhai (2022). Automated writing evaluation (AWE) feedback: A systematic investigation of college students’ acceptance. Computer Assisted Language Learning, 35, 2817–2842. https://doi.org/10.1080/09588221.2021.1897019

90. Zhang (2019). The strategies of college English writing. Sino-US English Teaching, 16(5), 203–208. https://doi.org/10.17265/1539-8072/2019.05.003

91. Zhang (2016). Reflections on the pedagogical imports of western practices for professionalizing ESL/EFL writing and writing-teacher education. Australian Review of Applied Linguistics, 39(3), 203–232. https://doi.org/10.1075/aral.39.3.01zha

92. Zhang, L. J., & Cheng (2021). Examining the effects of comprehensive written corrective feedback on L2 EAP students’ linguistic performance: A mixed-methods study. Journal of English for Academic Purposes, 54, 1–15.

93. Zhang (2020). Engaging with automated writing evaluation (AWE) feedback on L2 writing: Student perceptions and revisions. Assessing Writing, 43, 1–14.