Abstract
Introduction
English-language teaching (ELT) materials are instrumental in shaping the learning process and outcomes in English as a Foreign Language (EFL) contexts (Ryu & Jeon, 2020; Tomlinson & Masuhara, 2018; Vitta, 2023). While ELT textbooks serve as the primary medium for imparting language knowledge and skills to students (Hughes, 2019), reading in a foreign language can significantly affect academic performance (Roussel et al., 2017). In this context, evaluating ELT material has emerged as a central focus of textbook studies worldwide (Graves, 2019; Hoang & Crosthwaite, 2024; Tomlinson, 2012), including in China (G. Yang & Chen, 2013). Despite the burgeoning interest in textbook evaluation (H. Zhang et al., 2021), existing research predominantly relies on evaluators’ interpretations of checklists of evaluation criteria and other subjective methods, such as questionnaire surveys and interviews, and thus lacks objectivity (Cheng & Zhao, 2021; Tomlinson, 2012). Consequently, there remains a notable research gap in the systematic evaluation of the syntactic complexity and text readability of ELT materials, especially those designed for English majors at an intermediate proficiency level. Text readability and syntactic complexity serve as objective measures of text difficulty, providing scientific evidence for textbook compilation and selection.
Analyzing the text difficulty of textbooks—defined as the degree of accessibility of the text to readers—allows educators to match instructional materials with students’ current competence levels, ensuring that texts are neither too easy nor excessively challenging (Amendum et al., 2018). This alignment supports incremental language learning and helps prevent learner frustration and disengagement (Vega et al., 2013; Y. H. Yang et al., 2021). Regarding the evaluation of ELT materials, text difficulty is typically assessed at various levels, including lexical, syntactic, and textual aspects (Chen, 2016; Lei & Shi, 2023). Among these indicators, syntactic complexity and text readability are two key factors. Syntactic complexity, a critical determinant of text difficulty, plays a significant role in the development and selection of appropriate textbooks. Specifically, text difficulty, as measured by the syntactic complexity of textbooks, is widely considered to increase linearly with the progression of learners’ linguistic competence (Chen, 2016). This premise is grounded in Krashen’s (1985) input hypothesis, which posits that input slightly exceeding the learner’s current linguistic level constitutes the ideal input for promoting second language development and acquisition. Existing research has extensively examined the text complexity of different genres, such as academic papers and learner texts (Gedik & Kolsal, 2022; Wu et al., 2020), which may exhibit various syntactic complexity patterns (Hwang et al., 2020). In addition, the factors influencing syntactic complexity have been a significant focus of scholarly inquiry (Bulté & Housen, 2012). However, there is currently a paucity of empirical studies evaluating ELT materials that include various types of text. In particular, the syntactic complexity of ELT textbooks designed for English majors with intermediate language proficiency has not been adequately explored.
Text readability has long been on the research agenda of textbook evaluation studies (Hakim et al., 2021). Existing research has explored the pedagogical functions, measurements, and factors influencing text readability. Text readability is widely recognized as an essential factor contributing to learners’ academic performance (Peng, 2015). Textbooks with low readability may trigger learners’ negative emotions, which impede their internalization of input (Krashen, 1985). Analyzing the readability of textbooks offers a valuable reference for textbook compilation and curriculum design (Hakim et al., 2021; J. Hu et al., 2021; Zamanian & Heydari, 2012). In addition, scholarship in this regard has mostly centered on the measurement of text readability (Sato et al., 2008; Yeung et al., 2018; Zamanian & Heydari, 2012) and the factors that influence it (Eslami, 2014). However, there is a lack of focus on the nuanced and scientific measurement of text readability (Plakans & Bilkis, 2016), particularly in textbooks for university learners. In addition, syntactic complexity and text readability are inextricably connected (Crossley, 2024). Despite this connection, the relationship between syntactic complexity and text readability remains unclear. Understanding this interplay is crucial because it enables educators to assess the appropriateness of educational materials for learners. This comprehension helps strike a balance between linguistic richness and accessibility in intensive reading textbooks, ensuring that the materials are challenging yet understandable, thereby fostering more effective and engaging learning experiences for students.
Given the paramount importance and the gaps mentioned above, the present study proposes to explore the syntactic complexity, text readability, and their relationship in a series of
Literature Review
Studies of Material Evaluation
Material evaluation has attracted significant attention in recent decades (Hanifa, 2018; Mukundan & Nimehchisalem, 2012; M. Yang & Shi, 2020). Both theoretical and methodological efforts have been made to explore material evaluation (Sheldon, 1988; Tomlinson, 2012).
The following theoretical orientations have emerged in textbook evaluation studies: classroom-based evaluation (e.g., Ellis, 1997), language-focused evaluation (e.g., Crandall & Basturkmen, 2004), culture-emphasized evaluation (e.g., M. Yang & Shi, 2020), and sociolinguistics-oriented evaluation (e.g., Atar & Erdem, 2020). Material evaluation can be predictive, informing the development of new textbooks, or retrospective, assessing existing textbooks (Ellis, 1997). These theoretical approaches echo what is known as critical reflection in material evaluation studies (M. Yang & Wang, 2024). Critical reflection studies focus on reviewing previous research on textbook evaluation, designing textbook evaluation criteria, and developing instruments to evaluate textbooks (W. Hu, 2024; Tomlinson, 2020, 2022). This endeavor equips us with profound theoretical insights into material evaluation, but it seems to lack empirical support (H. Zhang et al., 2021).
Previous empirical studies on textbook evaluation can be analyzed from both macro and micro perspectives (M. Yang & Wang, 2024). Macro-empirical studies on textbook evaluation have explored the holistic design of textbooks, particularly in terms of text selection and agreement with the curriculum, but have failed to consider contextual divergences (Gholami et al., 2017; W. Zhang, 2014). Micro-empirical studies on textbook evaluation have examined linguistic content, involving lexical, grammatical, listening, and speaking elements (Chen, 2016; Hoang & Crosthwaite, 2024), as well as non-linguistic content, including culture, learner autonomy, ideology, and values. Existing micro-empirical research on textbook evaluation mainly takes the following two paths: The first path compares textbook corpora for EFL speakers with those designed for native speakers, aiming to reveal how these EFL textbooks diverge from authentic communication situations (Miller, 2011; Molavi et al., 2014). The second path explores the appropriateness of EFL textbooks, focusing on the coverage and frequency of words and phrases as stipulated in the curriculum (e.g., Liu & Zhang, 2015).
Early research has enriched our understanding of the vocabulary and phrases in these textbooks and provided evidence for their justified application. However, previous studies on material evaluation exhibit two limitations that must be addressed. First, these studies mainly involved holistic and qualitative evaluations (W. Zhang, 2014). They relied on subjective means, such as questionnaires and interviews, to evaluate textbooks and thus lacked objectivity (Cheng & Zhao, 2021). Text readability and syntactic complexity can offer objective means of measuring text difficulty, thus providing evidence for the compilation and selection of textbooks. Second, limited attention has been given to the syntactic complexity and text readability of textbooks. Textbooks serve as a major source of learner input; therefore, examining their syntactic complexity and text readability is essential for optimizing the language learning process (Abdollahi-Guilani, 2022; W. Hu, 2024).
Studies of Syntactic Complexity in Textbooks
Syntactic complexity refers to the diversity and complexity of syntactic structures in language production; that is, the degree of syntactic complexity and variation (Ortega, 2003). As a crucial indicator of text difficulty (Huang & Zheng, 2022), syntactic complexity plays a significant role in evaluating the appropriateness of textbooks, which serve as the primary source of input in second language (L2) development. While numerous components of textbooks have been explored, relatively little attention has been paid to the quality of textual input per se, particularly syntactic complexity. It is emphasized that text difficulty should align with readers’ competence by providing input that slightly exceeds their current level, a concept known as the “i + 1” principle (Krashen, 1985). Notably, syntactically complex texts pose considerable challenges to learners’ comprehension (Frantz et al., 2015). In addition, selecting and adapting textbooks with appropriate levels of text difficulty, as measured by syntactic complexity, is essential for achieving ideal learning outcomes (Spencer & Wagner, 2017). Because syntactic complexity is a significant component in evaluating the comprehensibility of textbooks, and syntactic modification is among the most frequently adopted methods for adapting teaching materials (Berendes et al., 2018), it is of great significance to investigate syntactic complexity in textbooks. Without the meticulous and methodical calibration of textbook difficulty across readers’ progressively increasing levels, the desired goal of competence development can hardly be attained (J. Song & Kim, 2021).
Motivated by these claims, extensive research has evaluated syntactic complexity using various quantitative indices to increase the reliability and validity of evaluation methods. Currently, two types of indicators are commonly used to measure syntactic complexity: large-grained and fine-grained indices. The former measure overall sentential or clausal complexity (e.g., sentence length) but fail to disclose the granularity of a specific language structure. Although there is general consensus regarding the positive relationship between measures such as mean length of T-unit (MLTU) and L2 development (Norris & Ortega, 2009), the interpretation of MLTU remains unclear, as various linguistic structures (e.g., phrasal dependents) can trigger an increase in the unit’s length. Therefore, more fine-grained indices have been proposed owing to developments in corpus linguistics. For instance, Biber et al. (1999) examined the clausal and phrasal features of academic writing using the Biber Tagger. Other computational tools have also been adopted, such as T.E.R.A (Solnyshkina et al., 2017), Coh-Metrix (Ryu & Jeon, 2020), and the L2SCA (Y. Li et al., 2022). Among these tools, the L2SCA is widely accepted for its robustness and operationalisability (Y. Li et al., 2022; Wu et al., 2020). Therefore, the present study utilizes the L2SCA to explore the nuanced syntactic complexity of a series of
Previous research has predominantly focused on isolated genres or specific types of text, creating a gap in understanding how syntactic complexity manifests in educational materials designed for language learners. Extensive research has been conducted on the syntactic complexity of various genres. For instance, researchers have explored syntactic complexity from perspectives such as textual genres and registers, including argumentative (Y. Li et al., 2022), expository, and narrative essays, academic genres (Hwang et al., 2020; Larsson & Kaatari, 2020; Verdiansyah et al., 2020; X. Zhang & Li, 2022), and translated texts (Lin et al., 2023). Previous research offers a holistic overview of the factors that influence syntactic complexity in specific genres. However, empirical studies on textbook materials that include various types of text remain scarce. To the best of our knowledge, it remains unclear precisely how syntactic complexity manifests in English language textbooks, which comprise different types of text, such as narration, argumentation, and exposition, and thus pose considerable challenges to learners.
Among the limited research on the syntactic complexity of textbooks, existing studies mainly focus on primary, middle, and high school textbooks, with little attention given to college-level textbooks. For instance, Arai et al. (2017) analyzed the syntactic complexity of primary school textbooks. Solnyshkina et al. (2017) explored the complexity of eight Russian English textbooks. Similarly, Ryu and Jeon (2020) analyzed text difficulty across grades in Korean middle school English textbooks using Coh-Metrix. Gedik and Kolsal (2022) explored syntactic complexity deficiencies in textbooks used to prepare for high school and college entrance examinations. These studies suggest that the syntactic complexity of textbooks is related to learners’ education levels. Syntactic complexity is calibrated to address individual differences among learners at various stages of their education, including learner variables such as language competence, memory capacity, and motivation. For instance, textbooks for advanced learners (Zheng, 2018) and preliminary-level learners (Hwang et al., 2020) displayed diverse features of syntactic complexity. However, a review of the relevant literature indicates that research has primarily focused on the syntactic complexity features of textbooks intended for primary and middle school learners. There has been a significant lack of research exploring the features of syntactic complexity in textbooks designed for learners in tertiary education. To address this gap, the current research intends to explore the syntactic complexity patterns of textbooks for English majors, who are typically at an intermediate level of English language proficiency.
Studies of Text Readability in Textbooks
Text readability refers to the level of ease with which a text can be read and understood (Goodman & Flurkey, 2019). The readability of textbooks can affect students’ academic performance. It has been demonstrated that less readable textbooks tend to result in lower average grades for the associated courses (Peng, 2015). Moreover, challenging textbooks might cause students to fail to comprehend the material, leading to frustration, and could also place a burden on instructors, who must ensure that the materials are understood when explaining the content to learners (Peng, 2015). Learners in such situations, filled with anxiety and uncertainty and feeling threatened, may become demotivated when receiving and processing input (Krashen, 1985; C. Li et al., 2024). Therefore, it is incumbent upon instructors to carefully select appropriate textbooks that are readable for learners.
Given the importance of textbook readability, researchers have primarily examined its measurement (Yeung et al., 2018) and the factors influencing text readability (Eslami, 2014). The most common methods used to measure readability are formulas. Nevertheless, readability assessment is often criticized for a lack of reliability when its results are interpreted through instructors’ intuition (Plakans & Bilkis, 2016). In contrast, examining the readability of textbooks through data analytics could improve the quantification of textbook evaluation and promote the reading development of novice English as a second language learners (Kasule, 2011). The most commonly used formulas currently include the Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKG), Automated Readability Index (ARI), Coleman-Liau Index (CLI), Gunning Fog Index (GFI), and Simple Measure of Gobbledygook (SMOG) (Yeung et al., 2018). These six formulas were used in this study. Existing research on the readability of textbooks mainly focuses on the primary and middle school levels in EFL contexts, such as Indonesia (Hakim et al., 2021) and Hong Kong and mainland China (J. Hu et al., 2021). However, the readability of textbooks for university students, particularly tertiary English majors, remains poorly understood. Since analyzing the readability of textbooks can provide scientific evidence to inform textbook compilation and accommodate learners at different levels of education (Hakim et al., 2021; J. Hu et al., 2021), it is necessary to further investigate the readability features of textbooks for tertiary English majors.
Despite the extensive research on readability, most studies have focused primarily on the lexical features of textbooks, leaving a significant gap in our understanding of the impact of syntactic complexity on readability. Researchers have explored the various factors influencing text readability, revealing that it is shaped by both linguistic and non-linguistic factors (Bailin & Grafstein, 2016). Reading motivation (Goodman & Flurkey, 2019), cultural background (Bailin & Grafstein, 2016) and reading environment have been identified as important variables in determining readability. In addition, readability is influenced by linguistic factors such as word length, the proportion of different word classes, sentence length (Bailin & Grafstein, 2016), pronouns, the number of syllables (Sung et al., 2015), and the numbers of affixes, prepositional phrases, and others (Bailin & Grafstein, 2016). In particular, syntactic-related linguistic features exert a significant influence on readability (Eslami, 2014). These features include the complexity of sentence structures, the use of dependent clauses, and the overall syntactic arrangement of a text, all of which can significantly affect how easily a reader comprehends and processes the material (Eslami, 2014). However, the predominant focus of existing research has been on the lexical features of textbooks. For instance, Y. Wang (2021) explored the relationship between lexical complexity, measured by the diversity and sophistication of vocabulary, and the readability of textbooks for English majors. J. Hu et al. (2021) investigated readability in terms of lexical coverage, or the extent to which words in a text are known by the target audience in science textbooks.
This lexical focus has provided valuable insights but also highlights a critical gap in the literature regarding syntactic complexity. Although lexical features are undoubtedly important, they do not capture the full scope of what makes text readable. Syntactic complexity, which encompasses elements such as sentence length, the ratio of dependent to independent clauses, and the use of varied syntactic structures, plays a crucial role in readability. Texts with complex syntactic structures may be challenging for readers, even if the vocabulary is relatively simple. Therefore, a comprehensive understanding of readability must include analyses of both syntactic complexity and lexical features. This gap in current research emphasizes the need for further exploration of how syntactic complexity impacts textbook readability, with the aim of providing a more holistic and nuanced understanding that can inform the development of educational materials. Furthermore, considerable uncertainty remains regarding the relationship between syntactic complexity and textbook readability, which warrants further exploration.
Syntactic Complexity and Text Readability in Textbooks
The relationship between syntactic complexity and text readability has been extensively explored, as it is considered a crucial factor in understanding and interpreting a text. Syntactic complexity has been identified as an important aspect of text readability because it can affect the ease with which readers comprehend a text (Khademizadeh & Vaezi, 2020).
Early studies have focused on sentential features when calculating readability scores (Frantz et al., 2015; Wu, 2017). Readability is closely associated with syntactic complexity (Frantz et al., 2015). It is important to note that the accuracy of measuring readability is undermined if syntactic complexity is not considered (Xing & Cheng, 2010). Furthermore, research has shown that syntactic complexity has a critical influence on the readability of academic papers written by Chinese scholars (Wu, 2017). This influence is also evident in the development of reading materials and textbooks (Khademizadeh & Vaezi, 2020). Therefore, it is essential to examine the relationship between syntactic complexity and readability in textbooks, as textbooks are considered the primary source of input in second language learning and play a vital role in the language development of students (Peng, 2015). However, only a few studies have explored the relationship between syntactic complexity and text readability, particularly in textbooks (Wu, 2017). Thus, exploring the relationship between syntactic complexity and readability in textbooks is crucial as it could provide valuable insights into the development of effective teaching materials and curriculum design for students at different proficiency levels (Crossley et al., 2008; Hakim et al., 2021; J. Hu et al., 2021). In line with this rationale, this study aims to investigate the relationship between syntactic complexity and readability in textbooks.
In summary, the literature review above reveals three gaps. First, early studies mainly examined the syntactic complexity and readability of written texts, such as learner texts and published academic papers, but paid inadequate attention to these two aspects in textbooks. Second, previous studies primarily focused on the lexical level, giving little attention to the syntactic complexity and readability of textbooks (Tang & Liang, 2021). Third, past research has mostly focused on textbooks for primary and middle school students, and little is known about textbooks for tertiary English majors. It has been reported that the difficulty of English textbooks for Chinese universities does not align with their content volume and lacks generic diversity (H. Zhang et al., 2021). To address these gaps, the present study aims to examine syntactic complexity, text readability, and their relationship in a series of
Research Design
Research Questions
To characterize the syntactic complexity and text readability of a series of
Research Question 1: What are the features of syntactic complexity in
Research Question 2: What are the features of text readability in
Research Question 3: How does syntactic complexity predict text readability in
Source of the Corpus
The corpus for this study was based on a series of
To ensure that the corpus accurately reflected the syntactic characteristics of the complete texts, each text was sampled in its entirety. This approach was adopted to account for different stylistic features that may be present at the beginning, throughout the narrative, and at the end of an article. The main text in each unit was converted into .txt format, and irrelevant words in pictures and headings were excluded. To ensure the reliability of the text conversion during corpus design, two professors, who had been using the selected textbooks for the past decade, were consulted. Four teaching assistants with master’s degrees in applied linguistics were recruited to review and cross-check the converted texts. The resulting dataset had a size of 1.56 MB and contained 111,723 words for further data analysis. Specifically, Book 1 of the selected series of Intensive Reading textbooks contains 20,682 words, Book 2 contains 26,450 words, Book 3 contains 31,618 words, and Book 4 contains 32,973 words.
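As a minimal sketch of this preprocessing step, the per-book token counts could be reproduced with a short script. The folder layout and file names below are hypothetical, and the token definition is a simple approximation rather than the exact counting rule used by the tools in this study.

```python
import re
from pathlib import Path

def corpus_word_counts(folder):
    """Count word tokens in each converted .txt file in a folder.

    Assumes one plain-text file per book (e.g., book1.txt ... book4.txt);
    tokens are approximated as runs of letters and apostrophes.
    """
    counts = {}
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        counts[path.stem] = len(re.findall(r"[A-Za-z']+", text))
    return counts
```

Run over the four converted books, such a script should yield totals close to the reported 20,682; 26,450; 31,618; and 32,973 words, with small deviations depending on the tokenization rule.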
Measurement Indicators and Instrument
Data mining is an interdisciplinary practice that refers to the process of unveiling hidden patterns of behaviors from large datasets (CheshmehSohrabi & Mashhadi, 2023). This involves a hybrid application of machine learning, pattern recognition, and statistics. Although data mining techniques have been widely applied across various disciplines (Cope & Kalantzis, 2016), their application in the fields of education, applied linguistics, and language learning has only recently emerged (Warschauer et al., 2019). For instance, data mining techniques have been employed to understand learner behavior by analyzing syntactic complexity (Lu, 2010; X. Zhang & Lu, 2022) and text readability (Crossley et al., 2008) in L2 writing research. Previous research has demonstrated the potential of applying data mining techniques to explore learner behavior in the fields of language education and applied linguistics. Therefore, this study proposes to employ data mining techniques, such as L2SCA (Lu, 2010) and FRE, FKG, ARI, CLI, GFI, and SMOG (Yeung et al., 2018) to explore syntactic complexity and text readability of a series of
Measures and Instrument for Syntactic Complexity
Compared with large-grained measures of the overall complexity of syntactic structures, fine-grained indicators accurately assess specific language structures, such as the length of complex nominal phrases (Kyle & Crossley, 2018). Of these nuanced measurements, the L2SCA, developed using Python, has been widely accepted due to its robustness and operationalisability (Y. Li et al., 2022). This has been empirically validated through subsequent research on L2 writing, second language acquisition, and textbook studies (Lu, 2010; X. Zhang & Lu, 2022). Therefore, the present study adopts the L2SCA to explore the syntactic complexity of a series of
The L2SCA (accessible at http://www.personal.psu.edu/xxl13/downloads/l2sca.html), as an automatic tool for measuring L2 syntactic complexity, encompasses the following 14 indicators: mean length of sentence (MLS), mean length of T-unit (MLT), mean clause length (MLC), clauses per sentence (C/S), verb phrases per T-unit (VP/T), clauses per T-unit (C/T), dependent clauses per clause (DC/C), dependent clauses per T-unit (DC/T), T-units per sentence (T/S), complex T-units per T-unit (CT/T), coordinate phrases per T-unit (CP/T), coordinate phrases per clause (CP/C), complex nominals per T-unit (CN/T), and complex nominals per clause (CN/C). These indicators are utilized in the present study to measure the syntactic complexity of the texts included in the selected textbooks.
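To illustrate what a large-grained index captures, the sketch below computes MLS (words per sentence) with naive regex-based segmentation. This is only a toy approximation: the actual L2SCA derives T-units, clauses, and phrases from a full syntactic parse, which no simple pattern matching can replicate.

```python
import re

def mean_length_of_sentence(text):
    """Approximate MLS: word tokens divided by sentence count.

    Sentences are split naively on terminal punctuation; words are runs
    of letters and apostrophes. L2SCA itself uses a syntactic parser.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return len(words) / len(sentences)
```

For ratio indices such as C/S or DC/C, the numerator and denominator counts would come from the parser's clause and T-unit identification rather than from surface patterns like these.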
Measures and Instrument for Text Readability
The instruments applied for the present study to quantify text readability are several software tools commonly used in this field (Yeung et al., 2018). These instruments are available at https://we.sflep.com/research/ReadingEase.aspx, sponsored by
The measures employed in the present study to assess text readability include the following six indicators: FRE (Cantos Gómez & Sánchez Lafuente, 2019), FKG (J. Hu et al., 2021), ARI (De Oliveira et al., 2015), CLI (Cantos Gómez & Sánchez Lafuente, 2019), GFI (De Oliveira et al., 2015), and SMOG (Cantos Gómez & Sánchez Lafuente, 2019).
These measures have been applied as essential indicators in studies of text readability across various disciplines, such as health care, information science (Lei & Yan, 2016), education (Cantos Gómez & Sánchez Lafuente, 2019), psychology (Amendum et al., 2018), and tourism (Dolnicar & Chapple, 2015). Recently, these indicators have been widely examined in academic contexts, languages, and linguistic studies (Y. Wang, 2021; Yeung et al., 2018). Research on text readability from the perspective of these indicators in academic writing (S. Wang et al., 2022) and textbooks (Y. Wang, 2021) has validated the reliability of these common formulas for calculating text readability. Therefore, the present study proposes the following measures to explore the text readability of a series of
Taking the CLI as an example, the index is computed as CLI = 0.0588 × L − 0.296 × S − 15.8, where L represents the average number of letters per 100 words and S represents the average number of sentences per 100 words. The resulting value approximately corresponds to the grade levels of American primary and middle schools.
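All six indicators are simple functions of surface counts (words, sentences, characters, syllables). The sketch below implements their commonly published forms; the website used in this study may apply slightly different variants or count syllables differently, so its results can deviate from those produced here.

```python
import math

def fre(words, sentences, syllables):
    """Flesch Reading Ease: higher values indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def fkg(words, sentences, syllables):
    """Flesch-Kincaid Grade Level (approximate U.S. grade)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def ari(characters, words, sentences):
    """Automated Readability Index."""
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43

def cli(letters, words, sentences):
    """Coleman-Liau Index; L and S are averages per 100 words."""
    L = letters / words * 100
    S = sentences / words * 100
    return 0.0588 * L - 0.296 * S - 15.8

def gfi(words, sentences, complex_words):
    """Gunning Fog Index; complex words have three or more syllables."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))

def smog(polysyllables, sentences):
    """SMOG grade, intended for samples of 30 or more sentences."""
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291
```

Note that FRE decreases as text becomes harder, whereas the other five indices rise with difficulty; this is why a falling FRE and rising grade-level indices describe the same trend.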
Data Analysis
To answer Research Question 1, which explored the syntactic complexity characteristics of the selected series of
Similarly, to address Research Question 2, which enquired about the readability features of the selected series of textbooks, six indicators of text readability were calculated, as introduced in Section 3.3.2. The readability features of the selected series of textbooks were then quantified to examine the patterns of text readability in Books 1 through 4. To this end, the converted texts were entered into a website (https://we.sflep.com/research/ReadingEase.aspx). To gain access to this website, we contacted the sponsor of the China Foreign Language Teaching Network to obtain permission. Upon obtaining consent, we registered on the website and input the texts to calculate the readability of each book in terms of six indicators: FRE, FKG, ARI, CLI, GFI, and SMOG. To ensure the reliability of our data analysis, we consulted a PhD candidate in computing science on how to operate the software and interpret the results of the six formulas. Subsequently, we calculated the text readability of each of the four books through the website and recalculated it after one month to check the accuracy of the analysis results. The results were processed in an Excel file for subsequent analyses.
The data in Excel format derived from the analyses for the first two research questions were then imported into SPSS 27.0 to address Research Question 3. This question investigates how syntax influences the text readability of the selected series of
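The correlation analysis itself was run in SPSS 27.0. Purely as an illustration of the underlying computation, the Pearson coefficient for one syntactic index against one readability index can be reproduced in plain Python from the per-book values reported in the Results section (MLS and FKG for Books 1 through 4):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Per-book values reported in the Results section:
mls = [11.2341, 11.0762, 13.9103, 16.7376]  # mean length of sentence
fkg = [3.91, 5.55, 7.00, 7.20]              # Flesch-Kincaid Grade

r = pearson_r(mls, fkg)  # positive: longer sentences, higher grade level
```

A full replication of Research Question 3 would compute this for all 14 syntactic indices against all six readability indices and then fit regression models, as was done in SPSS.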
Results
Syntactic Complexity of the Selected Intensive Reading Textbooks for English Majors
Table 1 reports the distribution of syntactic complexity in the selected series of
Syntactic Complexity of the Selected
It is noteworthy that the mean length of sentence (MLS) decreased from 11.2341 in Book 1 to 11.0762 in Book 2, but increased to 13.9103 in Book 3 and 16.7376 in Book 4. A slight decrease was also noticed in clauses per sentence (C/S) in Book 2 (1.4125), which is lower than the 1.554 in Book 1; the value then increased to 1.7215 in Book 3 and 1.8898 in Book 4. Similarly, T-units per sentence (T/S) decreased from 1.0788 in Book 1 to 0.9062 in Book 2, but increased to 1.0189 in Book 3 and 1.1096 in Book 4.
In contrast to the three indicators mentioned above, the remaining 11 showed a steady increase from Books 1 to 4. Specifically, the mean length of T-unit (MLT) increased from 10.4139 in Book 1 to 12.2227 in Book 2, 13.6520 in Book 3, and 15.0837 in Book 4. The mean clause lengths (MLC) from Books 1 to 4 were 7.2289, 7.8147, 8.0802, and 8.8566, respectively. In addition, verb phrases per T-unit (VP/T) increased from 1.6687 in Book 1 to 2.1404 in Book 4, and clauses per T-unit (C/T) increased from 1.4406 in Book 1 to 1.7031 in Book 4. This steady growth was also observed in dependent clauses per clause (DC/C), which increased from 0.2702 in Book 1 to 0.3763 in Book 4. Dependent clauses per T-unit (DC/T) increased from 0.3892 in Book 1 to 0.6409 in Book 4, and complex T-units per T-unit (CT/T) increased from 0.2915 in Book 1 to 0.4286 in Book 4. Similarly, coordinate phrases per T-unit (CP/T) increased from 0.1964 in Book 1 to 0.3632 in Book 4; coordinate phrases per clause (CP/C) moved up from 0.1363 in Book 1 to 0.2133 in Book 4; and complex nominals per T-unit (CN/T) rose from 0.8882 in Book 1 to 1.6121 in Book 4. The number of complex nominals per clause (CN/C) increased from 0.6166 in Book 1 to 0.9465 in Book 4.
Text Readability of the Selected Intensive Reading Textbooks for English Majors
Table 2 presents the results for the six indicators of text readability for the selected series of intensive reading textbooks.
Text Readability of the Selected Intensive Reading Textbooks for English Majors.
Specifically, the FRE values for the four books are 89.32 (Book 1), 80.47 (Book 2), 70.66 (Book 3), and 73.77 (Book 4). It can be observed that Books 1 and 2 fall into the "easy" band (80–90) of the Flesch scale, whereas Books 3 and 4 fall into the "fairly easy" band (70–80), indicating an overall decrease in readability across the series.
A similar pattern of decreasing readability was observed for the FKG and ARI indicators. The FKG indices for Books 1 to 4 are 3.91, 5.55, 7.00, and 7.20, respectively, suggesting that the readability of the four books broadly corresponds to Grades 4, 6, 7, and 7 for American students. The average FKG index across the four books was 5.92, suggesting that the selected series is roughly equal to the Grade 6 level of American students. Similarly, Table 2 shows that the ARI indices for the four books are 4.13 (Book 1), 6.06 (Book 2), 7.77 (Book 3), and 8.17 (Book 4), respectively, corresponding to Grades 4, 6, 8, and 8 of American learners. The average ARI across the four books was 6.53, which is roughly equivalent to Grade 7 for American learners.
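The FRE, FKG, and ARI values reported above are produced by the standard published formulas (Flesch, 1948; Kincaid et al., 1975; ARI), which depend only on sentence, word, syllable, and character counts. As a hedged sketch with hypothetical counts:

```python
# Standard formulas for three of the readability indices discussed
# above. The counts passed in are hypothetical illustration values,
# not the study's data.

def flesch_reading_ease(words, sentences, syllables):
    # Higher score = easier text (90-100 very easy, 80-90 easy, ...)
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    # Result approximates a U.S. school grade level
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def automated_readability_index(characters, words, sentences):
    # Character-based grade-level estimate
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43

# Hypothetical counts for a short passage
fre = flesch_reading_ease(words=1100, sentences=100, syllables=1450)
fkg = flesch_kincaid_grade(words=1100, sentences=100, syllables=1450)
ari = automated_readability_index(characters=5200, words=1100, sentences=100)
print(round(fre, 2), round(fkg, 2), round(ari, 2))
```

Note that FRE runs in the opposite direction from the grade-level indices: a lower FRE means lower readability, whereas a higher FKG or ARI means lower readability.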
Correlation Coefficients (r) Between Syntactic Complexity and Text Readability.
The fourth indicator (CLI) also shows a decreasing trend in readability from Books 1 to 4. The CLI values for the four books are 5.80 (Book 1), 7.21 (Book 2), 8.49 (Book 3), and 8.61 (Book 4), and the average index across the four books was 7.53. These values indicate that the readability of the four books roughly corresponds to Grades 6, 7, 8, and 9 for American learners, respectively, with the series as a whole corresponding to Grade 8.
The fifth and sixth indicators, the GFI and SMOG, also display a decreasing pattern of readability. The GFI values for the four books are 6.73 (Book 1), 8.49 (Book 2), 10.14 (Book 3), and 10.22 (Book 4), respectively. With regard to SMOG, the respective values for the four books are 6.95 (Book 1), 8.44 (Book 2), 9.51 (Book 3), and 9.49 (Book 4). The average indices of these two indicators across the entire series are 8.90 and 8.60, respectively. In terms of GFI, these values indicate that the readability of the four books corresponds to Grades 7, 8, 10, and 10 for American learners.
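Like the indices above, GFI, SMOG, and CLI follow standard published formulas (Gunning, 1952; McLaughlin, 1969; Coleman & Liau, 1975). A minimal sketch with hypothetical counts:

```python
import math

# Standard formulas for the remaining three grade-level indices.
# All counts below are hypothetical illustration values.

def gunning_fog(words, sentences, complex_words):
    # complex_words: words of three or more syllables
    return 0.4 * (words / sentences + 100 * complex_words / words)

def smog(sentences, polysyllables):
    # polysyllables: words of three or more syllables, normalized to
    # a 30-sentence sample
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291

def coleman_liau(letters, words, sentences):
    L = letters / words * 100    # letters per 100 words
    S = sentences / words * 100  # sentences per 100 words
    return 0.0588 * L - 0.296 * S - 15.8

gfi = gunning_fog(words=1100, sentences=100, complex_words=80)
smog_idx = smog(sentences=100, polysyllables=80)
cli = coleman_liau(letters=5200, words=1100, sentences=100)
print(round(gfi, 2), round(smog_idx, 2), round(cli, 2))
```

Because GFI and SMOG both weight polysyllabic words heavily, they tend to track each other, which is consistent with the parallel patterns reported above.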
Influence of Syntactic Complexity as a Predictor Variable on the Text Readability of the Selected Textbooks
To explore the relationship between syntactic complexity (SC) and text readability of the selected textbooks, Pearson’s correlation was first conducted using SPSS 27.0. Table 3 presents the correlation coefficients between syntactic complexity and text readability.
As shown in Table 3, all 14 indicators of syntactic complexity are negatively correlated with FRE and positively correlated with the other five readability indicators (FKG, ARI, CLI, GFI, and SMOG). Among these correlations, those between CN/T and the six readability indicators were the highest: −.814 with FRE, .928 with FKG, .923 with ARI, .829 with CLI, .926 with GFI, and .911 with SMOG. All of these coefficients were higher than those between the remaining 13 indicators of syntactic complexity and the six indicators of readability. These results indicate that, among the 14 indicators of syntactic complexity, CN/T had the strongest significant correlation with the six indicators of readability.
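The statistic behind Table 3 is Pearson's product-moment correlation. As a minimal sketch, computed here in plain Python rather than SPSS, with hypothetical per-book CN/T values paired with the FKG indices reported above:

```python
import math

# Pearson's r from first principles. The CN/T values are hypothetical
# illustration values; the FKG values are the indices reported above.

def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

cn_t = [0.89, 1.10, 1.35, 1.61]   # hypothetical per-book CN/T values
fkg = [3.91, 5.55, 7.00, 7.20]    # FKG indices reported above
r = pearson_r(cn_t, fkg)
print(round(r, 3))                # strong positive correlation
```

In practice `scipy.stats.pearsonr` would also return a p-value for the significance tests reported in Table 3; the sketch above shows only the coefficient itself.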
The significant correlations between the 14 indicators of syntactic complexity and the six indicators of readability highlight the feasibility of conducting multiple regression analysis to examine how syntactic complexity influences readability. Therefore, using the 14 indicators of syntactic complexity as independent variables and the six indicators of text readability as dependent variables, a multiple regression analysis was performed to explore the extent to which syntactic complexity accounts for readability, addressing the third research question.
Table 4 reports the model summaries from the multiple stepwise regression analysis. It shows that all 14 indicators of syntactic complexity displayed a predictive effect on the text readability of the selected series of intensive reading textbooks.
Model Summary of the Multiple Regression Analysis.
Predictors (Constant): CN/C, T/S, C/T, CP/C, VP/T, CT/T, MLC, DC/C, CN/T, DC/T, CP/T, MLS, C/S, MLT.
Fourteen syntactic complexity indicators were included as independent variables in the regression model. The multiple regression analysis examining the predictive effect of these 14 indicators on the six indicators of text readability yielded six prediction model expressions, as shown in Table 5. Among the 14 indicators, DC/C is the strongest predictor of FRE; C/S is the strongest predictor of both FKG and CLI; and C/T has the strongest predictive effect on ARI, GFI, and SMOG. In summary, these six regression formulas show that C/T (clauses per T-unit), C/S (clauses per sentence), and DC/C (dependent clauses per clause) are the three strongest indicators for predicting the text readability of the selected series of intensive reading textbooks.
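The B coefficients in Table 5 come from least-squares estimation. As a hedged, single-predictor sketch of the idea (the study's actual models were stepwise and multivariate, fitted in SPSS), using hypothetical per-text C/T observations against the SMOG indices reported above:

```python
# Ordinary least squares with one predictor, from first principles.
# The C/T values are hypothetical illustration values; the SMOG values
# are the indices reported above. This is not a replication of the
# study's stepwise models.

def ols(x, y):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

c_t = [1.44, 1.41, 1.58, 1.70]    # hypothetical C/T observations
smog = [6.95, 8.44, 9.51, 9.49]   # SMOG indices reported above
intercept, slope = ols(c_t, smog)
predicted = intercept + slope * 1.60  # predicted SMOG at C/T = 1.60
print(round(slope, 2), round(predicted, 2))
```

A positive slope here mirrors the reported direction of effect: more clauses per T-unit predicts a higher SMOG grade level, i.e., lower readability.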
B Coefficients of the Multiple Regression Analysis.
Discussion
This study explored the features of syntactic complexity, readability, and the predictive relationship between them in a selected series of intensive reading textbooks for English majors.
Specifically, this study explored the features of syntactic complexity in a selected series of intensive reading textbooks for English majors.
Additionally, the results for the first research question indicated certain problems with the selected series of textbooks. While 11 of the 14 indicators of syntactic complexity consistently revealed a systematic progression in difficulty from Books 1 to 4, three indicators, namely MLS, C/S, and T/S, demonstrated a certain divergence. That is, these three indicators experienced a slight decrease in Book 2 compared with Book 1, followed by a continuous increase from Book 2 through Books 3 and 4. These results indicate a violation of the principle of progressive difficulty in textbook compilation for the three indicators, as previously identified in EFL textbooks (Jin et al., 2020; Ryu & Jeon, 2020) and in studies on Chinese as a Second Language textbooks (Cao et al., 2022). This result partially corroborates previous research showing that ELT textbooks for Chinese English majors lack a reasonable distribution of text difficulty (H. Zhang et al., 2021), and further indicates that the selected series was problematic in its compilation with respect to these three indicators. Therefore, it is imperative for developers to consider these three syntactic complexity factors. This need arises because MLS, C/S, and T/S are effective indicators of text difficulty (Graesser et al., 2007; Lu, 2010), cognitive load, competence in mental processing (Bonzo, 2008), and strategies of textbook development (Lei & Shi, 2023).
While the first research question revealed an increasing trend in syntactic complexity, the second research question of the present study suggested a decreasing trend in text readability from Books 1 to 4 in the selected series of textbooks, indicating an increasing progression of difficulty (J. Song & Kim, 2021). This result aligns with the syntactic complexity reported earlier in this study, highlighting that syntactic features such as the number and length of sentences, complexity of vocabulary, and number of syllables are closely related to readability (Bailin & Grafstein, 2016; Sung et al., 2015). It also reflects the logical progression of difficulty in textbook development and embodies the developmental process of foreign language acquisition (Chen, 2016).
Another finding associated with Research Question 2 is that the readabilities of Books 1 and 2 correspond to those of seniors in American primary schools and juniors in American middle schools, while the readabilities of Books 3 and 4 correspond to the levels of seniors in American middle schools and juniors in American high schools. Readability is closely related to factors such as motivation (Goodman & Flurkey, 2019) and cultural background knowledge (Bailin & Grafstein, 2016). Considering that English functions as a foreign language in China, where EFL learners have limited opportunities to interact with native English speakers and cultures, achieving native-level reading proficiency can be challenging for Chinese EFL learners. Therefore, it is logical for textbooks to be designed with readability that caters to the specific context of English language learning in China.
A third noteworthy finding related to Research Question 2 is that, while four indicators—FKG, ARI, CLI, and GFI—consistently displayed a decreasing pattern of readability from Books 1 to 4, FRE and SMOG showed certain variations in Book 3. In other words, Book 3 exhibited the lowest FRE value and the highest SMOG value, indicating the lowest degree of readability, and correspondingly the highest degree of difficulty, compared with the other three books. This result diverges from previous studies, which suggest that indicators such as FRE demonstrate a progressive distribution of readability in English textbooks (Ryu & Jeon, 2020). This result highlights issues with the selected series in terms of readability, as indicated by the FRE and SMOG. Since these two indicators are essential for the adaptation and development of textbooks and teaching materials (Im et al., 2015), it is important for textbook developers to consider them in further compilation.
Regarding the influence of syntactic complexity as a predictor variable on text readability (Research Question 3), the present study found that all 14 indicators of the former could predict the latter to some degree, with C/T (clauses per T-unit), C/S (clauses per sentence), and DC/C (dependent clauses per clause) being the strongest indicators in predicting the text readability of the selected series of textbooks. These results suggest that subordinate structures strongly influence text readability (Eslami, 2014; Kyle & Crossley, 2018). This can be explained as follows: the above subordinate structures are in fact embedded syntactic structures, which offer flexibility in expressing ideas and thus function as major indicators of readability (Wu, 2017; X. Zhang & Li, 2022). In addition, the results regarding the predictive power of subordinate structures such as C/T, C/S, and DC/C on readability contradict those of previous studies conducted in the fields of academic writing (S. Wang et al., 2022; Wu, 2017), extracurricular reading materials (Lei & Shi, 2023), and teaching resources used in primary and secondary schools (Jin et al., 2020). This inconsistency may indicate the unique features of the selected series of textbooks for tertiary English majors, thus warranting further exploration.
Conclusion
This study found that 11 of the 14 indicators of syntactic complexity consistently demonstrated a systematic progression of difficulty from Books 1 to 4 of the selected series of textbooks. However, three indicators, namely MLS, C/S, and T/S, showed the opposite tendency. In addition, the selected series lacked systematic readability, as evidenced by the deviation of the FRE and SMOG indicators in Book 3. Finally, the results indicate that all 14 indicators of syntactic complexity contributed to predicting readability to some extent. In particular, clause-related features, such as clauses per T-unit (C/T), clauses per sentence (C/S), and dependent clauses per clause (DC/C), are the strongest predictors of readability among the 14 indicators of syntactic complexity.
This study’s findings have several theoretical implications. First, the results demonstrate that syntactic complexity and readability can serve as important methods for quantitative materials evaluation, thus mitigating the shortcomings of previous research, which has been criticized as subjective and qualitative (Cheng & Zhao, 2021). For instance, the inclusion of 14 indicators could enrich the literature on syntactic complexity and provide empirical evidence for L2SCA in the context of textbook evaluation. In contrast to most previous studies, which included only one or two readability indicators, this study examined all six commonly recommended readability indicators, thereby extending the readability indicators explored in previous studies. In addition, this study examined the predictive effect of syntactic complexity on readability, an area that has rarely been addressed in the existing literature (Wu, 2017), and thus expands the scope of research in this field. Finally, unlike previous studies that primarily examined learners’ written texts or published academic papers, or focused on textbooks for primary and middle school students, this study extends its focus to university English majors, who remain underexplored.
This study has significant implications for the development of EFL teaching methods and textbooks. For instance, this study found that genre plays a crucial role in contributing to syntactic complexity. Thus, genre should be carefully considered when designing textbooks for learners in EFL and English-medium instruction (EMI) contexts, where teaching materials are aimed at learning content subjects in English (C. Li, 2023; Richards & Pun, 2022; Widodo et al., 2022). Second, this study found that the selected
It should be acknowledged that the present study has certain limitations. First, given the large number of indicators proposed by different scholars, it is not feasible to cover all the measures in one study; therefore, the findings generated by other indicators may vary. Second, this study focused exclusively on the features of syntactic complexity and readability in a series of intensive reading textbooks for English majors.
