Abstract
Introduction
Academic discourse study (ADS) is a nascent field within linguistics (Hyland, 2009), yet it has captivated scholars from many disciplines, including linguistics, education, economics, and business. Within linguistics, the study of academic discourse (AD) spans subdisciplines ranging from philosophical pragmatics to English for academic purposes (EAP). According to Hyland (2009), this increased attention is partly attributable to the growing significance of ADS as it extends into the institutional discourses surrounding people. AD helps educators conduct research and teach, fosters social roles and relationships within scholarly circles, and cultivates an academic community dedicated to tackling related research challenges. ADS focuses not on academic language alone but on discourse, because “discourse includes much more than language” (Gee, 1996, p. viii). Materials such as essays, research articles, lectures, dissertations, academic conference presentations, and textbooks are central to ADS; therefore, the field subsumes various approaches and subareas.
Over the course of its development, ADS has employed three primary approaches (Charles et al., 2009): discourse, corpus, and integrative. The discourse approach is traditionally considered top-down (Swales, 2002) and concerns both writing as a social practice (i.e., the social role of AD and the social context it creates) (Bazerman, 1988; Berkenkotter & Huckin, 1995; Myers, 1990) and recurring patterns of meaning in texts of similar types (i.e., generic element analysis, move analysis, or genre analysis) (Brett, 1994; Holmes, 1997; Samraj, 2008). In contrast, the corpus approach is a bottom-up technique (Swales, 2002) that typically examines large amounts of data by checking the frequency and distribution of patterns or collocations of language use (Biber, 1988; J. Flowerdew, 2003; Groom, 2005). This approach covers numerous lexico-grammatical features, ranging from personal pronouns to stance evaluations. The integrative approach combines the discourse and corpus approaches, specifically by integrating corpus techniques into discourse analysis (or at the discourse level) (Bondi, 2008; L. Flowerdew, 1998; Sanderson, 2008). The development of ADS reflects both conflict and cooperation between top-down and bottom-up approaches, which Charles et al. (2009) regarded as a continuum from discourse-analytic to corpus-based approaches. These three representative approaches are currently widely employed in ADS, particularly in EAP.
ADS encompasses a wide range of topics. Scholars have explored identity issues in written academic genres (Englander, 2009; Hyland, 2012) and spoken genres (Biber, 2006; Zareva, 2013). Additionally, they have engaged in cross-disciplinary studies (Hyland, 2005, 2007; Vazquez & Giner, 2008) that reveal the particular restrictions and conventions of communities in various disciplines. Scholars have also examined the influence of AD on non-linguistic fields such as international relations (Afzaal, 2023; Laffey & Weldes, 1997; Milliken, 1999; Yee, 1996; Zhang et al., 2023), management (Afzaal et al., 2022; Ågerfalk, 2019; Haydon et al., 2021), and social problems (Loureiro & Conceicao, 2019; Vaughan, 2019). ADS topics thus include a series of social scientific investigations. Currently, linguists focus on the role of language in conveying scientific knowledge or disseminating new findings within a particular research community shaped by a specific disciplinary culture (Farrokhi & Ashrafi, 2009), whereas scholars from non-linguistic fields tend to concentrate on the information or content of AD and how it reveals the professional impact of a particular AD.
Traditional review articles of ADS often focus on specific topics such as academic identity (J. Flowerdew & Wang, 2015), language-related learning disabilities (Peterson et al., 2020), the influence of AD in education (Hu, 2008), and metadiscourse in academic writing (Khedri et al., 2013). These reviews, exploring both the theory and application of ADS, underscore its accomplishments across various research domains. However, the breadth of ADS has created a complex landscape for scholars, particularly those new to the field, who wish to grasp its developmental trajectory (e.g., its methodologies, seminal works, and the spectrum of viewpoints discussed by scholars from diverse disciplines). One complication is that it remains difficult to identify influential ADS research, including its research methods, at the disciplinary level rather than within a subtopic. Although Charles et al. (2009) generalized three primary approaches to ADS, scholars have adopted both qualitative (Luzón, 2023; MacArthur & Alejo-González, 2024; Yasuda, 2023) and quantitative methods (Dontcheva-Navratilova, 2023; McGrath & Liardét, 2022; Tessuto, 2021) within a single approach. It has not been clearly stated whether the two methods conflict with or complement each other within one approach, and it remains unclear whether these approaches share the same research purpose. Another complication is that previous reviews have presented several conclusions on particular academic language phenomena, but how to integrate those conclusions from a disciplinary perspective is still unclear. Additionally, whether published research papers on ADS can reveal diachronic change over the course of its development and indicate a future direction for scholars or students interested in AD requires further discussion. To explore these complications, three research questions are proposed: (1) What have been the crucial and influential studies in ADS over its past 20 years of development? (2) What have been the major findings and divergences of opinion in ADS over this period? (3) What does 20 years of ADS research indicate about its future development with respect to those major findings and divergences?
Theoretically, three types of reviews may help answer these questions: (1) domain-based reviews, which synthesize research into distinct themes; (2) theory-based reviews, which concentrate on how research in the same body of literature applies theories; and (3) method-based reviews, which deal with the methodologies employed in the body of literature (Paul & Rialp, 2020). By answering our research questions, we aim to reveal the disciplinary features of ADS with respect to its methods, conclusions, and divergences, and to trace its changes over 20 years of development and its future direction. Given these questions, a domain-based review is appropriate because it combines bibliometric and content analyses (Jiang et al., 2023), and answering our research questions requires the results of large-scale data analysis. Recently, bibliometric techniques have been applied to quantitatively evaluate research trends and tacit scientific knowledge in published academic articles using statistical, mathematical, and other measurement methods (Yu et al., 2017). Conventional expert-compiled reviews typically rely on interventions and prior knowledge of professional topics, whereas the bibliometric approach can “help analysts visualize and break down co-citation networks on the basis of the algorithm of co-citation matrix” (Fu et al., 2020, p. 2). Therefore, a bibliometric analysis can provide both a diverse range of relevant topics in a field and the flow of information (or knowledge) between these topics, which ultimately helps us initiate in-depth discussions about the field and obtain insights into it (Chen et al., 2014).
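The co-citation matrix mentioned by Fu et al. (2020) can be illustrated with a minimal Python sketch. It assumes the usual definition: two references are co-cited when they appear together in the reference list of the same citing article, and their co-citation count is the number of such articles. The function name `cocitation_counts` and the reference IDs are hypothetical; CiteSpace's own network construction (time slicing, node selection, normalization) is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_articles):
    """Count how often each pair of references is cited together.

    citing_articles: iterable of reference lists, one list per citing article.
    Returns a Counter mapping an alphabetically ordered pair (ref_a, ref_b)
    to the number of citing articles whose reference lists contain both.
    """
    counts = Counter()
    for refs in citing_articles:
        # Deduplicate and sort so each unordered pair is counted once per article.
        unique_refs = sorted(set(refs))
        for pair in combinations(unique_refs, 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical reference lists of three citing articles.
    articles = [
        ["Hyland2005", "Swales1990", "Biber1988"],
        ["Hyland2005", "Swales1990"],
        ["Hyland2005", "Wenger1998"],
    ]
    for (a, b), n in cocitation_counts(articles).most_common():
        print(f"{a} & {b}: co-cited {n} time(s)")
```

Each pair count corresponds to one off-diagonal cell of a symmetric co-citation matrix; a tool would then treat references as nodes and these counts as link strengths.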
Of the software tools available for bibliometric analysis, CiteSpace is widely employed to detect knowledge foundations, emerging trends, and innovations in academic fields (Xu & Yu, 2019). It visualizes networks of co-cited references based on published academic articles retrieved from different core collections (Chen, 2012). In addition, it provides the betweenness centrality (BC) and citation bursts of co-citation and co-word analyses (Chen, 2016). This study therefore adopted CiteSpace (v.5.8.R3) for data analysis and synthesized networks of co-cited references based on bibliographic records from the Web of Science (WOS) published between 2000 and 2021. The next section introduces the research method, including the essential thresholds for CiteSpace. The third section then examines ADS through cluster and keyword analyses to address the first research question. The fourth section discusses the major findings and divergences of opinion in ADS over the past 20 years and its future development to answer the remaining two research questions based on the results in Section “Results,” followed by the conclusion.
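Betweenness centrality measures how often a node lies on the shortest paths between other pairs of nodes, so references with high BC tend to bridge otherwise separate parts of a co-citation network. The following is a minimal sketch of Brandes' algorithm for an unweighted, undirected graph, optionally normalized by the number of node pairs that do not involve the node itself. The function name and the adjacency-dict input are assumptions for illustration, and CiteSpace's own computation may differ in how it handles edge weights and normalization.

```python
from collections import deque


def betweenness_centrality(adj, normalized=True):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj: dict mapping each node to a list of its neighbours (symmetric).
    Returns a dict mapping each node to its betweenness centrality.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        # and recording each node's predecessors on those paths.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    scale = 0.5  # every undirected pair was counted once from each endpoint
    if normalized and n > 2:
        scale /= (n - 1) * (n - 2) / 2  # number of pairs not involving the node
    return {v: value * scale for v, value in bc.items()}
```

In a star-shaped network, for instance, the hub lies on every shortest path between two leaves and receives the maximum normalized value of 1, while the leaves receive 0; this is the sense in which high-BC references act as bridges.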
Research Method
Bibliometric analysis is a computer-assisted review method that identifies core studies or authors and their interconnections by examining publications relevant to a specific research field or topic (DeBellis, 2009). Compared with other academic databases, WOS maintains a higher level of rigor, with approximately 99.11% of its indexed journals also included in other widely available academic databases, such as Scopus (Singh et al., 2021). Therefore, the data used in this study were retrieved and downloaded from the WOS database, including the Science Citation Index Expanded (SCIE), Social Sciences Citation Index (SSCI), Arts & Humanities Citation Index (A&HCI), and Conference Proceedings Citation Index (CPCI).
Data Source and Collection
Building on the bibliometric analyses conducted by Han and Li (2021) and Wang et al. (2022), which used “discourse” and “text” as search terms, this study expanded the search terms to “discourse,” “text,” “genre,” and “publication,” combined with the additional term “academic” (see Figure 1). The study included research articles and reviews published in English between 2000 and 2021. The data were retrieved through a “TOPIC” search (covering titles, abstracts, and keywords) in WOS. Additionally, the “remove duplicate (WOS)” feature in CiteSpace was used to refine the dataset. In total, 2,024 documents were collected for subsequent analysis; data inclusion is illustrated in Figure 1 following the PRISMA 2020 statement (Page et al., 2021) for systematic reviews.

PRISMA flow chart of data inclusion and exclusion.
Threshold in CiteSpace
CiteSpace is continuously developed to fulfill the visual analytic needs of scientific mapping tasks. It is designed to synthesize and visualize research in the form of a co-citation network with co-cited references visualized as nodes, as illustrated in Figure 2.

An example of a node.
A node’s citation history is visualized through colors ranging from gray (e.g., 2000) to red (e.g., 2021) and through the thickness of its tree rings; the thicker a node’s tree rings, the more frequently the reference has been cited. Moreover, nodes can be grouped into clusters based on the relatedness and interconnectivity between references on a particular topic (Chen, 2006).
The essential thresholds used in this study are listed in Table 1, and the co-citation and co-word analyses in CiteSpace are based on these thresholds. The calculation was performed based on the following settings:
Essential Thresholds of the Co-Citation and Co-Word Analyses in CiteSpace.
Results
Figure 3 presents the annual counts of published articles and reviews. Apparently, both the polynomial (

Annual publication counts from 2000 to 2021 in the Web of Science Core Collection based on the search topics.
Cluster Analysis
The co-citation function in CiteSpace constructs a network model of cited references and divides it into (co-citation) clusters of references. Each cluster therefore represents a theme shared by its cited references, which is expressed in the cluster label. The cluster analysis describes the major clusters, which have high silhouette (over 0.9) and modularity (0.95) values, based on the thresholds listed in Table 1.
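The two quality indices used as thresholds above can be sketched with their textbook definitions: Newman's modularity Q, which compares the number of within-cluster links with what would be expected at random given node degrees, and Rousseeuw's silhouette, which compares each item's mean distance to its own cluster with its mean distance to the nearest other cluster (values near 1 indicate well-separated clusters). This is a sketch under stated assumptions: the graph is unweighted, the distance matrix is supplied by the caller, and the function names are hypothetical; the distance measure CiteSpace uses internally is not shown here.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Newman's modularity Q for an unweighted, undirected graph.

    edges: list of (u, v) pairs; labels: dict node -> cluster id.
    Q = sum over clusters c of [L_c / m - (d_c / (2m))**2], where L_c is the
    number of edges inside c and d_c is the total degree of c's nodes.
    """
    m = len(edges)
    inside = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            inside[labels[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def mean_silhouette(dist, labels):
    """Mean Rousseeuw silhouette given a symmetric distance lookup dist[i][j]."""
    clusters = defaultdict(list)
    for item, c in labels.items():
        clusters[c].append(item)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, c in labels.items():
        own = [j for j in clusters[c] if j != i]
        if not own:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        a = sum(dist[i][j] for j in own) / len(own)  # cohesion
        b = min(  # separation: mean distance to the nearest other cluster
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items() if other != c
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)
```

For two triangles joined by a single edge, with each triangle as one cluster, this definition gives Q = 5/14 (about 0.357); values closer to 1, such as those reported in this study, indicate that links fall overwhelmingly within clusters.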
The co-citation analysis generated six major clusters, labeled using the log-likelihood ratio (LLR), with a silhouette value of 0.956 and a modularity value of 0.949 (Figure 4). Table 2 lists the silhouette values of the six clusters, from the largest, #0 (wide audience), to the smallest, #23 (red herring). Although all six clusters were calculated using CiteSpace, their sizes, that is, the number of keywords in ADS shared by each cluster, differed substantially. The size of each cluster therefore reveals the different statuses of the most-cited references within it; for example, the most-cited references in Cluster #23 were not as influential as those in Clusters #0 and #1.
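In CiteSpace, candidate cluster labels are drawn from the citing articles associated with each cluster, and LLR ranks terms by how strongly they are associated with that cluster relative to the rest of the data. A minimal sketch of Dunning's log-likelihood ratio (G²) on a 2×2 contingency table follows. The term counts, the function names, and the choice to count term occurrences rather than documents are assumptions for illustration; the exact way CiteSpace extracts candidate noun phrases and computes its scores is not reproduced.

```python
import math


def log_likelihood_ratio(k1, n1, k2, n2):
    """Dunning's G^2 for a term occurring k1 times among n1 term occurrences
    in a cluster and k2 times among n2 occurrences in the rest of the data."""
    observed = [[k1, n1 - k1], [k2, n2 - k2]]
    row_totals = [n1, n2]
    col_totals = [k1 + k2, (n1 + n2) - (k1 + k2)]
    total = n1 + n2
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # cells with o == 0 contribute 0 (0 * ln 0 taken as 0)
                e = row_totals[i] * col_totals[j] / total  # expected count
                g2 += o * math.log(o / e)
    return 2 * g2


def rank_labels(cluster_counts, rest_counts):
    """Rank candidate terms that are over-represented in a cluster by G^2."""
    n1 = sum(cluster_counts.values())
    n2 = sum(rest_counts.values())
    scores = {}
    for term in set(cluster_counts) | set(rest_counts):
        k1 = cluster_counts.get(term, 0)
        k2 = rest_counts.get(term, 0)
        # G^2 is two-sided, so keep only terms relatively more frequent in the cluster.
        if k1 / n1 > k2 / n2:
            scores[term] = log_likelihood_ratio(k1, n1, k2, n2)
    return sorted(scores, key=scores.get, reverse=True)
```

A term whose relative frequency is the same inside and outside a cluster scores 0, so only terms that are distinctly concentrated in a cluster's citing articles rise to the top of the label list.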

Landscape view of the co-citation network in ADS from 2000 to 2021.
Summary of the Significant Clusters into Which the Co-Citation Network is Divided.
As Figure 5 shows, the most-cited references of ADS first appear in Clusters #3 and #4 from 2004 and shift to Clusters #0 and #1 after 2013. This flow of information roughly coincides with the surge in ADS publications.

A timeline visualization of the co-citation network in ADS.
To examine the theoretical basis of ADS, Table 3 lists the three most frequently cited references in each of the six clusters. Because the clusters are generated from co-citation, these references can be taken as representative, since a “frequently cited paper represents the key concepts, methods, or experiments in a field” (Small, 1973, pp. 265–266). The timeline results show that citation frequency tended to decrease from Cluster #0 to Cluster #23 and that most of these references were published after 2013. In addition, five of them are academic monographs (with gray backgrounds) rather than articles.
References with High Frequency in Each Cluster.
Cluster #0:
Cluster #1:
Cluster #3:
Cluster #4:
Cluster #16:
Cluster #23:
The presentation of each cluster clearly depicts the major influential research areas in ADS. It reflects the dominant position of English as an object of research and the three approaches proposed by Charles et al. (2009). Synchronically, the six clusters were distributed in a relatively dispersed manner (Figure 4), suggesting that they had few mutual connections. Diachronically, ADS began with academic English learning and subsequently spread to discourse analysis. Furthermore, from both synchronic and diachronic perspectives, the primary concern of ADS is interpersonal relationships, that is, helping writers (whether native or non-native speakers of English) communicate their opinions in accordance with the conventions of a particular discipline or academic community, whereas scholars differ mainly in the perspectives they adopt in their studies.
Term and Keyword Analyses
Keyword analysis revealed the roots of ADS and possible topics in the branch of linguistics, whereas term analysis suggested more than the major terms adopted by ADS scholars. These two kinds of analyses were conducted to further investigate the relationships among the collected data. Figure 6 presents the two knowledge domains of the term and keyword networks from the references in the collected data. Both term (silhouette = 0.939, modularity = 0.884) and keyword (silhouette = 0.922, modularity = 0.840) visualizations revealed close connections between various terms or keywords. The keyword visualization contained only one large tree ring labeled

Landscape view of the keyword and term networks in ADS from 2000 to 2021.

A time zone visualization of the keyword and term analyses in ADS.
Figure 7 shows that ADS first emerged in applied linguistics (i.e., language teaching and learning) and later in discourse analysis related to the social sciences. A combination of the two methods emerged after 2010, and many smaller tree rings burst between 2010 and 2021. The time-zone visualization of the keyword analysis in Figure 7 is consistent with the tree rings in Figure 6. The largest tree ring
To provide more details that may be helpful in understanding the future development of ADS, we listed the top 10 ranked terms and keywords by their BCs, as presented in Table 4. Most of the terms and keywords were consistent with those depicted in the cluster analysis and visualization of terms and keywords, such as
The Top Ten Ranked Terms and Keywords by BC.
According to Chen (2017), citation burst detection in CiteSpace is a computational technique for identifying references that attract a sudden increase in attention and for tracing how the focus of research develops. Citation burst analysis is therefore well suited to generalizing the development of ADS diachronically. Temporal changes in citation burstiness indicate how the focus of study developed (i.e., influential references from 2003 to 2019) and constitute a spiral escalation. Table 5 presents the four stages of ADS development. The first stage was a theoretically driven period that drew on social constructionism, pragmatic concepts (Hyland, 2000), genre analysis (Swales, 2004), and the social theory of learning (Wenger, 1998). The duration of citation burstiness in Table 5 suggests that from 2003 to 2012, ADS research tended to focus on the theories or models in these three references. From 2008 to 2010, ADS scholars appeared to focus on the interpersonal meaning conveyed by AD writers, because the two studies in this stage were both concerned with such meaning in terms of personal pronouns (Harwood, 2005) and resources of stance and engagement (Hyland, 2005). The following stage manifested another research stream concerning non-native English scholars: teaching, learning (Mauranen, 2012; Zareva, 2013), and publishing (Lillis et al., 2010).
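CiteSpace's burst detection is commonly described as based on Kleinberg's burst-detection algorithm, which models a citation stream as switching between a base state and a higher-rate burst state. The sketch below is a simplified two-state version of the batched model: each year's citation count is scored under a binomial model for both states, entering the burst state costs gamma·ln(n) for n time periods, leaving it is free, and the Viterbi algorithm finds the cheapest state sequence. The function name, the parameters `s` and `gamma`, their default values, and the yearly counts in the usage note are hypothetical and are not the settings used in this study.

```python
import math


def kleinberg_bursts(r, d, s=2.0, gamma=1.0):
    """Simplified two-state burst detection after Kleinberg's batched model.

    r[t]: citations the reference receives in period t (e.g., a year).
    d[t]: total citations (or citing records) in period t.
    s: ratio of the burst rate to the base rate; gamma: cost of entering a burst.
    Returns a list of (start, end) index pairs of burst periods (inclusive).
    """
    n = len(r)
    p0 = sum(r) / sum(d)  # base rate; assumes sum(d) > 0
    if not 0 < p0 < 1:
        return []  # no citations (or saturation): nothing to detect
    rates = [p0, min(s * p0, 0.9999)]  # keep the burst rate a valid probability

    def cost(state, t):
        # Negative log-likelihood of r[t] hits out of d[t] trials (binomial model).
        p = rates[state]
        log_binom = (math.lgamma(d[t] + 1) - math.lgamma(r[t] + 1)
                     - math.lgamma(d[t] - r[t] + 1))
        return -(log_binom + r[t] * math.log(p) + (d[t] - r[t]) * math.log(1 - p))

    up = gamma * math.log(n)  # cost of entering the burst state; leaving is free
    best = [cost(0, 0), up + cost(1, 0)]  # minimal cost of paths ending in each state
    back = []  # back[t - 1][j]: best previous state when in state j at time t
    for t in range(1, n):
        prev0 = 0 if best[0] <= best[1] else 1  # stay in 0 or drop from 1
        prev1 = 0 if best[0] + up < best[1] else 1  # rise from 0 or stay in 1
        new0 = best[prev0] + cost(0, t)
        new1 = best[prev1] + (up if prev1 == 0 else 0.0) + cost(1, t)
        back.append((prev0, prev1))
        best = [new0, new1]
    # Trace back the optimal state sequence.
    state = 0 if best[0] <= best[1] else 1
    states = [state]
    for prev in reversed(back):
        state = prev[state]
        states.append(state)
    states.reverse()
    # Collect maximal runs of the burst state.
    bursts, start = [], None
    for t, st in enumerate(states):
        if st == 1 and start is None:
            start = t
        elif st == 0 and start is not None:
            bursts.append((start, t - 1))
            start = None
    if start is not None:
        bursts.append((start, n - 1))
    return bursts
```

For example, with 100 citations per year in total and a reference cited 1, 1, 1, 1, 10, 12, 11, 1, 1, 1 times, the sketch flags indices 4 to 6 as a single burst. Raising `gamma` makes short or weak bursts harder to detect, which is how the duration of a burst (as reported in Table 5) depends on the detection settings.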
The Top 10 Ranked References by Beginning Year of Citation Burstiness.
The last two studies addressed topics similar to those of the previous three stages, but from different perspectives. For instance, Hyland and Jiang (2016) examined academic engagement through expressions such as “we must conclude that . . .” to probe the construction of interpersonal meaning in AD. Their investigation also opened an avenue for diachronic ADS. Hyland (2017) interpreted the concept of metadiscourse from the perspective of ADS and explained how such an approach could be adopted to analyze academic writing. These two studies ostensibly discussed topics or theories from earlier ADS; however, they opened new perspectives on old themes, which is why this stage represents a spiral escalation.
Discussion
Major Findings and Divergences of Opinions in ADS: A Synchronic Perspective
Term and keyword analyses have delineated the research content of ADS over the past 20 years and yielded three major findings. First, ADS is a topic-driven area that combines different types of analytical approaches. The dispersed distribution of the six clusters illustrates this. This finding also explains why the citation counts or frequencies of references, terms, and keywords are lower than those in other bibliometric analyses of linguistic research with similar numbers of collected references (Fu et al., 2020; Wang et al., 2022). Within topic-driven ADS, the topics were not closely related between 2020 and 2021 (see Figures 6 and 7), which further explains why previous reviews of ADS have often focused on specific topics instead of conducting a disciplinary analysis. Second, although the cluster analysis demonstrates a dispersed distribution of ADS topics, a common focus connects most ADS research, namely interpersonal meaning, which traditional qualitative reviews of particular topics in ADS, such as identity (J. Flowerdew & Wang, 2015) and academic writing (Khedri et al., 2013), cannot reveal. Of the six major clusters, three (#0, #3, and #4) concern interpersonal meaning in relation to attitude (Hyland & Jiang, 2016), academic interaction (Zou & Hyland, 2019), stance (Morek, 2015), and hedging (Hu & Cao, 2011). These most-cited studies investigated interpersonal meaning from various perspectives with the particular purpose of deconstructing communication between writers and readers of AD, thus benefiting teaching and learning for non-native English scholars. The third finding follows from the second: most ADS works fall within the scope of EAP. One possible reason is that the collected data were all written in English, although not all writers were native English speakers. Another reason is that EAP reflects the dominant status of English in current international academic publishing. In summary, the research context of ADS illustrates both its generalities and its discrepancies; moreover, it supports the notion that academic language is a special variant derived from general language as a sublanguage (Harris, 1968, 1988).
Furthermore, cluster analysis reveals the major theories and models employed in ADS from a disciplinary perspective, including Swales’s (1990) genre analysis, Hyland’s metadiscourse (Hyland, 2004) and stance and engagement (Hyland, 2005) models, several pragmatic theories (Bu, 2014; Reershemius, 2012), and a systemic functional approach (Benelhadj, 2018; Coffin & Donohue, 2012). Another feature shared by the most-cited articles in the six clusters is that all focused on research descriptions and strategies. A popular paradigm in this trajectory is to derive a research question from teaching practice or the actual situation of academic publishing and then develop a series of strategies based on a theory or model at the discourse level. This finding highlights the link between linguistics and education.
Alongside these major findings, cluster, term, and keyword analyses revealed two crucial divergences of opinion on how to conduct ADS. The first divergence concerns methodology. Although Charles et al. (2009) indicated a tendency for cooperation between the qualitative (i.e., discourse) and quantitative (i.e., corpus) approaches, the former has a higher BC value in the term analysis (Table 4), which suggests that it is currently more central to ADS and will probably remain so. The quantitative approach to discourse analysis was once regarded as “a research technique for the objective, systematic, and quantitative description of manifest content of communication” (Berelson, 1952, p. 18). By contrast, the qualitative approach to discourse analysis functions both as a method (Jäger, 2001; Jørgensen & Phillips, 2002) and as a methodology (Goom, 2008; Powers, 2001). These labels imply a close association between instances of theoretical construction and empirical operations (Sayago, 2015). This finding supplements Charles et al.’s (2009) review of ADS approaches: although the two approaches (or methods) can cooperate, most studies employ only one of them. A qualitative approach to discourse analysis may have substantial potential for the theoretical development of ADS, which would ultimately promote its applied research. This is probably why the qualitative approach has influenced ADS more methodologically than the quantitative approach over the past two decades and will likely continue to do so.
Apart from the division between qualitative and quantitative methods, the analyses of the six clusters and of citation burstiness raise another divergence in ADS: description versus explanation. Most of the studies introduced above are descriptive, focusing on the linguistic resources that realize a particular type of interpersonal meaning or on strategies for communicating academic arguments in the social sciences. This feature also appears in previous reviews of ADS by Khedri et al. (2013) and J. Flowerdew and Wang (2015), which may further indicate that ADS is not a theory-based research area. A description delineates a subject or provides an account of it, whereas an explanation provides a clear cause or reason for it (Novak, 1996). The two are distinct because “description tells us what is there, explanation why it is there” (Bergmann, 1957, p. 79). Explanations are therefore theory-based and cannot be achieved through description alone (Reese, 1999). Theories influence description (Skinner, 1953), and “all scientific descriptions of facts are selective, [and] they always depend on theory” (Popper, 1966, p. 260). This tendency confirms that ADS is a topic-driven rather than a theory-based research area. However, our bibliometric results do not indicate that ADS lacks theories. The cluster analysis displayed a series of theories, ranging from metadiscourse (Hyland, 2004) to a systemic functional approach (Benelhadj, 2018; Coffin & Donohue, 2012). The problem is that these theoretical discussions account for only a small proportion of ADS publications and thus do not have high BC values. From a disciplinary perspective, these results further indicate that ADS is concerned with applying these theories in research areas such as education and linguistics.
The divergence between description and explanation reveals an applied orientation in ADS over the past 20 years, which also resonates with other bibliometric analyses of particular subareas of ADS, such as Han and Li (2021) and Wang et al. (2022). Nevertheless, whether this orientation will persist requires further diachronic discussion.
Future Development of ADS: A Diachronic Perspective
To discuss the future development of ADS, it is necessary to determine its present stage of development, the features of that stage, and what can be expected of the next stage. According to Shneider (2009), the evolution of scientific disciplines can be divided into four stages. In the first stage, scholars are occupied with a new subject matter in the realm of scientific analysis with the help of a new
Although no evidence of the third stage was found, some diachronic changes were observed in the analysis results, which may ultimately lead to new notions or insights into traditional ADS problems in the following stage. First, the cluster analysis clearly demonstrated two trends in ADS. One is the expanding scope of participants in academic research, which has grown from language resources (Morek, 2015; Zou & Hyland, 2019) and non-native English scholars (Hinkel, 2016; Mauranen, 2012) to publishers and reviewers (Ge, 2015; Hyland, 2016). The other is that ADS is becoming increasingly comparative across disciplines or languages (Bruce, 2014; Hu & Cao, 2011), in contrast to earlier general descriptions of a single language (Swales, 2004; Wenger, 1998). These conclusions are corroborated by the citation burstiness presented in Table 4 for terms such as
Second, the descriptive and explanatory methodologies of ADS may gradually become differentiated through diachronic and synchronic analyses. According to Egré (2015), one explanatory method in linguistics is historical explanation; thus, the diachronic method in ADS can be treated as a kind of explanatory study. Although synchronic description is the mainstream paradigm in the collected references, a diachronic study by Hyland and Jiang (2016) has already been published and shows a citation burst from 2019 to 2021, indicating that some scholars in ADS have turned to the diachronic paradigm. This may prompt them to examine other classical ADS topics diachronically, ultimately leading to a spiral escalation. Moreover, synchronic and diachronic methods may also tend to cooperate, similar to the cooperation between qualitative and quantitative approaches. Although the two terms are literally distinct, “it is possible to have both diachronic emergence and synchronic emergence occurring in a single process” (Humphreys, 2016, p. 43).
The third change in current ADS concerns the divergence between qualitative and quantitative methods. The development timelines of the six major clusters clearly indicate that ADS initially focused on constructing or refining models and theories to fit its needs, followed by a burst of applications of those models and theories using the corpus approach. This indicates a transition from qualitative to quantitative methods in discourse analysis (DA). This transition reflects Krippendorff’s (1980, p. 10) earlier recognition that “[p]erhaps the term DA no longer fits this larger context in which messages and symbolic and data are and must be understood” and the attempt of the quantitative approach, revealed by Sayago (2015, p. 3), “to extend its scope to the semantic and pragmatic relationships that link the text to the context, that is, to
Conclusion
Compared with previous reviews of ADS, this study reviews ADS bibliometrically from a disciplinary perspective and generalizes knowledge that cannot be obtained from traditional reviews. ADS is a relatively young discipline but has attracted attention from scholars in several disciplines. Over the past two decades, it has cultivated its own research paradigms, aided by quantitative methodologies. The findings reveal that ADS is currently topic-driven rather than theory-driven, spanning a wide array of topics in linguistics, education, and publishing. Many scholars focus their studies on the interpersonal meaning of AD. In addition, the analysis of influential research on popular topics reveals two divergences of opinion in ADS: qualitative versus quantitative research methods, and description versus explanation as research purposes. The results show that current ADS tends to conduct quantitative research describing the use of particular language resources, although qualitative research is more influential. Finally, the results suggest that ADS is at the second stage of development, in which a group of disciplinary concepts and major techniques have been proposed and discussed but new insights into traditional issues have not yet emerged. Diachronically, both divergences persist: description continues to be differentiated from explanation in the research purposes of ADS, and a transition from qualitative to quantitative methods has occurred.
The above findings shed light on current trends and the future development of ADS. They also reveal a deficiency: ADS currently lacks theoretical investigation. As our results show, metadiscourse analysis and the systemic functional approach are the most popular theories in ADS. Nevertheless, both theories require refinement when applied to different types of discourse, as Hyland’s (2017) reflection on the essence and future development of the metadiscourse approach illustrates.
