Abstract
Keywords
Last year, that is, 2019, marked the 19th year since the Ministry of Education (MOE, 2001) officially mandated the implementation of bilingual education in higher education institutions across the People’s Republic of China. Following this initiative, institutions of higher education began to offer courses in both Chinese and a foreign language (predominantly English), particularly in fields that are directly related to national development and internationalization (e.g., biotechnology, information technology, finance, and law; MOE, 2001). Since then, bilingual education in mainland China has received both favorable attention and critical scrutiny (X. Gao & Ren, 2019). Proponents endorse this educational policy on the basis that it could help to prepare a new generation that is bilingual, bicultural, and biliterate in various disciplines. Proponents also suggest that bilingualism will make Chinese graduates more competitive in the global market, which is expected to support an increase in national power (e.g., Y. Feng, 2009; D. Zheng & Dai, 2013; Zhu & Yu, 2010). Critics, however, have argued compellingly that the unequal access to opportunity embedded in this educational reform will deepen divides in China by accentuating the vertical structure of society (e.g., G. Hu, 2008; G. Hu et al., 2014).
Despite such controversy, this language provision has gained popularity and momentum and has made its way into higher education in China (Y. Feng, 2005) through the subsequent government policies that promote it (e.g., MOE, 2005, 2010). According to N. Yang and Zhang (2015), there have been 150 to 200 different courses taught in Chinese and English in a handful of highly ranked universities. An elite university, located in Beijing, offered 200 disciplinary bilingual courses, including some with 100% of the instruction in English (MOE, 2017).
The evolution and expansion of bilingual courses is a product of internationalization in which English plays a crucial role as the
As such, the present study arises from the need to address the aforementioned publication bias to contribute to our understanding of the complex nature of bilingual education in mainland China. This article aims to comprehensively and systematically review research that has been conducted over the past 19 years and published either in English or Chinese outlets on the subject of bilingual education (with both Chinese and English as the MOI) in Chinese higher education. To this end, we synthesized the academic discourse of
A Terminological Choice: Bilingual Education versus EMI
It is worth mentioning that this research uses the term
Similar to the use of CLIL, a fast-developing phenomenon in Europe (Lasagabaster, 2015) that has already spread to South America (Siqueira et al., 2018), and CBI, popular in North America (Tedick & Cammarata, 2012), the term EMI more frequently refers to the language used in teaching within Asia, Europe, Africa, and the Middle East (Macaro et al., 2018; J. Zhao & Dixon, 2017). Such linguistic and policy choices reflect the elite status of the English language, as well as the socioeconomic value of being able to speak English that has been upheld in these regions. Although EMI does not preclude the use of learners’ native language (L1) in practice, the term signifies a preference that downplays the importance of L1 instruction, which has demonstrated promise for young immigrants in North America (e.g., Cheung & Slavin, 2012). The fundamental theoretical proposition establishes that learning a second language (L2) can be facilitated when command of a native language reaches a certain threshold (see Cummins’ Common Underlying Proficiency [CUP] theory, 1976; 1979), at which point the linguistic and content knowledge in L1 can be transferred to an L2 (see Krashen’s second-language acquisition hypotheses, 1985). For Chinese students, such a preference may imply linguistic imperialism, discount the benefit of material in their L1, and make it even more challenging to reconcile the tension between the culture and values embedded in the Chinese language and those embedded in a foreign language such as English (Kirkpatrick, 2014; J. Liu & Fang, 2017), thereby defeating the purpose. Finally, these terminologies reflect an
Our choice of terminology for bilingual education is also congruent with the way it is defined by researchers. Among many conceptualizations and theorizations of bilingual education (Baker & Wright, 2017), we adopt the delineation of Lasagabaster’s (2015) bilingualism and Palfreyman and van der Walt’s (2017) biliteracy in our study. In addition, we regard bilingual education as the use of two languages as a shared MOI in the academic context of higher education, with the objective of promoting learners’ communicative competence (both orally and in written form) in discipline-specific knowledge.
Models of Bilingual Education in Chinese Higher Education
A traditional view of bilingual education models in China distinguishes three types: immersion, transitional, and maintenance or infiltration (H. Xu, 2008). In the immersion program, which was adopted from the Canadian model, English is acquired in the process of content learning. Instructors use English as the instructional language most of the time, and textbooks and other learning materials are in English. In the transitional program, Chinese is used as the primary MOI at the initial stage, and instruction gradually transitions to English. In a maintenance or infiltrative program, Chinese serves as the MOI for the majority of the time (e.g., 90%), but textbooks and materials are in English. Although the forms of bilingual education can vary significantly across different contexts, there is a general consensus that the ultimate goal of bilingual education in the Chinese context is to equip bilingual people with specialized knowledge in academic fields, so that they can use English to communicate with English-speaking specialists and professionals as needed (Y. Feng, 2005).
Reviews on Chinese–English Bilingual Education in Mainland China
After an extensive literature search, we identified six reviews of Chinese–English bilingual education in mainland China (see Table 1). Four (Fan, 2014; H. Xu, 2008; D. Zheng & Dai, 2013; Zhu & Yu, 2010) were published in Chinese outlets, and two (i.e., F. G. Fang, 2018; G. Hu, 2008) were published in English outlets. Five major themes can be drawn from this body of literature to inform our own review. First, G. Hu (2008) argued that the craze for bilingual education has perpetuated inequalities in access to education in Chinese society. His argument was reiterated by D. Zheng and Dai (2013), who noted that most English–Chinese bilingual courses are implemented in top-tier universities. According to the official definition, there are 151 key universities included in “211” or “985” projects, both of which are initiatives of the Chinese government to promote higher education and world-class universities in the 21st century (MOE, 2008, 2011, 2013). G. Hu and his colleagues (2014) claimed that highly qualified bilingual instructors, with strong communicative English proficiency and overseas experience, were more likely to be attracted by elite universities that offered competitive recruitment packages with the privilege of central/local funding support, which is a significant contributor to disparities in economic, cultural, and social capital.
Table 1. Research Reviews on Bilingual Education in China.
Note. CNKI = China National Knowledge Infrastructure.
Second, among the six reviews, only one (i.e., F. G. Fang, 2018) examined studies published in English, seven in total. Drawing from these seven articles, Fang argued that further assessment of the benefits and costs of bilingual courses in Chinese higher education was essential. He also called for a contextualized policy that considers the landscape of multilingualism in China and provides language support and guidance to both students and instructors so as to evaluate the impact of bilingual education on students’ English language and disciplinary learning, and to unpack the future development of bilingual education in Chinese higher education. It is apparent that on this particular topic, more articles are published in Chinese than in English. This observation is also confirmed by two reviews of bilingual education in Europe (Reljić et al., 2015) and worldwide (Macaro et al., 2018), in which the authors acknowledged not only the existence of a large number of studies written in Chinese and other non-English languages, but also a lack of manpower and resources to review these articles.
Third, there was a reported scarcity of empirical research. For example, D. Zheng and Dai (2013) reported that of 23 articles published between 2003 and 2012, only 17% were empirical, including only one experimental study and one classroom observation. The percentage of data-driven research was even smaller in Fan’s (2014) count (i.e., 9%;
Fourth, except for F. G. Fang (2018), the other five reviews discussed various models of bilingual education and students’ attitudes and perceptions toward the focal program related to students’ English proficiency and bilingual instructors’ qualification. Furthermore, four Chinese articles, while highlighting challenges in research and practice, were unequivocally positioned in a supportive stance because bilingual education in the Chinese context is transitioning from a pedagogical/curricular alternative to a policy imposition. They called for an ongoing dialogue between English teaching at colleges and bilingual instruction, as well as interdisciplinary collaboration among research experts, English instructors, and content specialists so as to improve the quality of bilingual education and its ability to support learning (Fan, 2014; H. Xu, 2008; D. Zheng & Dai, 2013; Zhu & Yu, 2010).
Finally, Zhu and Yu (2010) highlighted that a lack of breadth and depth in bilingual education research has presented more problems than solutions. They pointed to the urgent need for scientific investigation through comparative approaches and repeated-measures designs with advanced statistical analyses that can generate rigorous evidence on the practice of bilingual education. Similarly, G. Hu’s (2008) narrative review scrutinized K–12 bilingual education in China. He questioned the methodological rigor of program evaluations that reported only favorable findings. An imperative recommendation for future research is to attend to students’ affective domains, such as motivation, interest, learning anxiety, behavior, and efficacy, as part of the evaluation system (D. Zheng & Dai, 2013).
The Present Study
However, none of the aforementioned reviews is comprehensive in terms of coverage period, types of journals, themes, and language of publication. For example, H. Xu (2008), G. Hu (2008), D. Zheng and Dai (2013), and Fan (2014) limited their search to core periodicals in either higher education or foreign language education and, therefore, excluded discipline-specific journals where a larger number of relevant articles exist. G. Hu (2008) and Zhu and Yu (2010) targeted either earlier phases of education (K–12) or K–16 without an emphasis on higher education, the level that is most responsive to MOE policies and where the operation of bilingual courses is an obligation rather than a choice. Another limitation of earlier reviews is the small number of articles (i.e.,
In this study, to address the above issue of publication bias, we synthesize research studies examining bilingual education in postsecondary education in mainland China over the past 19 years (2001–2019), through a systematic review of “the research literature using systematic and explicit accountable methods” (Gough et al., 2012, p. 261). We follow the major themes outlined in existing reviews, by considering
Approach for Systematic Synthesis
Inclusion and Exclusion Criteria
In this article, we employed a multiphase systematic approach to analyze data collected through a search of two sets of databases and other online sources, to comprehensively cover articles published in both English and Chinese. We adopted and adapted a process described by H. Cooper (2009) for conducting a systematic review and leveraged the characteristics of that process that were most relevant to our purposes. Furthermore, based on the work of I. D. Cooper and Crum (2013), which identified the central role of librarians in systematic reviews of health and medical science, we collaborated with two librarians who specialized in information management, one a native English speaker and one a native Chinese speaker. With their assistance and expertise, we developed the terms required for searching appropriate sources and managing articles, as well as for documenting the search, retrieval, and archival processes. We believe that such a practice is also beneficial and applicable in educational research, as it helps to ensure that the search is both exhaustive and reliable. Inclusion criteria for screening included the following:
Public institutions of higher education in mainland China;
Research designated/entitled/described as EMI, CLIL, CBI, or bilingual education;
Research published in peer-reviewed journals and book chapters;
Research in settings where English is used as the language of instruction.
In addition to the inclusion criteria, we also applied the following exclusion criteria for screening:
Nonaccredited/private institutions of higher education or K–12;
Research on English language teaching/English for academic purposes (EAP), or English for specific purposes (ESP; unless it focused on content learning);
Master’s theses and doctoral dissertations;
Other systematic reviews, meta-analyses, meta-syntheses, and best-evidence syntheses (unless used for this article’s literature review and discussion);
Research conducted in Hong Kong, Macao, and other specially administered Chinese-speaking regions outside of mainland China;
Ethnic minority language (Tibetan, Miao, Korean, etc.) as an MOI.
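To make the screening logic concrete, the criteria listed above can be sketched as a single decision function. This is only an illustrative sketch, not the authors' actual screening instrument: the `Record` fields, the category labels, and the `screen` function are hypothetical names introduced here, and the sketch assumes that the content-learning exception applies to ELT, EAP, and ESP studies alike.

```python
from dataclasses import dataclass
from typing import List, Tuple


# All field names and category labels are hypothetical coding choices made
# for this sketch; they are not taken from the review itself.
@dataclass(frozen=True)
class Record:
    setting: str                 # e.g., "public_higher_ed", "private_higher_ed", "k12"
    region: str                  # e.g., "mainland", "hong_kong", "macao"
    framing: str                 # e.g., "EMI", "CLIL", "CBI", "bilingual", "ELT", "EAP", "ESP"
    outlet: str                  # e.g., "journal", "book_chapter", "thesis", "dissertation"
    moi: str                     # non-Chinese medium of instruction, e.g., "english", "tibetan"
    is_review: bool = False      # systematic review, meta-analysis, or similar synthesis
    content_focus: bool = False  # True when a language-teaching study focuses on content learning


INCLUDED_FRAMINGS = {"EMI", "CLIL", "CBI", "bilingual"}
LANGUAGE_FRAMINGS = {"ELT", "EAP", "ESP"}
INCLUDED_OUTLETS = {"journal", "book_chapter"}


def screen(record: Record) -> Tuple[bool, List[str]]:
    """Return (included, reasons_for_exclusion) for one retrieved record."""
    reasons: List[str] = []
    if record.setting != "public_higher_ed":
        reasons.append("not a public institution of higher education")
    if record.region != "mainland":
        reasons.append("conducted outside mainland China")
    # Assumption: a language-teaching study is retained only if it focuses on content learning.
    framing_ok = record.framing in INCLUDED_FRAMINGS or (
        record.framing in LANGUAGE_FRAMINGS and record.content_focus
    )
    if not framing_ok:
        reasons.append("not framed as EMI, CLIL, CBI, or bilingual education")
    if record.outlet not in INCLUDED_OUTLETS:
        reasons.append("not a peer-reviewed journal article or book chapter")
    if record.is_review:
        reasons.append("systematic review or other research synthesis")
    if record.moi != "english":
        reasons.append("English is not the language of instruction")
    return (not reasons, reasons)


eligible = Record("public_higher_ed", "mainland", "EMI", "journal", "english")
ineligible = Record("k12", "hong_kong", "EAP", "thesis", "tibetan", is_review=True)
print(screen(eligible))
print(len(screen(ineligible)[1]))
```

Returning the list of exclusion reasons, rather than a bare yes/no decision, mirrors the kind of record keeping a PRISMA flowchart requires, in which excluded records are tallied by reason.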
Rationale of the Inclusion and Exclusion Criteria
Given the purpose of this study, which is to investigate the academic discourse of postsecondary Chinese–English bilingual education, we established the aforementioned inclusion and exclusion criteria for several reasons. First, we excluded studies conducted outside mainland China (such as Hong Kong, Macao, and other special administrative regions) because the educational policies in these areas differ from those of the mainland. For example, the Basic Law of the Hong Kong Special Administrative Region (1997) of the People’s Republic of China clearly states that
On the basis of the previous educational system, the Government of the Hong Kong Special Administrative Region shall, on its own, formulate policies on the development and improvement of education, including policies regarding the educational system and its administration, the language of instruction, the allocation of funds, the examination system, the system of academic awards and the recognition of educational qualifications. (p. 43)
We also included research in settings where the language of instruction was English, instead of an ethnic minority language (such as Tibetan, Miao, Korean, etc.), and excluded nonaccredited/private institutions of higher education or K–12.
Moreover, to control for the quality of studies included in our review, we set “research published in peer-reviewed journals and book chapters” as the inclusion criterion and “Master’s theses and doctoral dissertations” as the exclusion criterion because dissertations and theses are defined as gray literature that has not been published in a traditional format (Adams et al., 2017). In addition, such unpublished literature is often hard to locate through common search protocols/strategies; therefore, it is more difficult to archive, analyze, synthesize, and integrate (Scherrer & Preckel, 2019). We included “research designated/entitled/described as EMI, CLIL or CBI, bilingual education” and excluded “research on English language teaching/English for academic purposes (EAP) or English for specific purposes (ESP) (unless it had a focus on content learning)” because EMI, CLIL, CBI, and bilingual education share a common interest in the learning outcomes of both subject content and language proficiency (Brown & Bradford, 2018), whereas EAP and ESP place emphasis on providing students with language skills to master subject content and are more often designed as language courses in ESL/EFL settings (Airey, 2016; W. Yang, 2016).
Three-Phase Review
Informed by the themes derived from existing reviews, this review entails three phases. Phase I involves records that met the inclusion and exclusion criteria; Phase II focuses on empirical studies; and Phase III is devoted to empirical research that involves a comparison group, in which we seek to gain insight into the effectiveness of bilingual education. Unique to this study is a loosening of the restriction on empirical research in Phase I, so as to establish an optimal boundary, instead of a conservative one, and better understand the actual practice that is prevalent in bilingual programs that respond to the government initiative. In addition, Chinese journals in foreign language research and higher education publish articles that are restricted in length. Such brief articles do not normally have the space to substantively elaborate on data exploration, as typical empirical studies do (Tierney & Kan, 2016). Instead, more common forms of inquiries include recounts, essays, narratives, argumentative pieces, and reviews that may also reflect the trend of research in bilingual education. This manifestation of local scholarly literacy is oftentimes neglected as a result of internationalization, and instead redirected toward a Western intellectual tradition (Alatas, 2006; Mok, 2007).
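The nesting of the three phases can be illustrated with a short sketch. It assumes that every record has already passed the inclusion and exclusion screening and that each record has been coded with two yes/no judgments; the function names and the sample data are hypothetical and serve only to show that Phase III is a subset of Phase II, which is in turn a subset of Phase I.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def assign_phases(is_empirical: bool, has_comparison_group: bool) -> List[str]:
    """List the review phases that a screened record belongs to."""
    phases = ["I"]  # Phase I: every record that met the screening criteria
    if is_empirical:
        phases.append("II")  # Phase II: empirical studies only
        if has_comparison_group:
            phases.append("III")  # Phase III: empirical studies with a comparison group
    return phases


def phase_counts(records: Iterable[Tuple[bool, bool]]) -> Counter:
    """Tally how many records fall into each phase."""
    counts: Counter = Counter()
    for is_empirical, has_comparison_group in records:
        counts.update(assign_phases(is_empirical, has_comparison_group))
    return counts


# Hypothetical coded records: (is_empirical, has_comparison_group)
sample = [(False, False), (True, False), (True, True), (False, True)]
counts = phase_counts(sample)
print(dict(counts))
```

Under this reading, a nonempirical piece (e.g., an essay or narrative) is retained in Phase I only, which is how the loosened restriction described above widens the boundary of the review without affecting the later phases.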
We present flowcharts in Figures 1 (Chinese sources) and 2 (English sources) to outline the decision-making process following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA; Moher et al., 2010). A four-member team participated in the review process. When reviewers disagreed, they discussed the discrepancy until they reached a consensus. Agreement was established at 90% on the Chinese database (Kendall’s tau value above .75,
Figure 1. PRISMA flowchart on Chinese sources. Note. Of the 2,413 records originally retrieved from CNKI, one was written in English with a Chinese abstract; we therefore counted this article in the English database.

Figure 2. PRISMA flowchart on English sources.
Research Questions
We began the process of this review with the following questions, associated with the three phases of review. Each research question is mapped onto the five themes identified in the literature review that was presented earlier in this article:
What was the publication trend? (Phase I)
1. What is the type of institution and funding support where bilingual education is implemented?
2. Is there a difference between Chinese and English publications?
What was explored? (Phase II)
3. What are students’ attitudes/perceptions toward bilingual education, and what challenges do they perceive?
4. What form of bilingual education is most popular?
How was it studied? (Phase III)
5. What are the characteristics of the research design in comparative studies (per H. Cooper, 2009)?
Type of assignment
Baseline equivalence
Threats to internal validity (e.g., confounding, selection bias)
Type, language, and evidence of validity/reliability of the outcome measure
Chinese Database
The Chinese database was built from a search conducted from January 2001 to September 2019 using the Periodical Database in the China National Knowledge Infrastructure Net (CNKI,
English Database
The English database was built using a search from January 2001 through to September 2019 from popular online libraries, which can yield the most peer-reviewed academic journals in the social science field, such as Education Resources Information Center (ERIC), Academic Search Ultimate, Education Source, and Linguistics and Language Behavior Abstracts (ProQuest). In addition, we also conducted a search on Google Scholar, as well as in journals that are most likely to publish studies on this topic (e.g.,
Results and Discussion
In this section, we report the main results of the synthesis of Chinese and English publications so as to describe developments in the research of bilingual education in Chinese higher education. To clarify the organization of the presented data, we have merged the discussion with the findings by presenting them in the order of each review phase: (a)
Phase I: Trends in Publications
In this phase, we address Research Question 1, the types of universities and funding support, and Research Question 2, the epistemological disparity between Chinese and English publications. A total of 1,632 studies, including nonempirical research, were reviewed and are synthesized below.
Types of universities and funding support (Q1)
Among the 647 studies that specified the university where research was conducted, less than a quarter (18%,

Figure 3. Percentage of funding sources by university type.
Discussion
D. Zheng and Dai (2013) reported that most of the English–Chinese bilingual courses were implemented in top-tier universities; however, according to our review, although these courses may have been initiated in more privileged institutions, they have since expanded to lower-tier institutions with varying quality. As such, bilingual education has been embraced across the nation (A. Feng et al., 2017) and is not exclusively a service that is available to the elite. Furthermore, our data indicate that research on bilingual education in colleges and universities has been funded by central, provincial, and local resources. In fact, a slightly larger proportion of financial support was distributed through local agencies, particularly to non-key universities that fall outside the top 100 national rankings. The well-balanced allocation of funds between key universities and their counterparts reflects the current practice of lower-tier institutions allocating more resources for the implementation of bilingual programs. Such a trend also stands in contrast with the general critique that bilingual education perpetuates inequality in social capital because only the elite can afford it (A. Feng et al., 2017; G. Hu, 2008; G. Hu et al., 2014), and supports Wong’s (2008) recommendation for a decentralized policy shift in terms of funding. Therefore, we contend that the issue of equal access to education should not be used as a reason to oppose bilingual education in Chinese higher education.
The epistemological disparity between Chinese and English publications (Q2)
As presented in Figure 4, over the 19-year span, there has been a growing volume of work being published in Chinese (

Figure 4. Total number of studies by year by language.

Figure 5. Percent of empirical studies by year by language.

Figure 6. Percent of studies by methodological approach and language.
Discussion
The second trend that we observed in the reviewed publications is the disparity between Chinese and English articles in terms of quantity, as well as their epistemological approach to researching bilingual education. Kirkpatrick (2011) expressed a concern that the focus on foreign language as an MOI is inevitably accompanied by the requirement of disseminating knowledge in that foreign language. However, his concern was not supported by our systematic review; except for a volume of work most recently published in English (see J. Zhao & Dixon’s edited book, 2017, and two studies in a special issue edited by A. Gao in the International Journal of Bilingual Education and Bilingualism, 2019), the number of English articles does not accommodate non-Chinese scholars’ increasing interest in, and demand for, an understanding of bilingual education in Chinese higher education. When Baker (2007) stated, “much is known about bilingual education in North America and in Western Europe. The world knows very little about bilingual education in China” (p. vii), there were four English articles related to bilingual education (Figure 6). After a further decade of scholarship, a global systematic review identified only three studies in Chinese universities (see Macaro et al., 2018). We observed the same pattern in our own review, mainly due to the lack of dissemination in English beyond the Chinese scholarly community (despite a preponderance of literature published in Chinese).
Chinese and English publications differ not only in quantity but also in research paradigm. More specifically, there is a consistently reported paucity of empirical or data-based studies in Chinese (e.g., 7% in J. Liu et al., 2006; 7% in D. Zheng & Dai, 2013). This may be explained by the holistic and dialogical thinking of “me” in Chinese, compared with the Western tradition that detaches the authors to a third-person stance, and seeks the analytic investigation prevalent in Western academia (Tierney & Kan, 2016; Y. Zhao et al., 2008). We agree with Kirkpatrick’s (2011) recommendation that the dissemination of knowledge and scholarship should maintain a balance between local language and English, and that bilingual journals should be established in colleges and universities. At the very least, English abstracts should be searchable and made available to an international audience. To this end, our synthesis also increases awareness of the lack of Asian studies on this subject and addresses the criticism of the overrepresentation of American studies in major educational and psychological journals. Another observation worth mentioning is that the English articles included in this review were all authored by native English speakers or Chinese scholars who received doctoral or postdoctoral training in Western countries. Bilingual education relies on research that can generate practical evidence and, therefore, needs more empirical studies with rigorous designs that can contribute to “a solid knowledge base for policymaking” (X. Gao & Wang, 2017, p. 228). We speculate that this epistemological form of inquiry will be realized as more Chinese scholars with overseas credentials and Western research dispositions return. As many higher education institutions in China increase their efforts to recruit these scholars, and as higher education internationalizes more generally (Hughes, 2008; Tierney & Kan, 2016), we expect a considerable volume of work on the topic to appear in English outlets.
Phase II: What Was Explored
In this section, we examined 301 empirical studies to answer Research Question 3 “What are students’ attitudes/perception toward bilingual education, and what are their perceived challenges?” as well as Research Question 4 “What form of bilingual education is most popular?” A discussion follows each question.
Students’ attitudes/perceptions and perceived challenges in bilingual education (Q3)
Among all 301 data-driven studies, 123 (92 Chinese articles, 6%; 31 English articles, 52%) addressed the topic of Chinese college students’ attitudes toward bilingual education. As expected, responses were predominantly in favor of bilingual education; however, further investigation revealed a few important issues. First, students perceived learning a content area in English as challenging, due to the highly specialized vocabulary of the discipline (e.g., B. Peng, 2016; W. Yu et al., 2016). Second, students also expressed concerns that their English proficiency played a critical role in the success of learning their subject in English (e.g., G. Hu & Lei, 2014; J. Li & Zhang, 2016; X. Xiao et al., 2011), and that there was a lack of opportunity to enhance communicative competency in English (W. Wang & Curdt-Christiansen, 2019). A third issue was associated with five studies (Bolton & Botha, 2015; L. Guo et al., 2016; G. Hu & Lei, 2014; Ouyang & Gao, 2016; P. Wang et al., 2016) that reported contradictory responses from (a) architecture and chemistry majors who expressed little interest in bilingual course content and (b) medical students who did not believe that participating in the course had facilitated learning academic English or improved their general English skills. These findings reveal that students’ attitudes were associated with the challenges of delivering quality bilingual programs (i.e., students’ English proficiency and instructors’ qualifications), which takes us to the next section.
Students’ English proficiency
Among the 301 data-driven studies, 170 (128 Chinese articles, 8%; 42 English articles, 70%) addressed the topic of college students’ English proficiency. These studies suggested that students’ English proficiency was a determinant of the quality of bilingual education (e.g., L. Guo et al., 2011; N. Wang & Du, 2012; L. Yu & Han, 2011). In addition, this body of literature pointed to a disconnect between students’ general English proficiency and their academic English in the specific areas of content (J. Li & Zhang, 2016; T. Wang, 2015; X. Zhang et al., 2015), as well as a great variation among students’ English proficiency (Z. Wang, 2016). Some researchers reported that students had limited listening and oral skills in English (e.g., X. Chen, Lv, et al., 2016; W. Yang, 2016). Therefore, to benefit the most from bilingual courses, students were advised to demonstrate an initial threshold of proficiency (e.g., J. Han & Yu, 2007; L. Yu & Han, 2011), which was normally measured by two nationally standardized assessments (i.e., College English Test-Band 4 [CET-4] and College Entrance Exam-English test). CET-4 is mandatory for all non-English majors to test their general English ability in listening, speaking, reading comprehension, and writing (Y. Yang & Qian, 2017), and is usually taken in the second semester of the sophomore year (J. Xu & Fan, 2017). Students who pass CET-4 are considered to have mastered a sufficient amount of language (i.e., 4,500 words and 700 phrases; MOE, 2004) to participate in a bilingual program (J. X. Han, 2009; J. Jiang, 2004; X. Li et al., 2009; Z. Wu et al., 2017). Regarding the College Entrance Exam, G. Hu et al. (2014) proposed a cutoff score of 120 (80%) as an eligibility criterion for program participation. Their proposal is consistent with an earlier empirical study in which 80% on CET-4 (or 60% on CET-6, an advanced English level) was associated with a solid linguistic repertoire that is deemed appropriate for learning content in English (J. Han & Yu, 2007).
Instructors’ English proficiency
Another finding derived from a survey of students’ perception concerns bilingual instructors’ qualifications, which was the main focus of almost all the English studies (
Discussion
Findings from this phase suggest that notwithstanding the large volume of survey research, no detailed process was adopted by higher education institutions in consulting and engaging their faculty and/or students, who are stakeholders that are directly affected by this educational movement. This finding echoes the current trend in bilingual education worldwide (Macaro et al., 2018). Moreover, despite the generally positive attitudes toward bilingual education identified among college students majoring in diverse disciplines, their increased awareness of the challenges associated with offering quality bilingual courses was consistently reported in previous reviews (e.g., F. G. Fang, 2018; D. Zheng & Dai, 2013; Zhu & Yu, 2010).
First, the shortage of qualified instructors has become a major roadblock for the successful continuation and expansion of bilingual education in Chinese universities (Cheng, 2017). Nevertheless, most of the empirical studies in our review were conducted through survey research, capturing students’ perspectives. Few studies directly addressed the best practices for improving pedagogy (e.g., overseas training in Cheng, 2017; training in the use of interactional/high-cognitive strategy, G. Hu & Duan, 2019) or provided a profile of the instructor’s professional background when the effect of bilingual education was examined (e.g., Tong & Shi, 2012). While teaching abroad may be a strong desire for bilingual teachers (Werther et al., 2014), the “effectiveness of a borrowed idea, practice or innovation depends crucially on its appropriateness for the specific, local, and dynamic reality of teaching and learning in a particular educational context” (G. Hu, 2009, p. 131). Therefore, an overseas training program is not the solution, but rather a first step in building up a support mechanism for bilingual instructors’ professional development, which is an ongoing process (Cheng, 2017; E. Zhou & Ding, 2012) that requires significant resources (Macaro et al., 2018). Equally important is the belief that for these instructors to be agents of change, they need to feel a sense of entitlement in this educational investment, rather than a passive role of participation. More evidence-based research can further such an understanding and reflect practice before a high-quality bilingual course is offered, for the purpose of maximizing student learning.
Second, in regard to students’ English proficiency, our concerns stem partly from the absence of a clear definition of English proficiency. To be more specific, a large proportion of the studies cited a perceived improvement in students’ English proficiency as a great benefit of bilingual education (e.g., J. Li et al., 2016) without using psychometrically sound instruments to measure such proficiency. Although a few studies reported a high CET-4 passing rate among bilingual participants (e.g., Ma et al., 2016), they did not mention the rate among nonbilingual participants or participants’ initial English levels prior to participation. In addition, little evidence exists that CET performance is indicative of higher academic achievement. As a result of these limitations, the bulk of the literature reviewed in this study failed to contribute to the discussion on the effectiveness of bilingual education. Instead, the literature speaks to a timely pursuit in defining and evaluating English proficiency so as to address the question of whether participation in bilingual education can truly improve students’ English competence, as proposed by Macaro et al. (2018) in their global systematic review. We want to remind the reader that scholarly attention should not only be allocated to English proficiency as a gatekeeper of bilingual education; what is more vital and beneficial, we argue, is to conduct research on how to provide English support and integrate English into curricula, so that students can continue developing their academic English proficiency and thus be prepared for content taught in English.
On a different note, although not a specific focus of this review, we found that the vast majority of Chinese articles whose lead authors were content area instructors (reflecting on their practices) included neither a coauthor who had expertise in second language acquisition and pedagogy nor one who was trained in research methodology. More than a decade later, the recommendation of H. Xu (2008) and D. Zheng and Dai (2013) that collaboration should occur not only between subject and language specialists but also between practitioners and researchers in bilingual education is yet to be realized.
Bilingual program models (Q4)
Among all empirical articles, only 40 (13%) mentioned the types of bilingual models that were implemented, with the majority being immersion (
Discussion
Findings regarding bilingual program models suggest that specifications of program models are far from adequate. Despite the popularity of the immersion model among the small percentage of studies that discussed models of implementation, S. Zhang (2015) strongly promoted the transitional bilingual model, which takes into consideration the challenge of the authentic English language environment, instructors’ qualification, and instructional material. Some researchers suggested a fixed distribution of the two languages, for example, having 30% in English (X. Liu et al., 2012; M. Lu & Ma, 2016). However, as G. Hu (2008) criticized, these terms for program models (i.e., immersion, transitional) are misaligned with the international literature on bilingual education. This is because there are fundamental differences in sociocultural, educational, linguistic, political, economic, and historical contexts between China and the countries where these models originated (Q. Qu, 2015; Tong & Shi, 2012). Taking the transitional bilingual model as an example, the concept was imported from North America, where the language of instruction transitioned from a minority language to the majority language; this contrasts with the transitional program in Chinese higher education, which aims to use Chinese (majority language) as a bridge to English (minority language), for the purpose of developing students’ English proficiency in an academic context (P. Wang, 2017). Given such a distinction, the exclusive use of English becomes a disservice to bilingual/multilingual students, particularly in a context where much more information is available in L1. Such practices are dangerous in that they contribute to a form of linguistic hegemony that can be disruptive to the ecology of a local language (F. G. Fang, 2018; Kirkpatrick, 2014; D. Li, 2013).
A substantial distinction has been uncovered between what a program is labeled as and what is actually practiced, not only in English-speaking settings (Irby et al., 2007) but also in China (Y. Li, 2012) and other Eastern countries (Barnard & McLellan, 2014). Although our review points out such a distinction, research in this area is still scarce. Without more information on observed practices, program evaluation has no ground to stand on. Therefore, it is imperative to objectively capture pedagogical practices in bilingual classrooms (H. Guo et al., 2018; Tong, Luo, et al., 2017). We urge Chinese academics to purposefully reconsider a framework with appropriate designation or variation of forms of bilingual education that is analogous to terms widely known to English audiences; more importantly, such a framework ought to accommodate the needs of stakeholders (i.e., students and instructors) and fit into the nativized landscape of Chinese higher education.
Relatedly, H. Guo et al. (2018) reasoned that the yet-to-be-proven effectiveness of bilingual education in China is due to “a lack of a commonly adopted, comprehensive evaluation framework that draws from, and is informed by, empirical evidence produced through quality research” (p. 13). We assert that a localized bilingual education theory with model specifications can significantly contribute to guiding data-driven research in Chinese higher education institutions.
Phase III: How Was It Studied
From a methodological perspective and the review in Phase I, we found that 18.7% of studies were data driven. It is also worth mentioning that 73% (
Coding Sheet of 34 Studies in Phase III Review.
CET = College English Test; RCT = randomized controlled trial; QED = quasi-experimental design; PBL = project-based learning.
RCT/QED that reported statistically positive effect in both English and content area (
RCT/QED that reported statistically positive effect in either English (
Random assignment and baseline equivalence
After careful scrutiny, we identified only 10 studies (nine in the field of medicine and one in business) that were randomized controlled trials (RCTs). Only three of these used intact classes as the unit of assignment (i.e., Y. He et al., 2018; Sha et al., 2014; Shi et al., 2016). The other seven randomly assigned individual students to either bilingual or monolingual instruction (i.e., L. He et al., 2016; A. Liu, 2019; Long et al., 2019; Mi, 2018; Xing et al., 2012; Yuan, 2016; X. Zhao et al., 2016). Yuan (2016), for instance, applied a block randomized design based on students’ content-area test scores, which were first divided into five categories from levels A to E. Within each category, students were then randomly assigned to bilingual or monolingual Chinese classes. In the remaining 24 articles, random assignment was either falsely claimed (i.e., Sun & Xiao, 2006; G. Zhang, 2012) or unclaimed (e.g., Z. Liu, Luo, & Han, 2012). For example, G. Zhang (2012) randomly selected one class (from a total of six) to receive bilingual instruction and another class to receive Chinese-only instruction; selecting classes at random in this way does not randomly assign participants to conditions.
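The block randomized strategy described for Yuan (2016) can be illustrated with a minimal sketch. It is not the procedure used in that study: the function name `block_randomize`, the equal-sized rank-based blocks, and the rule that any odd student in a block goes to the monolingual condition are all assumptions made for illustration.

```python
import random


def block_randomize(scores, n_blocks=5, seed=0):
    """Randomly assign students to conditions within score-based blocks.

    scores: dict mapping a student id to a prior content-area test score.
    Returns a dict mapping each student id to (block_label, condition).
    Hypothetical helper: block boundaries are equal-sized rank groups,
    which is an assumption, not a detail reported in the source.
    """
    rng = random.Random(seed)
    # Rank students from highest to lowest prior score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Ceiling division so that every student falls into some block.
    block_size = -(-len(ranked) // n_blocks)
    assignment = {}
    for b in range(n_blocks):
        label = chr(ord("A") + b)  # blocks labeled A, B, C, ...
        block = ranked[b * block_size:(b + 1) * block_size]
        rng.shuffle(block)  # random order within the block
        half = len(block) // 2
        for i, student in enumerate(block):
            # First half of the shuffled block: bilingual; rest: monolingual.
            condition = "bilingual" if i < half else "monolingual"
            assignment[student] = (label, condition)
    return assignment


if __name__ == "__main__":
    demo_scores = {f"s{i}": 50 + i for i in range(20)}
    for student, (label, condition) in sorted(block_randomize(demo_scores).items()):
        print(student, label, condition)
```

Because assignment is randomized separately inside each score level, the two conditions end up balanced on prior content knowledge by design, which is the property that distinguishes this approach from randomly picking one intact class per condition.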
In addition to this, when random assignment did not occur, an examination of initial equivalence was required to ensure the comparability of the two groups from the outset (Campbell & Stanley, 2015). However, only three quasi-experimental designs (QEDs) reported the baseline of participants’ gender distribution, age, and English proficiency (i.e., J. X. Han, 2009; Lei & Hu, 2014; G. Zhang, 2012), in which J. X. Han’s (2009) study was quantitative with 274 participants (137 in treatment and 137 in control condition). The author conducted an independent
Evidence of validity/reliability and types of outcome measures
Among the 34 comparative studies, there were three RCTs (i.e., L. He et al., 2016; A. Liu, 2019; Shi, Chen, et al., 2016) that measured participants’ outcomes in both English and content knowledge at the end of the program. Other outcomes included participants’ satisfaction (i.e., L. He et al., 2016; L. Zhang, 2016), anxiety, confidence, interest, learning initiative, memory (i.e., Zhan et al., 2016), interest (e.g., L. Chen et al., 2016; G. Zhang, 2012), and self-efficacy, motivation, and metacognition (i.e., Shi et al., 2016). These outcomes were all collected through self-reported instruments for which the researchers in the respective studies provided no psychometric evidence.
Based on the previously mentioned outcomes in Table 2, a total of 20 studies demonstrated a significant difference in favor of bilingual programs in English, specific subjects, or affective domains, including eight RCTs and two QEDs that all came from the medical science field (e.g., anesthesiology, nephrology, and physiology in Chinese medicine), except for one that was in the field of math. A closer examination of Table 2 reveals that among the 10 methodologically sound RCTs and QEDs, two (i.e., L. He et al., 2016; G. Zhang, 2012) found statistically significant positive outcomes in both English and content knowledge. The other eight reported a positive effect of bilingual courses in terms of students’ performance either on their specific subject (
Finally, in studies that failed to detect a statistically significant difference in the content area, interpretations were formed from a contrasting perspective. J. X. Han’s (2009) QED concluded that bilingual instruction was as effective as Chinese-only instruction in supporting students’ academic achievement in mathematics. Lei and Hu (2014) concluded that the quality of the focal program was undetermined, despite bilingual students’ overall satisfaction with the program. As was mentioned above, a serious weakness in Lei and Hu’s study is the initial inequivalence between the two groups of students, which raises questions about their comparability and leads to an unfavorable conclusion about bilingual education.
Discussion
We now turn to a discussion of the comparative studies reviewed in this article. First, G. Hu and Li (2017) summarized that there was virtually no empirical investigation that involved a comparison group of monolingual Chinese instruction to address the effect of bilingual education on students’ English proficiency and academic outcome. Their concern was supported in our comprehensive review. Despite a substantial amount of work, research over the course of nearly two decades has produced a total of only 34 studies that compared bilingual education with a monolingual, Chinese-only approach. Furthermore, only 13 articles attempted randomization at the student/class level or identified comparable counterparts, which are the most rigorous designs for testing causality (Campbell & Stanley, 2015). Unfortunately, our in-depth review of these studies revealed recurring methodological issues, such as nonrandom assignment, group incomparability, missing information, a lack of statistical control for baseline inequivalence in participants’ knowledge and skills in the subject, or a lack of information on implementation. These flaws compromised internal validity, one of the most critical elements in experimental design (as it is associated with random assignment and, thus, causality; Campbell & Stanley, 2015; Coleman, 2018), which consequently undermined the credibility of the findings.
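To make the notion of baseline equivalence concrete, the following minimal sketch computes two common summaries of a pretest difference between a bilingual and a monolingual group: the standardized mean difference (using the pooled standard deviation) and Welch's t statistic. The function name `baseline_check` and the pretest scores are hypothetical and do not come from any reviewed study; no cutoff for "equivalent" is implied here.

```python
from math import sqrt
from statistics import mean, variance


def baseline_check(treatment, control):
    """Return (standardized mean difference, Welch's t) for two groups.

    treatment, control: lists of pretest scores (hypothetical data).
    """
    m1, m2 = mean(treatment), mean(control)
    v1, v2 = variance(treatment), variance(control)  # sample variances
    n1, n2 = len(treatment), len(control)
    # Pooled standard deviation across the two groups.
    pooled_sd = sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    smd = (m1 - m2) / pooled_sd
    # Welch's t does not assume equal variances in the two groups.
    welch_t = (m1 - m2) / sqrt(v1 / n1 + v2 / n2)
    return smd, welch_t


if __name__ == "__main__":
    bilingual_pretest = [75, 80, 85, 90, 95]    # illustrative values only
    monolingual_pretest = [70, 75, 80, 85, 90]  # illustrative values only
    smd, t = baseline_check(bilingual_pretest, monolingual_pretest)
    print(f"standardized mean difference = {smd:.3f}, Welch t = {t:.3f}")
```

Reporting both quantities is a design choice: the t statistic depends on sample size, whereas the standardized mean difference describes how large the initial gap is regardless of how many students were enrolled.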
Second, it is not surprising that the two most commonly examined outcomes were English language proficiency (measured by CET or other English tests) and grades on content knowledge (measured by instructor-developed, nonstandardized instruments). However, problems still exist. For example, studies that attend to both outcomes are scarce, which is problematic as the ultimate goal of the Chinese government is to prepare people with both a strong communicative ability in English and knowledge and skills in their respective subject area. The six studies that reached a certain level of consensus on the positive outcome in the medical science disciplines were overshadowed by abundant, nonempirical research, which corresponded to the conclusions of existing reviews presented earlier in this article (e.g., F. G. Fang, 2018; D. Zheng & Dai, 2013). What is more, none of the 34 studies reported any reliability (e.g., internal consistency) or validity (e.g., construct validity) indicators of the measures used for comparison. Although CET is nationally normed with strong psychometrics (College English Test Band 4 and Band 6, 2018), no information regarding the sample was presented in these studies. There is a common understanding that reliability and validity are critical psychometric features of an instrument, and their findings inform the professional community, as well as the policymakers who make high-stakes decisions (Gitomer et al., 2019). The lack of such information hinders meaningful interpretation of the results, rendering them inadequate and unconvincing. We agree with Lei and Hu’s (2014) recommendation that more discipline-specific measures of English should be developed and validated.
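As an illustration of the kind of reliability indicator that was missing, internal consistency is often summarized with Cronbach's alpha, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)$, where $k$ is the number of items, $s_i^2$ is the variance of item $i$, and $s_T^2$ is the variance of respondents' total scores. The sketch below is a minimal implementation under that standard definition; the function name `cronbach_alpha` and the response data are hypothetical and are not drawn from any reviewed study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Compute Cronbach's alpha for a respondents-by-items score table.

    item_scores: list of respondents, each a list of k item scores.
    """
    k = len(item_scores[0])
    # Transpose so that each tuple holds all responses to one item.
    items = list(zip(*item_scores))
    item_var_sum = sum(variance(col) for col in items)
    # Variance of each respondent's total score across the k items.
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


if __name__ == "__main__":
    # Illustrative responses: three respondents, two items.
    responses = [[1, 2], [2, 1], [3, 3]]
    print(f"alpha = {cronbach_alpha(responses):.3f}")
```

A coefficient of this kind speaks only to internal consistency; evidence of validity (e.g., construct validity) would require separate analyses.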
Third, after a long debate in the United States, the positive effect of bilingual education among young children has been documented in quality research and acknowledged by researchers and practitioners (Irby et al., 2010; Lindholm-Leary, 2016) through well-controlled, randomized designs with a high level of implementation fidelity. This is not the case when it comes to an inquiry into the effectiveness of bilingual education in China. Studies of cognitive/affective domains (such as self-identity, self-efficacy, learning anxiety, and learning motivation) are rarely conducted, which corresponds to the previous reviews by H. Xu (2008) and D. Zheng and Dai (2013). These constructs are expected to affect the quality of bilingual education (D. Zheng & Dai, 2013) and, thus, deserve comprehensive exploration.
These findings not only resonate with previous reviews in the context of mainland China (i.e., Fan, 2014) and Hong Kong (Lo & Lo, 2014) but are also applicable to the academic discourse on bilingual education worldwide (Macaro et al., 2018). The aforementioned issues have significantly hindered the establishment of a cause-and-effect relationship, which is typically derived from rigorous randomized controls, to address the impact of bilingual education; this suggests a need for more scientific exploration before any research synthesis that involves statistical approaches (such as best-evidence synthesis and meta-analysis) can be undertaken to quantify the effectiveness of bilingual programs in Chinese higher education. Conducting experimental research in bilingual education is challenging (X. Gao & Wang, 2017; G. Hu & Lei, 2014); nevertheless, it is only through solid design and evaluation that research scholarship can be enriched with compelling evidence to address the ultimate question:
Recommendations and Conclusion
From the insight provided by this systematic review, we suggest that even after almost two decades of research and practice in bilingual education in Chinese higher education, there is still a dearth of strong evidence from a contextualized body of research that can attest to the effectiveness of bilingual education, as a result of ideological and epistemological orientation as well as a lack of rigorous research design and implementation. We believe that the distorted academic discourse elaborated in G. Hu’s (2008) study a decade ago, which was rife with misunderstanding, misrepresentation, and misinterpretation, is partly due to this. The well-intended and far-reaching policy provision of bilingual education has not resulted in significant and favorable conclusions. However, before such a definitive and convincing statement can be made, we insist that scholarly attention should continue to revolve around the quality and implementation of bilingual programs with the following recommendations, which are derived from our findings and supported by the existing body of literature:
An instruction/curriculum/evaluation team to be formed including a content specialist, language teacher, and researcher (Fan, 2014; H. Xu, 2008; D. Zheng & Dai, 2013; Zhu & Yu, 2010);
Examination of local policy and resource allocation that has the potential to shape educational practices;
A scientific program evaluation framework to be established and reinforced (H. Guo et al., 2018);
Observational research on the interplay of the distribution of two languages, content of each language, and instructor–student interaction (H. Guo et al., 2018; G. Hu & Duan, 2019; Macaro et al., 2018; W. Wang & Curdt-Christiansen, 2019);
Large-scale empirical/data-based research and longitudinal (more than 1 year) investigations that address linguistic, academic, and affective outcomes (such as self-identity, learning motivation, learning strategy, attitudes; D. Zheng & Dai, 2013);
Well-controlled experimental studies comparing bilingual course with L1-instructed courses, controlling for learners’ initial level of content area knowledge, as well as access to instructional and learning material outside the classroom (or socioeconomic status; X. Gao & Wang, 2017);
Inclusion of psychometrically sound instruments in survey research and standardized outcome measures, the result of which can be compared across studies (Bray et al., 2014; Macaro et al., 2018);
Dissemination in English to reach a broader audience of researchers and practitioners who are interested in bilingual education through an international lens that promotes scholarly exchange.
To conclude, to the best of our knowledge, this study is the first to systematically synthesize research studies (published in both English and Chinese) of postsecondary bilingual education and provides ample evidence of students’ learning in mainland China, which has 3.5 million English learners. Results from this review can, in turn, shed light on the debate on the educational benefits of bilingual education in mainland Chinese society, the scholarly community, and the government’s formulation of bilingual education policies in the coming decades. In addition, by scrutinizing the academic discourse in bilingual education, this study may provide policy implications and suggestions on how to put L2 learning and bilingual education programs into practice in a global context. Again, it is worth mentioning that this article is not intended to be viewed as advocacy for bilingual education; more rigorous research that can generate credible evidence speaking to the effectiveness of bilingual education may emerge elsewhere. Instead, this article responds to an urgent need to substantively highlight a picture of the scholarly trajectory to guide further research in this top-down educational movement that continues to increase its presence, momentum, and inevitability with unquestioned institutionalized policies and practices.
