Keywords

Assessment Use Argument, Chinese as a second/foreign language, HSKK, speaking test, validation

I have seen a couple of international students that achieved good scores on the HSK level 5—the advanced-level Chinese proficiency test, and yet [they] can barely communicate at all in Chinese, not even daily conversation like “how was your weekend?” (A professor who teaches Chinese at a Confucius Institute in the USA, Interview, February 26, 2022)

Introduction

The Hanyu Shuiping Kaoshi (HSK) is a multi-level, multi-purpose Chinese proficiency test developed by the Center for Language Education and Cooperation (previously the Office of Chinese Language Council International and, henceforth, referred to by its colloquial name “Hanban”). It assesses reading, writing, and listening skills of second language (L2) Chinese learners wanting to study or work in China (Peng et al., 2020). Because the test contains no speaking section, however, educators and employers in China have complained that there is no way to evaluate test takers’ speaking ability based on HSK results (Jin, 2019). To address this concern, Hanban introduced a speaking test, the Hanyu Shuiping Kouyu Kaoshi (HSKK), to measure L2 Chinese learners’ general speaking proficiency as a complement to the HSK. Because the original one-level (i.e., advanced-level) HSKK (used from 1990 to 2009) was criticized as impractical and difficult (Wang & Jiang, 2017), the current three-level HSKK was introduced in 2010 and has gradually attracted the attention of international Chinese learners. HSKK is presently conducted in about 120 countries and 860 cities worldwide, and more than 30,000 L2 Chinese learners take the test yearly (Ding et al., 2021). As the most influential speaking test for L2 Chinese learners, its results are of great importance for test users (Cui, 2010; Wang, 2014, 2018; Yang, 2017). The current version of HSKK (Hanban, 2010) is thus reviewed in this paper.

Test purpose

HSKK aims to promote spoken Chinese teaching and learning for L2 learners both domestically and internationally while offering an effective evaluation of test takers’ Chinese speaking proficiency to help various entities make score-based decisions (Hanban, 2010). These entities include: (1) Chinese higher education institutes (for making decisions regarding international students about admission, level of Chinese speaking ability for class assignment, and giving and waiving credits); (2) employers (for making decisions about recruitment, training, and promotion of highly skilled international workers); (3) L2 Chinese learners seeking to understand and enhance their Chinese speaking skills; and (4) Chinese language teaching institutes seeking to assess their teaching and training outcomes.

Levels

Hanban (2010) provides a basic description of the three difficulty levels of the HSKK (primary, intermediate, and advanced), which potential test takers can refer to when deciding which level to take. Specifically, primary-level test takers are those who have 6 months of Chinese learning experience (at a pace of 2–3 class hours per week or equivalent) and have mastered about 200 commonly used words. Intermediate-level test takers have 1–2 years of Chinese learning experience and have mastered about 900 of the most common Chinese words. The advanced level describes those test takers who have learned Chinese for more than 2 years and have mastered about 3000 commonly used words.

Length and administration

The test duration of the HSKK ranges from 17 minutes (primary level) to 24 minutes (advanced level). HSKK can be taken in two formats, that is, paper-based and computer-based, depending on test takers’ preference and the availability of test centers. Both formats require test takers to audio record their spoken answers; the test takers in the paper-based HSKK, however, are provided with audio recording equipment. In response to the global COVID-19 pandemic, some test centers began offering the computer-based HSKK remotely (Hanban, 2020). Test takers can check their results at www.chinesetest.cn 30 working days after the test, using the test information given on the test ticket.

Author and publisher

HSKK is operated by Chinese Testing International Co., Ltd. (Website: www.chinesetest.cn; Email: kaoshi@chinesetest.cn; Tel: +86-10-59307657; Address: 83 Deshengmenwai Street, Xicheng District, Beijing, China), an independent legal entity owned by the Center for Language Education and Cooperation, a non-governmental public institution affiliated with the Ministry of Education (MoE) of China.

Price

HSKK fees can vary because of exchange rates. For those wanting to take the HSKK in Beijing or Shanghai, HSKK fees vary from level to level ranging from ¥200 (around US$30) for the primary-level HSKK to ¥400 (around US$60) for the advanced-level HSKK. HSKK fees for the remote test are 30% more. This surcharge is mainly due to the extra work carried out by the test center and invigilators. Specifically, the remote invigilators are required to check whether test takers’ physical environment meets the test requirements, that is, whether the test takers’ cameras capture body movements, and to verify the test takers’ ID documents before the test begins (Hanban, 2020).
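The fee arithmetic above can be sketched in a few lines. This is only an illustration: the helper name `remote_fee` is hypothetical, the base fees and the 30% surcharge are the figures quoted in the paragraph above, and the intermediate-level fee is omitted because the text does not state it.

```python
from decimal import Decimal

# Base fees in CNY for Beijing/Shanghai as quoted in the text.
# The intermediate-level fee is not given there, so it is left out.
BASE_FEES_CNY = {"primary": Decimal("200"), "advanced": Decimal("400")}

# "30% more" for the remote test, as stated in the text.
REMOTE_SURCHARGE = Decimal("0.30")


def remote_fee(level: str) -> Decimal:
    """Return the remote-test fee for a level (hypothetical helper)."""
    base = BASE_FEES_CNY[level]
    return base * (1 + REMOTE_SURCHARGE)


for level in BASE_FEES_CNY:
    print(f"{level}: base {BASE_FEES_CNY[level]} CNY, remote {remote_fee(level)} CNY")
```

Under these figures, the remote primary-level test would cost ¥260 and the remote advanced-level test ¥520; actual fees may differ by test center and exchange rate.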

Appraisal of the HSKK

The current review adopts Bachman and Palmer’s (2010) Assessment Use Argument (AUA), a framework consisting of a set of claims that connect a test taker’s performance (assessment tasks and assessment records) to test use (i.e., interpretations, decisions, and consequences). The framework links all aspects of the HSKK and supports a systematic appraisal of the trustworthiness of the test interpretations and of the multi-purpose use of the HSKK. AUA is adopted because it is one of the few frameworks that includes an argument for assessment use and specifies the connections between test interpretation and the consequences of test use. The following section explains how well HSKK has performed based on the claims in AUA by comprehensively reviewing current empirical and validation studies and technical reports released by HSKK developers and administrators.

Assessment tasks

According to Bachman and Palmer (2010), assessment tasks are the input (e.g., a paragraph to read or a picture to describe) and the questions (e.g., open-ended or closed questions) used in a test to elicit test takers’ responses. As the key element of AUA and the data for relevant claims related to test design and use, the assessment tasks and their features, such as structures and authenticity, are worthy of analysis and attention.

Structures

HSKK assesses L2 Chinese learners’ speaking skills in three parts across three proficiency levels (see Table 1).

Table 1.

An overview of the HSKK tasks.

Part	Task	Number of items	Duration (minutes)
HSKK-Primary
I	Listen and repeat	15	4
II	Listen and reply	10	3
	Preparation	N/A	7
III	Answer questions	2	3
	Total	27	17
HSKK-Intermediate
I	Listen and repeat	10	3
	Preparation	N/A	10
II	Describe pictures	2	4
III	Answer questions	2	4
	Total	14	21
HSKK-Advanced
I	Listen and retell	3	7
	Preparation	N/A	10
II	Read a paragraph	1	2
III	Answer questions	2	5
	Total	6	24

HSKK: Hanyu Shuiping Kouyu Kaoshi; N/A: not applicable.

Table 1 shows that the HSKK has six different task types with three tasks at each level. Part I in the primary-level HSKK is a “Listen and repeat” task requiring test takers to repeat 15 sentences precisely. The length and difficulty of the ten sentences at the intermediate-level “Listen and repeat” task increase, which makes the sentences more challenging to remember over a short period of time. At the advanced level, the “Listen and repeat” task is replaced with a “Listen and retell” task that requires test takers to accurately retell the essential information of three short paragraphs, which are usually selected from either narrative, argumentative, or descriptive texts. Sentences in each paragraph at the advanced level are longer and more complex with more information for test takers to remember.

Part II of the primary-level HSKK is a “Listen and reply” task that requires test takers to listen and answer ten questions. Three types of questions often appear in this task: general or Yes/No questions, special or wh-questions, and choice questions. At the intermediate- and advanced-level, the “Listen and reply” task is replaced with the “Describe pictures” and “Read a paragraph” tasks, respectively. Intermediate-level test takers are required to describe two stories coherently based on two pictures without being provided any voice or text materials, while advanced-level test takers are required to read an excerpt from prose, which mainly assesses their pronunciation, intonation, and recognition of Chinese characters.

Part III at all three levels is an “Answer questions” task requiring test takers to read two open-ended questions and use at least five sentences to answer each question. “Answer questions” tasks in the primary- and intermediate-level HSKK are annotated with pinyin (the official romanization system for Chinese characters in Mainland China) to help students read and understand Chinese characters in case they cannot recognize the characters. Common topics for the primary-level “Answer questions” task and the first question at the intermediate-level “Answer questions” task often require test takers to describe people, places, experiences, or habits. Two questions at the advanced-level “Answer questions” task and the second question at the intermediate-level “Answer questions” task usually require test takers to give or evaluate an opinion on a certain topic, describe things that involve a hypothetical situation, or explain differences and/or similarities between two things. Test takers at all three levels need to describe or narrate things or events using correct grammar and appropriate words. Intermediate- and advanced-level test takers also need to include advice and express their opinions accurately and logically. Some preparation time is given for the task at all three levels.

Task authenticity

Task authenticity is important when assessing speaking skills as it concerns the relationship between the speaking test and real-world contexts (Luoma, 2004). Hanban (2010) claims that the test developers have tried to incorporate authentic topics familiar to test takers into the HSKK. Nevertheless, the authenticity of the “Listen and repeat” tasks in the primary- and intermediate-level HSKK has been criticized (Jin, 2019). Studies evaluating past HSKK tests have pointed to inauthentic sentences that are largely devoid of context (see Jin, 2019; Wang, 2014, 2018). However, some researchers (e.g., Wang & Jiang, 2017; Yan et al., 2016) have argued that second language learners usually draw on the spoken language used by their interlocutors and summarize, or even repeat, statements when preparing or giving a response in conversation. Accordingly, the task is valid to some extent because natural conversation and interaction depend in part on repetition, although the authenticity of the “Listen and repeat” task appears lacking compared with other tasks (e.g., “Listen and reply” and “Answer questions”). Jin (2019) also claimed that the texts selected in the advanced-level “Read a paragraph” task are excerpts from prose containing formal written language that reflects real-life oral communication to only a very limited extent.

Assessment records

The HSKK records the scores test takers achieve after completing the assessment tasks. In well-designed language tests, assessment records should be consistent across various tasks, forms, assessors, or times (Bachman & Palmer, 2010; Fan & Yan, 2020).

Scoring

In the HSKK, the maximum score at all three levels is 100, with 60 as the passing threshold (Hanban, 2010). While the test score itself has no expiration date, some higher education institutes in China may require an HSKK result no more than 2 years old. Test takers receive only a total score (e.g., 90 out of 100), with no breakdown by section. This lack of transparency has led some test takers to request their individual task scores to see which parts can be improved (Ding et al., 2021). Furthermore, Hanban (2010) does not explain why the passing score is set at 60, and no study appears to have analyzed how this number was decided. More information about the setting, monitoring, and validation of the grading process should therefore be provided by the test developer.

Regarding the scoring criteria for the HSKK, Hanban (2010) provides a task-specific rubric (Table 2); however, no information has been provided on how the rubric was designed, and no study appears to have validated it. As a result, test takers do not know how scores are allocated to the three tasks at each level. The test developer should therefore consider providing a more detailed grading scale that totals 100 points.

Table 2.

Task-specific rubrics for HSKK.

Part	Task	Ability description: High	Ability description: Intermediate	Ability description: Low
I	Listen and repeat (HSKK-Primary, HSKK-Intermediate)	The test taker repeated the sentence accurately	The test taker repeated the sentence incompletely	The test taker’s repeated sentence is different from the original sentence
I	Listen and retell (HSKK-Advanced)	The test taker retold the content completely and fluently, with few pauses, repetitions, or grammatical mistakes	The test taker retold part of the content, with some pauses, repetitions, or grammatical mistakes	The test taker’s retold content is different from the original content, with disordered language and less information covered
II	Listen and reply (HSKK-Primary)	The test taker answered the question accurately and succinctly, with few pauses, repetitions, or grammatical mistakes	The test taker answered the question correctly, with some pauses, repetitions, or grammatical mistakes	The test taker answered the question unclearly
II	Describe pictures (HSKK-Intermediate)	The test taker fluently described the picture, with few pauses, repetitions, or grammatical mistakes	The test taker basically described the picture, with some pauses, repetitions, or grammatical mistakes	The test taker’s description is not related to the picture
II	Read a paragraph (HSKK-Advanced)	The test taker read the paragraph fluently, with correct pronunciation and intonation (few pauses, repetitions, or grammatical mistakes may exist)	The test taker read most of the paragraph (some pauses, repetitions, or grammatical mistakes may exist)	The test taker only read a few sentences
III	Answer questions (HSKK-Primary, HSKK-Intermediate, HSKK-Advanced)	The test taker answered the question fluently with much information (few pauses, repetitions, or grammatical mistakes may exist)	The test taker answered the question with little information (some pauses, repetitions, or grammatical mistakes may exist)	The test taker’s answer is not related to the question

HSKK: Hanyu Shuiping Kouyu Kaoshi.

Reliability

Regarding the reliability of the HSKK, an empirical study conducted by Cui (2010) examined the intermediate-level HSKK between 2008 and 2010. Cui (2010) invited three trained examiners to rate 51 tests and applied generalizability theory procedures to analyze the generalizability coefficient (reliability) of the whole test and every task type, and then applied Spearman’s rank correlation and a paired-samples t-test to analyze the relation between the “Describe pictures” and “Answer questions” tasks. The results suggested that the reliability (.87) was acceptable for the whole test, while the “Describe pictures” task had a higher reliability (.88) than “Answer questions” (.86), and combining scores on the different tasks into a composite score was reasonable.

A more recent study conducted by Ding et al. (2021) investigated the consistency of the test scores at all three levels of the HSKK between the test formats (i.e., paper-based or internet-based) and areas (i.e., whether taken in/outside of China) using an independent-samples t-test. The effect sizes of the primary- (Cohen’s d = .37), intermediate- (Cohen’s d = .18), and advanced-level (Cohen’s d = .15) HSKK were small, indicating that the format had a negligible impact on scoring consistency. However, when examining the effect of the area factor on scoring consistency, they found that the effect sizes of the primary- (Cohen’s d = .47) and intermediate-level HSKK (Cohen’s d = .68) were close to or above the medium range. Those taking the primary- and intermediate-level HSKK inside China achieved higher average scores than those outside China. Ding et al. (2021) speculated that the difference was not caused by the location of the test but by the students’ language environment, that is, the test takers in China were more exposed to the target language environment. Studies have shown that the target language environment can provide more language input and communication opportunities, which positively develop learners’ linguistic competence (Collentine & Freed, 2004). Another possible explanation is that the language learning motivation of test takers improved after living in China (Ding, 2015). However, the effect size comparing scores of the advanced-level HSKK (Cohen’s d = .15) was small, indicating that the area/environment factors had little influence over the advanced-level test takers.
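For readers less familiar with the effect sizes reported above, the following is a minimal sketch of how Cohen’s d is commonly computed for two independent groups using the pooled standard deviation. It assumes the pooled-SD form of d (the review does not say which variant Ding et al. used), the function name `cohens_d` is illustrative, and the scores are invented for demonstration only; they are not data from Ding et al. (2021).

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance returns the sample variance (n - 1 denominator).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Invented scores, for illustration only (not data from any cited study).
inside_china = [70, 75, 80, 85, 90]
outside_china = [65, 70, 75, 80, 85]

d = cohens_d(inside_china, outside_china)
print(f"Cohen's d = {d:.2f}")
```

With these invented scores, d comes out at roughly 0.63. By Cohen’s conventional benchmarks (0.2 small, 0.5 medium, 0.8 large), that would fall between medium and large, comparable in size to the .68 reported for the intermediate-level area comparison.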

In sum, study findings suggest the HSKK has good consistency among question items, the rubric, and formats; however, scoring consistency regarding other factors such as region and gender should be further explored.

Interpretations

A general claim in language testing and assessment is that “the interpretations of the ability assessed on a test should be meaningful, impartial, generalizable, relevant, and sufficient” (Yao & Wallace, 2021, p. 1). Validity, then, relates to whether the test interpretations are meaningful and significant (Fan & Yan, 2020; Knoch & Chapelle, 2018). This section reviews a key element in AUA—the construct validity of HSKK, which pertains to the validity of the interpretations drawn from the assessment records (Bachman & Palmer, 2010), and provides the ability descriptions and vocabulary requirements of HSKK to interpret test takers’ Chinese speaking skills when passing a certain proficiency level.

Construct validity

Regarding the construct validity of the interpretations derived from the HSKK assessment records, Hanban (2010) claimed that the key construct the test measures is general Chinese speaking proficiency. Specifically, the primary-level HSKK assesses the ability to comprehend and use everyday language and fulfill the demands of various daily tasks. The intermediate-level HSKK assesses test takers’ ability to understand intermediate Chinese and communicate effectively with L1 Chinese speakers. The advanced-level HSKK measures test takers’ ability to comprehensively understand oral Chinese and present themselves eloquently with advanced and abundant Chinese expressions.

A study conducted by Jin (2019) systematically examined the construct validity of the HSKK. Item discrimination of the six different task types was analyzed based on descriptive statistics and complexity, accuracy, and fluency measures (three standard linguistic indicators frequently used to distinguish test takers’ speaking proficiency levels) (Fan & Yan, 2020). Jin (2019) analyzed 40 test takers’ audio records and found that “Listen and repeat” tasks in the primary- and intermediate-level HSKK and the “Read a paragraph” task in the advanced-level HSKK may fail to distinguish test takers’ Chinese speaking proficiency. Jin’s (2019) findings also aligned with Wang’s (2020). The remaining four tasks were found to appropriately assess the learners’ speaking proficiency, however. To address this deficiency, the primary- and intermediate-level “Listen and repeat” task design should be improved to include more frequently spoken Chinese words and sentence structures. Future validation studies can be conducted on parallel forms of the “Listen and repeat” tasks. The “Read a paragraph” task in the advanced-level HSKK should also be revised to include more domain-specific and formal expressions in oral communication instead of using complex written language that lacks authenticity (as discussed in the “Task authenticity” section). Doing so would result in a more authentic and valid three-level HSKK.

One important component of the speaking construct, interactional competence, is underrepresented in the current semi-direct HSKK (i.e., human-machine/paper), which threatens decisions and conclusions based on scores (see Roever & Dai, 2021). In contrast, a direct test, in which the candidate speaks in real time with a trained examiner, is one of the most common modes for assessing oral proficiency (e.g., International English Language Testing System Speaking test, Cambridge English A1–C2 tests, and Business English Certifications). These tests may attempt to mimic a real-life setting and actions as closely as possible while measuring the test takers’ oral language ability (and possibly interactional competence) (Qian, 2009; Roever & Dai, 2021). Therefore, test developers should consider building interactional competence into the HSKK by developing a direct testing mode. Inevitably, however, offering only the direct testing mode would significantly increase the cost and the work carried out by the test center, including the recruitment, training, and management of HSKK examiners, and the fee for the face-to-face HSKK would need to be raised accordingly.

In sum, few validation studies have examined the construct validity of the HSKK; thus, more studies carried out on different Chinese learning contexts and speaking proficiency levels are needed in this area.

Levels and abilities

Hanban (2010) argued that the HSKK score is criterion-referenced. The three levels of HSKK have been aligned with several internationally recognized standards, such as the Chinese Language Proficiency Scales (CLPS), the American Council on the Teaching of Foreign Languages (ACTFL), and the Common European Framework of Reference for Languages (CEFR) (see Table 3).

Table 3.

Mapping of the HSKK levels.

HSKK level	HSK level	CLPS level	CEFR band	CEFR level	CEFR sublevel	ACTFL level
HSKK-Primary	HSK 1	I	Basic	A1		Novice High
HSKK-Primary	HSK 2	II	Basic	A2		Intermediate Low
HSKK-Intermediate	HSK 3	III	Independent	B1	B1.1	Intermediate Mid
HSKK-Intermediate	HSK 3	III	Independent	B1	B1.2	Intermediate High
HSKK-Intermediate	HSK 4	IV	Independent	B2	B2.1	Advanced Low
HSKK-Intermediate	HSK 4	IV	Independent	B2	B2.2	Advanced Mid
HSKK-Advanced	HSK 5	V	Proficient	C1		Advanced High
HSKK-Advanced	HSK 6	V	Proficient	C2		Superior

ACTFL: American Council on the Teaching of Foreign Languages; CEFR: Common European Framework of Reference for Languages; CLPS: Chinese Language Proficiency Scales; HSK: Hanyu Shuiping Kaoshi; HSKK: Hanyu Shuiping Kouyu Kaoshi.

According to criterion-referenced interpretations, if an HSKK test taker achieves 60 or above, the suggested interpretation is that the test taker has reached a minimum standard based on the ability descriptions, and test takers who score less than 60 have not. However, some researchers (e.g., Ding, 2015; Jin, 2019; Teng, 2017; Wang & Jiang, 2017) have expressed doubt about measuring against other standards; that is, the test takers’ scores on the HSKK may not be an accurate indicator of their speaking skills as rated by other internationally recognized standards. For example, as discussed in the “Task authenticity” section, the text in the advanced-level “Read a paragraph” task lacks authenticity because it is taken from prose that contains formal written language, with no conversation or interaction involved (Jin, 2019). In contrast, at the corresponding CEFR proficient level (C1/C2), a critical criterion of spoken language use, that is, test takers’ oral interaction with the examiner, is included and carefully evaluated according to the guidelines. Specifically, test takers at the C2 level are required to take part effortlessly in any conversation, be fairly familiar with idiomatic expressions and colloquialisms, and backtrack and restructure speech whenever conversation difficulties are encountered (Council of Europe, 2020).

Another concern about measuring against other standards is the limited vocabulary size required in the HSKK, which can lead to misinterpretations of HSKK scores. The vocabulary size requirements of HSKK/HSK and other Chinese proficiency standards are compared in Table 4 based on the CEFR levels (basic-A1/A2, independent-B1/B2, proficient-C1/C2) they are supposedly aligned with (Hanban, 2010). For the primary-level HSKK, 200 words is the minimum criterion, which is far below the 2100-word requirement for primary-level L2 Chinese speakers in the Spoken Chinese Proficiency Grading Standards and Testing Guideline (Ministry of Education [MoE], 2011) and the 2245-word requirement in the newly launched Chinese Proficiency Grading Standards for International Chinese Education (Ministry of Education [MoE], 2021) developed by the MoE. Some test takers have claimed that taking the primary-level HSKK has little value because it requires them to master only 200 words, meaning it is simpler to skip the primary level and go straight to the intermediate level (Ding et al., 2021). This mismatch of vocabulary requirements can also be found in the intermediate- and advanced-level HSKK. Thus, the HSKK needs to be better aligned with other speaking proficiency standards, such as ACTFL, CEFR, CLPS, and the Chinese Proficiency Grading Standards for L2 Chinese learners.

Table 4.

Comparison of the vocabulary size among HSKK/HSK and Chinese proficiency grading standards.

HSKK vocabulary	HSK vocabulary	CEFR band	CEFR level	Spoken Chinese Proficiency Grading Standards and Testing Guideline (MoE, 2011) vocabulary	Chinese Proficiency Grading Standards for International Chinese Education (MoE, 2021) vocabulary
200 (HSKK-Primary)	150 (HSK 1)	Basic	A1	2100 (Primary)	2245 (Primary)
200 (HSKK-Primary)	300 (HSK 2)	Basic	A2	2100 (Primary)	2245 (Primary)
900 (HSKK-Intermediate)	600 (HSK 3)	Independent	B1	5200 (Intermediate)	5456 (Intermediate)
900 (HSKK-Intermediate)	1200 (HSK 4)	Independent	B2	5200 (Intermediate)	5456 (Intermediate)
3000 (HSKK-Advanced)	2500 (HSK 5)	Proficient	C1	8300 (Advanced)	11,092 (Advanced)
3000 (HSKK-Advanced)	5000 (HSK 6)	Proficient	C2	8300 (Advanced)	11,092 (Advanced)

CEFR: Common European Framework of Reference for Languages; HSK: Hanyu Shuiping Kaoshi; HSKK: Hanyu Shuiping Kouyu Kaoshi; MoE: Ministry of Education.
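The size of the vocabulary gap in Table 4 can be made concrete with a short calculation. The sketch below copies the word counts from Table 4 and expresses each HSKK requirement as a share of the corresponding MoE requirement; the dictionary and function names are illustrative only.

```python
# Vocabulary requirements copied from Table 4 (number of words per level).
HSKK = {"primary": 200, "intermediate": 900, "advanced": 3000}
MOE_2011 = {"primary": 2100, "intermediate": 5200, "advanced": 8300}
MOE_2021 = {"primary": 2245, "intermediate": 5456, "advanced": 11092}


def coverage(level: str, standard: dict) -> float:
    """Share of a standard's vocabulary requirement met by the HSKK figure."""
    return HSKK[level] / standard[level]


for level in HSKK:
    print(
        f"{level}: {coverage(level, MOE_2011):.1%} of MoE 2011, "
        f"{coverage(level, MOE_2021):.1%} of MoE 2021"
    )
```

On these figures, the primary-level HSKK requirement is under 10% of either MoE standard, and even the advanced-level requirement stays below 40%.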

Decisions

According to AUA principles (Bachman & Palmer, 2010), score-based decisions, which presume and build on sound score-based interpretations, can be made by considering the existing values in the community and relevant legal requirements. A general claim is that test scores and other test-related information allow for relevant, helpful, and sufficient decision-making to test users without any adverse consequences due to the assessment process. Hanban (2010) indicated that the HSKK has been specially developed to assess L2 Chinese learners’ general speaking skills to inform and support the score-based decision-making needs of higher education institutes and employers, L2 Chinese learners, and Chinese training institutes.

Regarding decision-making on using the HSKK in academic contexts, in 2018, the MoE specified that the HSK and HSKK scores are recognized as language requirements for admission to Chinese higher education institutes. Specifically, undergraduate and graduate students enrolled in Chinese-taught programs must achieve CLPS Level V (equivalent to advanced-level HSKK or HSK Level 5) before completing their second undergraduate year or before graduation (for graduate students). Some higher education institutes in China also admit international students to English Medium Instruction (EMI) programs. EMI students must achieve CLPS Level III (intermediate-level HSKK or HSK Level 3) before graduation, and EMI medicine majors must achieve CLPS Level IV (intermediate-level HSKK or HSK Level 4) before the practicum. Although the MoE documents suggest using the HSKK results in higher education settings because speaking skills are important for students to live and study in a second language environment, most students choose to satisfy the language requirement by taking the HSK since it has been set as a compulsory proficiency requirement for admission by most higher education institutes in China (Wang, 2018). HSK Level 4 is now becoming a globally acknowledged proficiency test for international Chinese learners (Ding et al., 2021; Wang, 2018).

Nevertheless, for L2 students who wish to pursue a government-funded Chinese-related program (e.g., Chinese language education, Chinese literature, Chinese history, and Chinese philosophy), the Confucius Institute Headquarters’ document states that the HSKK is a compulsory component of the application (Ding et al., 2021). Unfortunately, apart from being used to apply for a few government scholarships, scores on the HSKK do not appear to have wide public credibility, and some Chinese learners do not seem aware of the test (Wang, 2014; Yuan, 2017) perhaps because the HSK is a more comprehensive proficiency exam that assesses three skills (i.e., writing, reading, and listening), and is widely recognized by most higher education institutes in China (Wang, 2018). Thus, higher education institutes in China should consider setting HSKK as a reference or compulsory test before admitting and funding international students given that the HSK and HSKK measure different skills, and the HSKK provides an official reference for gauging international students’ general speaking proficiency (Ding et al., 2021; Wang, 2014).

As for employers, training institutes, and learners, whether and how to use the HSKK for specific decision-making purposes largely depends on the needs of the company, language center, and learner. Unfortunately, few studies have collected and analyzed test users’ needs and how scores are used for making decisions. Thus, eliciting test users’ perceptions, needs, and decision-making processes using HSKK scores is an area for future research.

Consequences

Using the AUA can provide test users with a rich lens for understanding both intended and unintended consequences (Bachman & Palmer, 2010).

There are two fundamental purposes of the HSKK, which are listed under “test purpose” in the introduction section of the review. The first purpose is to promote spoken Chinese teaching and learning domestically and internationally, although this purpose has yet to be well achieved (Ding et al., 2021). Among 437,331 test takers who took the HSK and HSKK in 2018, only 30,407 took the HSKK, accounting for less than 7%. Compared with HSK, which has a long history of research and development, HSKK is still in its developmental stages (Wang, 2014, 2018). The relatively small number of HSKK test takers may be attributed to its low public credibility; that is, many Chinese learners are not even aware of its existence (Wang, 2014; Yuan, 2017). Meanwhile, the number of HSKK test takers has increased both inside and outside of China (Ding et al., 2021). This growth, however, may have been tempered by the above-discussed concern about the HSKK’s low vocabulary requirement.
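The “less than 7%” figure can be checked directly from the two counts given above. The variable names in this short sketch are illustrative.

```python
total_test_takers = 437_331  # HSK and HSKK test takers in 2018 (from the text)
hskk_test_takers = 30_407    # of whom took the HSKK (from the text)

share = hskk_test_takers / total_test_takers
print(f"HSKK share of test takers: {share:.2%}")
```

The share works out to just under 7%, consistent with the figure reported.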

The second purpose of the HSKK is for test users to make various score-based decisions concerning studying and working in China and learning and teaching the Chinese language. These score-based decisions have attracted test users’ attention in the past decade. However, regarding the recruitment of international students, Wang (2018) claimed that the HSKK had become a less important measure in China. Nevertheless, international students applying for Confucius Institute scholarships to study in Chinese-related programs must supply an HSKK score (Ding et al., 2021). Some educational institutes in China also recommend using both the HSK and the HSKK as a comprehensive record of students’ listening, reading, writing, and speaking skills (Ding et al., 2021). However, studies have also questioned the appropriateness of using HSKK in academic contexts owing to its focus on general speaking proficiency rather than academically oriented Chinese (Hanban, 2010). Notably, several studies (e.g., Ding et al., 2021; Peng & Yan, 2019; Wang & Jiang, 2017) have revealed that international students face difficulties learning Chinese for academic purposes, even those who have passed the advanced-level HSKK. For example, international students have trouble using Chinese academic words (Peng & Yan, 2019) and appropriate Chinese to present their research findings at academic conferences (Ding et al., 2021; Wang & Jiang, 2017). Thus, because the HSKK scores concern only general speaking skills, a separate academic Chinese speaking proficiency test focusing on spoken academic Chinese for L2 students who want to pursue higher education in China appears to be needed.

Regarding the use of HSKK scores for hiring decisions by Chinese companies, Wang (2018) conducted an exploratory study and found that most companies require neither the HSK nor the HSKK; the HSKK does not appear to have public credibility even though Hanban (2010) stated that one of the test’s purposes is to help companies assess international workers’ speaking proficiency. Wang (2018) claimed that some companies in China prefer to conduct internal interviews to assess prospective international workers’ Chinese vocabulary and communication ability in a business context because they are not confident that the HSKK, as a general Chinese speaking test, can accurately reflect international workers’ Chinese ability in a business setting. Another test, the Business Chinese Test (BCT) (Oral iBT), may be a better indicator of spoken language competence than the HSKK for employers. According to the BCT administrator’s brief description (see www.chinesetest.cn), test takers who pass the advanced-level BCT (Oral iBT) can fully understand and comprehensively use Chinese in authentic and diverse business contexts. However, a better articulation of the different uses of the HSKK and the BCT (Oral iBT) is needed to help test users understand how the two speaking tests’ purposes differ across general and professional contexts.

As for using the HSKK to improve the learning and teaching of Chinese speaking skills, Wang (2014, 2018), who examined the washback effects of the HSKK in China, found that it did not seem to have an impact on Chinese teachers’ language teaching beliefs or their teaching, but it could motivate international students to practice their spoken Chinese and pursue higher education in China. Thus, Wang (2018) concluded that the washback effect of the HSKK on Chinese speaking skills dovetails with Hanban’s (2010) intended consequences. However, it is uncertain whether improved teaching of Chinese has been achieved, as only a few studies have been conducted on the washback effects of the HSKK, and these have been conducted only at the college level in China. Thus, future investigations should examine the HSKK’s washback effect across all levels and learning contexts.

Conclusion

By adopting Bachman and Palmer’s (2010) AUA framework, this review has evaluated the current HSKK, revealing four key points: (1) the HSKK reliably assesses L2 Chinese learners’ general speaking skills; (2) the effort to align the HSKK with other internationally recognized standards such as CEFR and ACTFL makes it easier, in theory, for the global language testing and assessment community to interpret; (3) some testing purposes of the HSKK (decision-making needs) have been achieved; and (4) several concerns have been raised about the construct validity, specifically the low vocabulary size requirement, the misalignment between the HSKK ability descriptions and other grading guidelines (e.g., CEFR and the Chinese Proficiency Grading Standards), the lack of interactional competence being assessed, and two inauthentic task designs (i.e., the “Listen and repeat” tasks in the primary- and intermediate-level HSKK and the “Read a paragraph” task in the advanced-level HSKK) that may not successfully assess test takers’ Chinese speaking proficiency.

From the test users’ perspective, using the HSKK in various learning and teaching contexts is questionable due to the test’s focus on general speaking skills. To increase the public credibility of the HSKK, Chinese language test developers and administrators should consider: (1) using more authentic and valid materials; (2) providing more test-related information (e.g., individual task scores, a detailed task-specific scoring rubric, reasons for the task design, and a pass-fail score setting); (3) investigating the washback effects on learning and teaching Chinese speaking skills over time and across factors such as region and gender; (4) developing a separate module for assessing spoken Chinese skills specifically in the academic context; (5) explaining the different uses of the HSKK and the BCT (Oral iBT) carefully to test users; and (6) compiling new Chinese speaking vocabulary wordlists that match the minimum word acquisition levels required by the MoE’s Chinese proficiency grading standards. As the HSKK is currently being revised to match the newly issued Chinese Proficiency Grading Standards for International Chinese Education (2021), current concerns of test users may be addressed, which should increase the public credibility of the HSKK.

Footnotes

The author thanks the Test Review Editor, Ute Knoch, who offered invaluable feedback that facilitated the author’s revisions. The author is grateful to three anonymous reviewers, Ling Shi, and Paul Stapleton, whose constructive comments greatly improved the review. The author also thanks Linjing Yang of the Chinese Testing International Co., Ltd., for providing up-to-date information about the HSKK and checking the factual accuracy of the review on behalf of the HSKK developer.

Declaration of conflicting interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Albert W. Li

References

Bachman, L. F., & Palmer, A. S. (2010). Language assessment in practice. Oxford University Press.

Collentine, & Freed, B. F. (2004). Learning context and its effects on second language acquisition. Studies in Second Language Acquisition, 26(2), 153–171. https://doi.org/10.1017/S0272263104262015

Council of Europe. (2020). Common European Framework of Reference (CEFR) for languages: Learning, teaching, assessment—Companion volume.

Cui (2010). A study on HSKK with generalizability theory (Unpublished Master’s thesis). Beijing Language and Culture University, Beijing, China.

Ding (2015). An analysis of motivation change of motivated Chinese language learners in the target language context. Yuyan Wenzi Yingyong (Applied Linguistics), 2, 116–124.

Ding, Cheng, Ding, & Chen (2021). The present situation and problems of global Chinese learners’ oral performance: Based on big data analysis of HSKK. Language Teaching and Linguistic Studies, 212(6), 13–23.

Fan, & Yan (2020). Assessing speaking proficiency: A narrative review of speaking assessment research within the argument-based validation framework. Frontiers in Psychology, 11, Article 330. https://doi.org/10.3389/fpsyg.2020.00330

Hanban. (2010). The new HSK (and HSKK) test syllabus. The Commercial Press.

Hanban. (2020). About conducting HSK and HSKK in late May notice of online Chinese test (Home Edition). http://www.chinesetest.cn/gonewcontent.do?id=44334661

Jin (2019). Research on the HSKK based on CAF analysis (Unpublished Master’s thesis). Shanghai Jiaotong University, Shanghai, China.

Knoch, & Chapelle, C. A. (2018). Validation of rating processes within an argument-based framework. Language Testing, 35(4), 477–499. https://doi.org/10.1177/0265532217710049

Luoma (2004). Assessing speaking. Cambridge University Press. https://doi.org/10.1017/CBO9780511733017

Ministry of Education (MoE). (2011). Spoken Chinese proficiency grading standards and testing guideline. Language and Culture Press.

Ministry of Education (MoE). (2021). Chinese proficiency grading standards for international Chinese education. Shanghai Foreign Language Education Press.

Peng, & Yan (2019). Walking the garden path towards academic Chinese language: Perspectives from international students in Chinese higher education. In Tao & Chen (Eds.), Chinese for specific purposes (pp. 73–194). Springer. https://doi.org/10.1007/978-981-13-9505-5_4

Peng, Yan, & Cheng (2020). Hanyu Shuiping Kaoshi (HSK): A multi-level, multi-purpose proficiency test. Language Testing, 38(2), 326–337. https://doi.org/10.1177/0265532220957298

Qian, D. D. (2009). Comparing direct and semi-direct modes for speaking assessment: Affective effects on test takers. Language Assessment Quarterly, 6(2), 113–125. https://doi.org/10.1080/15434300902800059

Roever, & Dai, D. W. (2021). Reconceptualizing interactional competence for language testing. In Salaberry & Burch (Eds.), Assessing speaking in context: Expanding the construct and its applications (pp. 23–49). Multilingual Matters. https://doi.org/10.21832/9781788923828-003

Teng (2017). Hanyu Shuiping Kaoshi (HSK): Past, present, and future. In Zhang & Lin, C.-H. (Eds.), Chinese as a second language assessment (pp. 3–20). Springer. https://doi.org/10.1007/978-981-10-4089-4_1

Wang (2020). Design and improvement of the input of advanced-level HSKK: A comparative study between HSKK and TOEFL iBT. Proceedings of the 13th postgraduate students’ forum on teaching Chinese as a foreign language (pp. 1–12), Beijing, China.

Wang (2014). Exploring the washback of a large-scale high-stakes Chinese test, the Hanyu Shuiping Kaoshi, on learner factors (Unpublished Master’s thesis). McGill University, Montreal, Québec, Canada.

Wang (2018). Investigating the consequential validity of the Hanyu Shuiping Kaoshi by using an argument-based framework (Unpublished PhD dissertation). McGill University, Montreal, Québec, Canada.

Wang, & Jiang (2017). Comparing the test structure of the old and new HSKK. Journal of Liaoning Educational Administration Institute, 4, 77–81.

Yan, Maeda, & Ginther (2016). Elicited imitation as a measure of second language proficiency: A narrative review and meta-analysis. Language Testing, 33(4), 497–528. https://doi.org/10.1177/0265532215594643

Yang (2017). Validity analysis of the advanced-level HSKK. Higher Education Forum, 1, 99–100.

Yao, & Wallace, M. P. (2021). Language assessment for immigration: A review of validation research over the last two decades. Frontiers in Psychology, 12, Article 773132. https://doi.org/10.3389/fpsyg.2021.773132

Yuan (2017). Investigating the promotion of Chinese language tests (HSK/HSKK/YCT) in London Confucius Institute (Unpublished Master’s thesis). Beijing Foreign Studies University, Beijing, China.

Assessing the speaking proficiency of L2 Chinese learners: Review of the Hanyu Shuiping Kouyu Kaoshi
