Abstract
Keywords
Introduction
Students build knowledge based on their pre-existing ideas. To provide all students with adequate opportunities to learn science, teachers should continually assess their students’ thinking and make instructional decisions that are responsive (Kavanagh et al., 2020; Ruiz-Primo, 2011; Ruiz-Primo & Furtak, 2007; Taber, 2017). Within these activities, teacher judgments of student thinking are essential because teachers can only act on what they perceive, and the quality of teacher judgments determines how teachers respond (Schoenfeld, 2011). Situations conducive to the assessment of student thinking, which often occur spontaneously when interacting with students, have been described as “teachable moments” (Shavelson et al., 2008). To provide students with adequate learning opportunities, teachers need to take note of relevant student statements and hypothesize about the underlying conceptions “on-the-fly,” all in the course of a busy lesson (Ruiz-Primo, 2011).
Practice-based learning offers a promising approach to enable preservice teachers to acquire the skills they need to engage in such complex classroom practice (Grossman et al., 2009; Kloser, 2014; McDonald et al., 2013; Stroupe et al., 2020). Grossman et al. (2009) suggested identifying essential components of core teaching practices. Assessing student thinking represents a component of the more complex practice of eliciting, assessing, and using student thinking for further instruction. Novice teachers should be allowed to enact such practices in complexity-reduced yet authentic learning environments. Participants then have the chance to identify and engage in elements of teaching without facing the pressure of a real classroom situation. To illustrate practice components, representations of practice have been used, for example, written and video cases (Farrell et al., 2022; Kramer et al., 2020). Although studies have shown that both media types are suitable for fostering teachers’ skills in noticing and interpreting relevant aspects of classroom situations, how participants benefit from the different qualities of each media type remains an empirically open question (Friesen & Kuntze, 2018; Hoppe, Renkl, Seidel, et al., 2020; Kramer et al., 2020). Furthermore, the characteristics of the content to be observed apparently influence the acquisition of teachers’ assessment skills (Hoppe, Renkl, & Rieß, 2020). To our knowledge, this assumption has not been systematically addressed with regard to the conceptual complexity of student thinking about science.
With the goal to develop teacher trainings that promote core teaching practices, McDonald et al. (2013) highlight the need for “a deep understanding of how people learn to enact ambitious professional practice” (p. 379) in practice-based learning environments. More research is needed on how specific characteristics of learning environments for teacher professional education have an impact on their learning practice (Kavanagh et al., 2020). In this study, we present a concept of how the assessment of student thinking about significant ecological concepts can be fostered in line with principles of practice-based learning (Grossman et al., 2009; McDonald et al., 2013). Within a teacher training session, we experimentally varied essential components of the learning environment to gain a deeper understanding of how instructional elements in practice-based learning and different characteristics of student thinking affect the acquisition of preservice science teachers’ assessment skills.
Teacher Assessment Skills: A Prerequisite for Responsive Teaching
Students build their knowledge on pre-existing ideas. To provide all students with appropriate learning opportunities, teachers need to address student thinking in the course of a lesson (Shavelson et al., 2008). This approach of responsive or adaptive teaching is supported by a large body of evidence (Decristan et al., 2015; Kavanagh et al., 2020; Ruiz-Primo & Furtak, 2007). Given that adaptive teaching is responsive to individual students’ learning requirements, it is seen as an essential means to achieve equity among learners (Windschitl & Barton, 2016). Following a Delphi expert panel approach, Kloser (2014) identified the teaching practice of eliciting, assessing, and using student thinking as a core practice in teaching science. Kloser described this practice as teachers engaging in both formal and informal ways of probing student thinking and identifying students’ conceptions and mental models of the material world and scientific practices using a variety of assessment practices. Science teachers should then use the information on their students’ thinking to guide further instruction. These activities are considered to be common formative assessment practices during classroom interaction (Furtak, Kiemer, et al., 2016). Teacher judgments of student thinking form the basis of all further steps of formative assessment because teachers can only act on what they perceive, and the way teachers interpret student thinking determines how they may respond, give feedback to a student, or ask a question to further explore student thinking (Ruiz-Primo & Furtak, 2007; Schoenfeld, 2011; van Es & Sherin, 2021).
Situations in which teachers assess and use student thinking may be planned by the teacher, yet they often arise spontaneously and unplanned, for example, when a teacher overhears student statements during group work (Shavelson et al., 2008). In both cases, making judgments about student thinking requires teachers to engage immediately in assessment processes that involve the assessment skills of noticing and interpreting (Blömeke et al., 2015; Heitzmann et al., 2019). Noticing refers to teachers being able to direct their attention to relevant student statements among all the other events happening during a school lesson. Interpreting refers to how teachers make sense of what they have noticed (Farrell et al., 2022). These cognitive skills can be considered integral to teachers’ assessment competence (Blömeke et al., 2015; Heitzmann et al., 2019; Hoppe, Renkl, & Rieß, 2020; Loibl et al., 2020) and have also been described in research on teachers’ professional vision (Blomberg et al., 2011) and teacher noticing (van Es & Sherin, 2002). The closely connected processes of noticing and interpreting lead to a more or less adequate judgment about student thinking (Loibl et al., 2020).
Essential indications of student thinking are provided by students’ verbal statements. To reach a hypothetical understanding of student thinking, these statements must be cautiously interpreted (Duit et al., 2008). Adequate interpretations of student thinking pose a particular challenge to teachers (van Es, 2011). The pedagogical task for science teachers subsequent to a judgment entails linking student thinking based on everyday experience to scientific knowledge (Gebhard et al., 2017). Teachers may build bridges from intuitive student thinking to scientific knowledge (or draw contrasts between the two), which enhances students’ ability to reconstruct their conceptions toward scientifically valid perspectives (Duit et al., 2008). These strategies may be implemented, for example, through direct instruction or by facilitating classroom discourse. Interpreting student thinking in a way that can inform such pedagogical decisions is difficult (Furtak, Kiemer, et al., 2016). Less experienced teachers tend to interpret student thinking according to the student’s general behavior or aspects of classroom management, for example, when teachers remark that a student hardly participates in a discussion (Schoenfeld, 2011; van Es, 2011). However, even when referring to subject-specific aspects in student statements, teachers often tend to evaluate student thinking as merely right or wrong (see Chi et al., 2004; Furtak, Kiemer, et al., 2016; Ruiz-Primo & Furtak, 2007). To guide further instruction and provide students with appropriate learning opportunities, teachers must gain an understanding of the specific conceptions that underlie students’ statements and comprehend the way students think (Furtak, Kiemer, et al., 2016; Gropengießer & Marohn, 2018; van Es & Sherin, 2021).
Common Student Conceptions: Evidence-Based Representations of Student Thinking
Specific characteristics of student thinking as the object of assessment are likely to determine the quality of teachers’ judgments (Hoppe, Renkl, & Rieß, 2020; Schrader & Praetorius, 2018). Student thinking on scientific content varies greatly (Duit, 2009). To understand typical patterns of student thinking, researchers in science education have described student conceptions of scientific principles, phenomena, and concepts for decades. Student conceptions describe subjective mental constructions and have also been designated as, for example, student misconceptions, alternative conceptions, or naïve theories. When researchers aim for a description of student conceptions, they usually collect and evaluate qualitative data using, for example, group discussions. Although some student conceptions (e.g., anthropomorphisms) occur across varying scientific topics, most student conceptions are domain-specific; that is, they differ in the scientific topic they refer to (e.g., blood circulation, or decomposition processes; Hammann & Asshoff, 2015).
Furthermore, student conceptions can differ in the complexity of their conceptual construction (Gropengießer, 2006). Gropengießer distinguishes, for example, between concepts, constructs, and principles. A
Understanding student conceptions requires complex reconstructive analysis processes because student statements must be interpreted by inference. This processing is difficult in the middle of a school lesson when teachers have no prior knowledge of potential student conceptions (Dannemann, 2020). Fortunately, many typical student conceptions have been described in empirical research for a variety of scientific topics. Descriptions of student conceptions usually contain an applicable concept name, a definition, and anchor examples (e.g., Hammann & Asshoff, 2015). These descriptions of typical student conceptions can serve as categories for teachers’ on-the-fly judgments of student thinking (Taber, 2017). However, an evidence-based category is, in most cases, an abbreviation of what students actually think, and does not cover all variants of and deviations from this category (Dannemann, 2020). Therefore, a spontaneous judgment of student thinking based on such categories can only be considered a tentative but useful hypothesis, given the challenges of “on-the-fly” assessment of student conceptions (Heitzmann et al., 2019). Having gained a hypothetical understanding of how students think, teachers can then move on to adaptive instruction, or they can ask questions to validate their understanding of a student’s thinking (van Es & Sherin, 2021).
Practice-Based Learning: An Approach to Foster the Application of Practice-Oriented Knowledge
To educate prospective teachers successfully in acquiring practice-oriented knowledge, the framework for teaching practice developed by Grossman et al. (2009) suggests identifying common core practices of the teaching profession and decomposing them into individual components that can then be rehearsed in complexity-reduced settings that display only certain aspects of classroom practice. Common core practices of the teaching profession represent “broadly applicable instructional strategies known to foster important kinds of student engagement and learning” (Windschitl et al., 2012). Such practices occur frequently in teaching, and they preserve the complexity of teaching in the sense that learning opportunities must be tailored to the specific needs of particular students (Grossman et al., 2009; Windschitl & Barton, 2016). For science teaching, a Delphi study highlighted eliciting, assessing, and using student thinking about science as a particularly significant practice (Kloser, 2014). Noticing and interpreting student conceptions which underlie student statements represent an essential component within this core teaching practice (Chi et al., 2004; Furtak, Thompson & van Es, 2016; Ruiz-Primo, 2011; Schoenfeld, 2011).
Teaching preservice teachers components of a more complex practice such as noticing and interpreting student thinking does not ensure they can adequately use these practices when they enact complete lessons (Kennedy, 2016). In practice-based teacher education, it is essential that teachers learn to connect those practices to specific purposes that relate to students, context, and content (Hauser & Schneider Kavanagh, 2019). McDonald et al. (2013) suggested organizing practice-based learning in a cycle in which preservice teachers can rehearse components of core practices by engaging in activities of increasing complexity. The authors stress the need for the development and identification of pedagogies (e.g., video analysis) that teacher educators enact to support teachers in learning practice. To design learning environments based on principles of practice-based learning that foster specific skills, an important step is to assess how different components of a learning setting contribute to the acquisition of teacher skills. The question of which representations of practice can be used and implemented in teacher training is particularly pressing within practice-based learning (Codreanu et al., 2020; Danielson et al., 2018; Grossman et al., 2009).
Written and Video Cases: Representations of Practice
Video and written cases are commonly used as representations of practice (Danielson et al., 2018). Both media types provide the opportunity for teacher students to engage in certain aspects of teaching practice without facing the pressure of a real classroom situation. Novice teachers are offered the chance to rehearse, revise, and retry certain aspects of practice (Grossman et al., 2009). Video cases are considered to be particularly authentic because they display many facets of practice. Yet, this complexity may impose mental load on viewers and be perceived as overwhelming (Farrell et al., 2022). However, results in research on mental load caused by video viewing have not been consistent (Kramer et al., 2020). Written cases are considered to be less authentic because they display fewer facets of classroom practice (Danielson et al., 2018). The lower authenticity of written cases offers its own advantages (Grossman et al., 2009). For example, written cases allow participants to process the presented information sequentially (Gold et al., 2016).
Studies have shown that text and video cases have similar potential in promoting teacher skills in noticing and interpreting classroom situations (e.g., Hoppe, Renkl, & Rieß, 2020; Kramer et al., 2020; Schneider et al., 2016). Given that video and written cases offer different potentials and limitations, the question arises as to how both media types contribute to the acquisition of cognitive skills when used in combination. Learners might thus benefit from the complexity and richness of a classroom situation represented in videos as well as from the opportunity for sequential processing of information that written cases offer. Research on the effects of media-combined learning environments for teachers remains scarce. Kramer et al. (2020) found that a combination of videos and transcripts was particularly effective in fostering teachers’ situation-specific skills of classroom management compared to using transcripts only. Participants in the media-combined condition worked with transcripts first and then moved on to working with videos. The transcript-video sequence was not varied.
When evaluating the benefit of written and video cases in teacher training, motivational constructs are often measured in addition to the outcomes of the intended cognitive teacher skills (Codreanu et al., 2020). Videos represent many facets of real classroom practice and offer viewers the opportunity to become immersed in the classroom situation (Farrell et al., 2022). This motivational construct has been framed as immersion.
Rehearsals of Components of Practice
Practice-based learning suggests rehearsing a certain component of teacher practice and then moving on to the next step, characterized by more features of authentic practice (Grossman et al., 2009; McDonald et al., 2013). Such an increase in authenticity, which Grossman et al. (2009) called approximations of practice, might involve a more complete or integrated representation of practice or fuller participation by a novice (Grossman et al., 2009). For example, participants might examine video exemplars and, in a subsequent step, rehearse what they observed in a live role-play. This procedure in practice-based learning raises the question of how much rehearsal is needed until participants are ready to move to the next step. When practicing assessment of student thinking, the teachers’ learning progress likely depends on the scientific target content of student thinking as well as on the conceptual complexity of the students’ statements being noticed. Focusing on student conceptions about evolution, Fischer et al. (2021) found that trainee teachers assessed anthropomorphic misconceptions significantly more often than teleological misconceptions. To the best of our knowledge, effects of the conceptual complexity of student thinking on teachers’ assessment competences have not been systematically tested and reported. Nonetheless, when developing pedagogies for how teachers can engage in teacher practices in successive steps, such as assessing student thinking, an evaluation of how much practice is needed with respect to varying content complexity is crucial. In a small-sample study, Hoppe, Renkl, and Rieß (2020) found that learning to assess more complex student conceptions appears particularly challenging compared to assessing less complex ones.
Aim of Study
The present study aimed to expand knowledge on the design of practice-based teacher trainings for fostering teachers’ assessment skills of student thinking. We assessed how the use of different media types as representations of practice, the number of rehearsals, and different characteristics of student thinking that account for its complexity contribute to preservice teachers’ acquisition of assessment skills. Furthermore, we examined how a combination of these aspects in practice-based teacher trainings can advance teachers’ on-the-fly judgments of student thinking. We addressed the following research questions:
Our first research question aimed to validate previous findings. Correspondingly, we assumed that video and written cases are both suitable to promote teachers’ acquisition of assessment skills. With respect to the complexity of student conceptions, we expected that the acquisition of assessment skills would occur differently because student conceptions of different conceptual complexity represent different requirements for teachers’ noticing and interpreting. Moreover, video cases are often assumed to bear particular potential because they provide an opportunity for participants to become immersed in the situation, but previous research shows no clear advantage of videos for participants’ perceived immersion in a case.
This research question was exploratory given its novelty. With regard to the teaching principle of increasing complexity, preservice teachers might benefit from practicing the assessment of less complex content using less complex representations of practice (written cases) before moving on to more complex content using a more complex representation of practice (video cases). With regard to participants’ perceived immersion, it is novel and of particular interest to examine how they perceive their immersion in a media-combined training.
Student conceptions vary not only on the scientific topic they refer to but also on the complexity of their conceptual construction (Chi et al., 1994; Gropengießer & Marohn, 2018). The question is whether the number of necessary rehearsals depends on different degrees of conceptual complexity in student conceptions. This research question also has not been addressed in the context of on-the-fly assessment. Expecting that assessing more complex conceptions would be more difficult, we assumed that participants benefit from more rehearsals.
Method
Participants and Design
Participants were recruited within a course in a Biology education master’s program for preservice teachers at two universities. The sample consisted of 104 participants (81 female, 23 male). The average age was 24.75 (
The intervention study followed an experimental pre-posttest design with an additional interim measurement. Preservice teachers were individually randomized to experimental conditions (see Table 1). All participants in the experimental groups took part in a practice-based assessment training comprising two rehearsal sessions. Within the experimental conditions, we varied media type (video or text) as well as content (student conceptions of different complexity). After the interim measurement, media type and content were switched for each group in the second rehearsal (e.g., a video case with less complex content during the first rehearsal, then a written case with complex content during the second rehearsal). The student conceptions represented in the classroom cases referred to basic ecological concepts from the overarching topical domain of material cycles in ecosystems. A control condition had access to the same information about specific ecological student conceptions as the experimental conditions, but participants did not rehearse assessing student conceptions. Instead, they took part in lesson-planning activities addressing those specific student conceptions. More precisely, participants in the control condition were asked to think about learning activities that build on those specific student conceptions and allow students to move their thinking forward (e.g., generating cognitive conflict in the mind of a student).
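The counterbalanced switching of media type and content complexity across the two rehearsals can be sketched as follows. This is a hypothetical illustration of the design logic only; the factor names and the helper function are ours, not part of the study materials.

```python
# Illustrative sketch of the counterbalanced 2x2 design described above.
# MEDIA and CONTENT are assumed labels, not the study's actual materials.
from itertools import product

MEDIA = ("video", "text")
CONTENT = ("complex", "less_complex")


def build_conditions():
    """Cross media type with content complexity for rehearsal 1,
    then swap both factors for rehearsal 2 (as described in the text)."""
    conditions = []
    for media, content in product(MEDIA, CONTENT):
        phase1 = (media, content)
        # The second rehearsal uses the other media type and the
        # other complexity level for every group.
        phase2 = (MEDIA[1 - MEDIA.index(media)],
                  CONTENT[1 - CONTENT.index(content)])
        conditions.append((phase1, phase2))
    return conditions
```

Crossing the two factors yields four experimental groups, each with a fully swapped second phase, which matches the example given above (video/less complex first, then text/complex).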
Experimental Groups and Control Group (N = 104).
Development of Video and Written Cases to Promote Assessment Skills
To foster preservice teachers’ assessment skills, the research team developed classroom cases following manuals that depict the production of scripted yet authentic classroom vignettes (Piwowar et al., 2017; Seidel et al., 2022). The video cases were approximately 2 min long. For each topic domain (importance of plants in ecosystems and decomposition processes), three vignettes had been produced. In each vignette, a short group work episode (science classroom) was presented. Four students worked on their respective topic, that is, importance of plants in ecosystems or decomposition processes. Students’ verbal statements revealed underlying student conceptions in these domains. The student conceptions had been selected to represent common student conceptions, as identified in various empirical studies and among students at primary and lower secondary levels (Hammann & Asshoff, 2015). The selected student conceptions on the importance of plants in ecosystems can be classified as complex
To develop comparable text cases, the video cases were transcribed. Complementary non-verbal information (e.g., students’ gestures) was added. Thus, we developed six cases in text format, each comprising about 2,000 characters. Video and text cases contained additional information on school type, class level, and topic of the lesson.
Measures and Instruments
Assessment Skills
To assess the development of preservice teachers’ assessment skills, participants took part in a video-based test prior to, within, and after the intervention. The same instrument was used at each measurement point. The video-based test consisted of two scripted videos (each approx. 2 min) displaying a science classroom scene at a lower secondary level (see Figure 1). Similar to the cases used during the intervention, a group of four students worked on tasks related to the topic domain of material cycles in ecosystems. In a group discussion, they expressed their conceptions about the subject matter. The test consisted of two parts. Each of the two videos was linked to one of the student conception complexity levels and to a specific topic domain. The first part of the video test (Part A) represented complex student conceptions (

Transcribed Episode From a Video Vignette.
In addition to the video, participants received information about the classroom sequence (e.g., student names, class level, and learning materials) to provide a more comprehensive picture of the assessment situation and to allow observations to be allocated to certain students. The classroom cases used for the test differed from the cases used in the intervention in the group of students and the tasks on which the students worked. Participants were asked to analyze the videos with a focus on what seemed relevant for further teaching and learning. We used an open-question format that had already been used for determining various competence levels in previous studies (e.g., Kersting, 2008). To reflect the fleeting nature of the actual classroom situation, the videos could be paused only to note down observations; single episodes or complete videos could not be rewatched.
The quality of teachers’ assessment skills was operationalized as the level of noticing and interpreting student statements with regard to underlying conceptions. We coded participants’ responses following qualitative content analysis (Mayring, 2015). Categories were formed deductively and inductively. First, we coded preservice teachers’ noticing in the sense that participants selected relevant student expressions that hint at underlying conceptions. Selecting means that participants made notes about observations with reference to a specific anchor in the video or written case. We further determined the quality of participants’ observations in terms of their interpretations of presumable underlying student conceptions. For example, we distinguished whether student expressions were evaluated by their scientific correctness or whether an attempt had been made to understand the students’ way of thinking (Furtak, Kiemer, et al., 2016; van Es, 2011). Thus, a total of 18 categories emerged to describe the participants’ responses. These 18 categories were transferred into a six-level quasi-interval scale which was validated by three experts who assigned the categories to the values 1 to 6 (1 = very low level of assessment skills, 6 = very high level of assessment skills; intraclass correlation coefficient [ICC] = .938). Lower levels describe, for example, merely descriptive comments or comments that interpret student expressions in terms of general student behavior. Lower levels of assessment skills further describe comments that evaluate student expressions as right or wrong but lack attempts to understand the students’ ways of thinking. Higher levels of assessment skills describe attempts to gain an understanding of the specific conceptions that underlie students’ statements and to comprehend the way students think. Each participant response that referred to a relevant passage in the video was assigned a value from 1 to 6.
The scores were summed for each part of the video test. This procedure resulted in a sum score representing a comprehensive measure of the preservice teachers’ assessment skills. This measure includes both cognitive processes of noticing and interpreting at different degrees of adequacy. To determine the reliability of measuring assessment skills using the vignette test, Cronbach’s alpha was calculated. The reliability was acceptable for Part A of the video test (Cronbach’s alpha = .72) and excellent for Part B (Cronbach’s alpha = .91).
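For readers less familiar with the reliability index, Cronbach’s alpha can be computed from a persons-by-items score matrix as sketched below. This is a minimal pure-Python illustration with invented scores, not the study’s actual data or analysis pipeline.

```python
# Minimal sketch of Cronbach's alpha: alpha = k/(k-1) * (1 - sum of
# item variances / variance of total scores). Uses population
# variances throughout; the example data are invented.

def cronbach_alpha(scores):
    """scores: list of per-person lists of item scores (equal length)."""
    k = len(scores[0])   # number of items
    n = len(scores)      # number of persons

    def pvar(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [pvar([person[i] for person in scores]) for i in range(k)]
    total_var = pvar([sum(person) for person in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

With perfectly parallel items (every person scores identically on both items), the formula returns 1.0, the theoretical maximum; weaker inter-item covariance lowers the value toward the .72 and .91 ranges reported above.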
Immersion
To assess the preservice teachers’ perceived immersion while rehearsing the assessment of student conceptions using video and written cases, participants rated their perception on an immersion scale immediately after each rehearsal session (see Table 2). The scale consisted of seven items and was adapted from Syring et al. (2015). The internal consistency of the questionnaire was satisfactory, with Cronbach’s alpha for perceived immersion = .75.
Description of the Scale for Immersion: Number of Items (N) and Sample Items.
Procedure
All participants attended two lecture-style sessions, each lasting 90 min (see Table 3). In the first session, ecological content knowledge relevant to assessment (e.g., decomposition processes, plant ecology) was addressed. The second session addressed general pedagogical content knowledge (PCK) on student conceptions (e.g., the role of student conceptions in student learning). The participants were further introduced to specific PCK on typical student conceptions in the content domains of ecology. This introduction to typical ecological student conceptions took place within two consecutive 90-min rehearsal sessions.
Procedure of Tests, Lectures, and Practice-Based Trainings.
In an example-based learning environment, participants were able to analyze classroom cases (video or written format) with regard to the student conceptions that were expressed. Initially, participants were introduced to solution examples showing how student expressions can be successfully assessed according to the underlying student conceptions. At the same time, participants received specific PCK about common student conceptions, either complex conceptions about the importance of plants in ecosystems or less complex conceptions about decomposition processes (see Table 4). For the first video/written case, participants were given examples of which ecological student conceptions can be observed in the classroom sequence. They were then asked to watch the video again and comprehend the solution examples. Participants in the text group were asked to read the text again. For the next two videos/written cases, participants were asked to assess student conceptions themselves. For each video/written case, participants were able to compare their judgments to an expert solution. Thus, participants practiced applying their knowledge about specific student conceptions using a total of three classroom cases during the first rehearsal. After the interim tests, three further classroom cases followed during the second rehearsal. These practice-based teacher trainings were designed as prestructured group work and could be completed independently by the participants. The purpose of this design was to minimize the influence of the lecturers on the results of the different training conditions. Other conditions (e.g., learning time and formulation of the tasks) were kept constant across the groups.
Student Conceptions.
In these rehearsals using representations of practice, we varied (a) the conceptual complexity, (b) the media type (video or written case), and (c) the sequence in which case type and content were presented within the two rehearsal sessions. Thus, for the second rehearsal phase, we exchanged both the case types and the complexity of the student conceptions. For example, if participants had used video cases in the first phase, they received written cases in the second phase. If participants had rehearsed assessing complex student conceptions in the first phase, they rehearsed assessing less complex student conceptions in the second phase. The formal structure of the two rehearsal phases did not differ. After each rehearsal cycle, all participants in the experimental conditions were asked to report their perceived immersion during enactment.
Sessions for the control condition were held in parallel with the experimental conditions. Participants in the control condition received the same information about typical ecological student conceptions, including anchor examples, as the experimental conditions (see Figure 2). The control condition included no rehearsal in applying this knowledge to written or video cases; instead, participants were asked to plan lessons that address ecological student conceptions. Thus, participants had an opportunity to process knowledge about typical student conceptions in depth, but they were not able to enact such knowledge in an on-the-fly assessment activity using representations of practice.

Results for Assessing Complex Student Conceptions.
Results
Table 5 displays the means and standard deviations of key variables in all experimental conditions. An alpha level of .05 was used for all statistical tests. We used partial η2 as an effect size measure, qualifying values below .06 as small effects, values between .06 and .13 as medium effects, and values above .13 as large effects.
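The partial η² measure referenced above relates an effect's sum of squares to the sum of effect and error variance. A minimal sketch of the computation and of the threshold labels used in this article (the sums of squares in the example are hypothetical, not values from the study):

```python
# Partial eta squared: proportion of variance attributable to an effect,
# relative to effect-plus-error variance. Illustrative helper only; the
# example sums of squares below are hypothetical.

def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """eta_p^2 = SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)

def label_effect(eta_p2: float) -> str:
    # Thresholds as used in the text: < .06 small, .06-.13 medium, > .13 large.
    if eta_p2 < 0.06:
        return "small"
    if eta_p2 <= 0.13:
        return "medium"
    return "large"

eta = partial_eta_squared(4.0, 36.0)  # hypothetical SS values
print(eta)                # 0.1
print(label_effect(eta))  # medium
```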
Means and SD of Assessment Skills for Complex Conceptions (C) and Less Complex Conceptions (-C) at Three Measurement Times.
Pre-analysis
We calculated analyses of variance (ANOVAs) to determine whether there were any statistically significant differences between groups. We found no differences between groups in assessment skills or in demographic variables such as age, enrollment in study programs, grades in biology, number of semesters, and teaching experience (all
Research Question 1: To What Extent Do Different Case Types as Representations of Practice (Video or Written Cases) and Student Conceptions of Different Complexity Impact (a) the Acquisition of Preservice Teachers’ Assessment Skills and (b) Preservice Teachers’ Perceived Immersion in a First Rehearsal?
To examine the effectiveness of case-based training using different case formats and content of different complexity, we compared preservice teachers’ assessment skills between the experimental groups and between the experimental groups and a control condition. Results were generated separately for the assessment of complex content (Part A) and the assessment of less complex content (Part B) (see Table 5). After the first rehearsal, an rmANOVA showed significant differences in preservice teachers’ assessment skills for complex student conceptions (Part A) in comparison to pretest results,
We further compared preservice teachers’ achievement in assessing less complex student conceptions (Part B) after the first rehearsal. Taking into account the results for the pretest and interim test, an rmANOVA determined statistically significant differences between groups in the mean quality of assessment skills,
On average, participants in all experimental groups evaluated their immersion in the upper half of the Likert-type response scale during the first rehearsal (see Table 6). An ANOVA revealed no significant differences between groups,
Means (SD) of Immersion After Both Rehearsal Sessions.
Research Question 2: To What Extent Does a Media-Combined Sequential Variation of Video and Written Cases and Student Conceptions of Different Complexity Impact (a) the Acquisition of Preservice Teachers’ Assessment Skills and (b) Preservice Teachers’ Perceived Immersion in a Second Rehearsal?
Case format and content were changed in the second rehearsal. For assessing complex student conceptions (Part A), an rmANOVA showed significant differences in preservice teachers’ assessment skills between the experimental conditions and the control condition,

Results for Assessing Less Complex Student Conceptions
After the second rehearsal phase, participants again evaluated their immersion in the upper half of the Likert-type response scale (see Table 6). Yet, an ANOVA revealed significant differences between groups,
Research Question 3: Does Learning to Assess Complex Student Conceptions Require More Rehearsals Than Learning to Assess Less Complex Student Conceptions?
To address this research question, we refer to the analyses presented in the previous sections on Research Questions 1 and 2. We compared the experimental groups’ performances to the performance of a control condition (see Table 5). Participants in all experimental groups outperformed participants in the control condition in assessing complex student conceptions only after a second rehearsal. Participants in the experimental conditions outperformed participants in the control group in assessing less complex student conceptions after the first rehearsal. However, participants in the experimental conditions only showed better assessment skills than the control condition if they had rehearsed assessing less complex student conceptions (which had a different topical focus than the complex student conceptions).
Discussion
This article focuses on how preservice science teachers can be supported in developing cognitive skills for assessing student thinking when interacting with students. Making such on-the-fly judgments of student conceptions informs teachers’ pedagogical decisions on further instruction in response to their students’ thinking. We focused on teachers’ on-the-fly assessment processes as a key decomposition within formative assessment activities. Following the teaching practice approach, we investigated the effects of media type, content complexity, and a sequential variation of those components on the development of preservice teachers’ assessment skills.
Research Question 1 addressed preservice teachers’ acquisition of assessment skills and perceived immersion within a video-based or text-based training. We found no evidence that either video or written cases are more beneficial in a single rehearsal. This finding is in line with previous research (Friesen & Kuntze, 2018). When choosing and developing suitable representations of practice, educators need to consider which facets of practice are made visible in the representation (Danielson et al., 2018). When assessing student thinking, noticing and interpreting what students say is important. Unlike nonverbal features such as facial expressions and gestures, students’ verbal statements may be equally well represented in both video and written cases. Although both case types appear suitable for increasing participants’ on-the-fly assessment skills after a single rehearsal, the conceptual complexity of the content has a notable impact on the acquisition of these skills. Only the experimental groups that had practiced assessing less complex student conceptions differed significantly from the control group’s performance. Consequently, differences in the complexity of student thinking require an adaptation of the learning environment.
Apart from evaluating preservice teachers’ assessment skills, we evaluated participants’ perceived immersion during rehearsals as a motivational construct. Participants in all conditions reported high values for perceived immersion while using written or video cases to practice assessing student thinking. Thus, we found no advantage in participants’ immersion for any particular media type or content after one rehearsal. Instead, participants achieved a high level of cognitive involvement with both video and written cases.
Research Question 2 addressed the impact of a sequential variation of video and written cases and content of different conceptual complexity on the development of assessment skills and participants’ immersion during rehearsals. Analyses showed that all experimental groups substantially benefited in the development of assessment skills from consecutive rehearsals in all variations. Unlike after the first rehearsal, all experimental groups after the second rehearsal differed from the control condition, which had received the same PCK about common student conceptions about ecology but had no opportunity to apply this knowledge in text-based and video-based teacher training. This finding underlines the benefit of practice-based learning for teachers’ professional development in the case of on-the-fly judgments of student thinking.
Our findings indicate that within the teaching practice approach, using text and video cases to develop teachers’ assessment skills, different paths may prove effective. To confirm this assumption, further research is needed on the variation of media types and content qualities in learning sequences for practice-based teacher education. These findings have nontrivial practical implications for science teacher education. They may encourage teacher educators to implement an element of choice when preservice teachers learn to assess student thinking in practice-based learning environments: participants may choose the case format and content they prefer for rehearsals. Allowing prospective teachers a high degree of control over the learning process will likely sustain their motivation to acquire assessment skills, which might lead to enhanced learning outcomes (Scheiter, 2014).
Apart from analyzing the development of assessment skills, evaluating how participants reported their immersion during rehearsals contributes to a more nuanced understanding of the impact of using different media types that represent content of different complexity in teacher trainings. Participants in all experimental conditions reported high values of perceived immersion for both rehearsals. However, the immersion results for the experimental group that first assessed complex student conceptions using videos (the more complex case format) and then moved on to assess less complex student conceptions using written cases (the less complex case format) offer a more nuanced evaluation of the intervention. This result partially differs from the findings of Syring et al. (2015), who compared participants’ immersion in video and written cases at three measurement times and found a significant difference in favor of the video condition at the first measurement; at the other two measurements, the differences were not significant. We suggest that not only the media type might influence immersion; characteristics of the content in interaction with the media type could also play a role. According to our theoretical considerations, the experimental condition that reported significantly lower values of perceived immersion was the only group that experienced no progression in the complexity of either content or case type. This suggests that, for participants’ motivation, a progression in the complexity of media type or content could be an important factor to consider when creating a series of rehearsals. Further research is needed to test this hypothesis and to explore how media type and choice of content affect participants’ motivation.
Generally, teacher trainings designed for practicing evidence-based assessment of student thinking, based on categories derived from common student conceptions, help teachers develop first reasonable hypotheses about their students’ thinking as a precondition for further instruction. Apart from serving as an anchor for adaptive teaching or as prompts for further questions to explore student thinking, category-based interpretations of student conceptions contribute to the additional goal of developing a common language in the professionalization of teaching (Grossman et al., 2009; Heitzmann et al., 2019; Windschitl & Barton, 2016). In the context of formative assessment, being able to name specific
Although different paths appear to provide an opportunity for successful learning in multiple rehearsals, the question remains how much practice is needed to learn to assess a specific target content. Research Question 3 addressed the number of rehearsals needed for practicing the assessment of student conceptions of different degrees of conceptual complexity. Comparisons with the control condition showed that, for assessing less complex student conceptions, participants benefited greatly from a single rehearsal session. For assessing complex student conceptions, the experimental groups only differed from the control group’s performance after the second rehearsal. This finding suggests that assessing student conceptions of greater conceptual complexity requires more rehearsals before a learning cycle can be considered complete and the next step can be taken. When learning to assess less complex student conceptions, one rehearsal appears to be sufficient. A noteworthy finding is that even after a second rehearsal phase, which in our study design involved student conceptions of different content and complexity, performance in assessing complex student conceptions remained lower than performance in assessing less complex student conceptions. Further research is needed to explore whether a second rehearsal assessing student conceptions of the same complexity and content would further refine participants’ assessment skills.
We point out that, for the vignette test we used, no further test-theoretical quality criteria were available beyond the reported reliability coefficients. Given the specific subject-matter education context in which assessment skills were measured, we could not draw on suitable instruments with proven reliability and validity. In principle, however, the assessment of teachers’ professional skills using video vignettes has proven successful many times (Hoppe, Renkl, Seidel, et al., 2020; Kersting, 2008; Schneider et al., 2016). Nevertheless, future studies should investigate the validity of vignette-based assessment of preservice teachers’ assessment skills in specific subject areas. We also recognize the significance of teachers’ instructional decision-making subsequent to the assessment of student thinking. Blömeke et al. (2015) integrated teachers’ decision-making into their competence model, and we suggest that this model can guide further studies. Future research may investigate, for example, how preservice teachers can be supported in recomposing the assessment of student thinking and instructional decision-making when enacting more complete practices of formative assessment (Grossman et al., 2009).
Conclusion
First, the study provides conceptual value because it outlines how on-the-fly assessment of student thinking about science can be promoted in preservice teachers through rehearsing category-based assessment of student thinking using video and written cases as representations of practice. Second, the study provides insight into how practice-based learning environments may be designed to effectively foster preservice teachers’ on-the-fly assessment skills. Generally, teacher trainings using written and video cases as representations of practice, which provide participants with the opportunity to rehearse the assessment of student thinking, are effective. Written and video cases both serve as suitable representations of practice.
The conceptual complexity of the target content to be assessed, however, needs to be considered. Although a single rehearsal is effective for assessing less complex student conceptions, more complex content requires more practice. Given that the acquisition of assessment skills in student conceptions appears to be largely content-specific, participants benefit from further rehearsals whenever new content is presented. When sequences of rehearsals are planned, different variations of media type (video or written cases) and content appear to have unique benefits in enhancing participants’ assessment skills. Analyses of participants’ immersion reveal a possible interaction between content complexity and media type on participants’ engagement. We suggest considering a progression in the complexity of media type or content when planning sequences of rehearsals.
