Abstract
Keywords
Introduction
This study examines the relationship between children’s comprehension of time in narrative film and their interpretation of event relations and character development. The relationship between children’s comprehension and interpretation of narrative strategies in audiovisual storytelling is an under-researched area. This is partly because research into children’s reception of film comprises mostly studies of the impact of isolated formal features, such as shot duration and transition effects, intonation and colour saturation (Canelhas and Vicente, 2024; Fisch et al., 2001; Lillard et al., 2015; Smith et al., 1985). Although children’s comprehension of discourse structures such as simultaneous (Beentjes et al., 2001) or (a)chronological presentation of events in film (Lowe and Durkin, 1999) has received some attention, little is known about whether children’s ability to comprehend complex narrative strategies, such as flashbacks, helps them interpret the higher-level meanings, such as event causality and character development, that these strategies usually convey. This gap in existing knowledge reflects the challenges of studying the reception of complex film narratives in general (Tseng et al., 2021) and young children’s interpretation of abstract meanings in film in particular, such as the limited suitability of methods developed for adult audiences (e.g. comprehensive surveys or extended, semi-structured interviews) for research with young children, and the need to account for children’s developing cognitive skills and their experience with film viewing.
In this article, we seek to extend our understanding of children’s reception of narrative film by examining how children’s comprehension of flashbacks, as a complex storytelling strategy, supports their interpretation of higher-level, more abstract meanings such as event causality and character development in narrative film. To achieve this, we employ a social semiotic framework for analysing the co-patterning, or combination, of temporal semantic relations in audiovisual narratives for children (Tseng and Djonov, 2023) and propose a novel model of children’s reception of audiovisual narratives. Drawing on cognitive theories and empirical research of film interpretation, the model distinguishes children’s comprehension of concrete filmic cues, such as the representation of characters and their actions, from children’s interpretation of more abstract meanings, such as event causality, character development and broader narrative themes. We will empirically test this model through an exploratory, social-semiotic study of the relationship between 7 to 10-year-old children’s comprehension
The new knowledge generated in this study can inform the evaluation, selection and design of audiovisual narratives for use in education, where multimodal narratives are well recognized as a powerful tool for teaching children about cognitively challenging concepts such as time (Masterman and Rogers, 2002; Solé, 2019; Zhang and Hudson, 2018), engaging students in examining social themes and values (Rothery and Stenglin, 1997; Stephens and McCallum, 1998), and fostering critical multimodal literacy by drawing attention to the role of different semiotic resources and storytelling strategies in promoting some ideologies while suppressing others (Djonov and Tseng, 2021; Djonov et al., 2021).
This article opens with definitions of (i) narrative complexity in film and (ii) film reception as comprising two interdependent processes – comprehension and interpretation. We then review research on children’s reception of film and the key principles of social semiotic theory, which together inform the conceptual model of the centrality of time in children’s comprehension and interpretation of audiovisual narratives that we propose in this article. After introducing this model, we present Tseng and Djonov’s (2023) framework for analysing temporal relations in narratives for children. This detailed presentation informs the film-reception study that is the focus of this article, which revisits and extends the analysis of data presented in Tseng and Djonov (2023) to empirically test the new model proposed here of the relationship between children’s comprehension and their interpretation of audiovisual narratives. The article concludes with a discussion of implications of our study for future research, particularly regarding children’s reception of narrative complexity in film.
Narrative Complexity, Flashbacks and Film Comprehension and Interpretation
Complex narratives have received considerable research attention since the early 2000s, based on a shift toward more sophisticated storytelling in television and popular cinema (Buckland, 2009). Film scholars have described narrative complexity as a storytelling approach that blends episodic and serial forms, and emphasizes character development and voices, multi-layered plots, and often non-linear or fragmented timelines, and have argued that it has the power to stimulate deeper engagement with narrative themes (Ros and Kiss, 2018; Willemsen and Kiss, 2020).
Narrative complexity can also be found in films for children. For example,
Despite the prominence narrative complexity has gained in cognitive and multimodal film studies (Ros and Kiss, 2018; Tseng, 2017, 2018), the term has no unanimously accepted definition (Hven, 2017). This may be because what is considered complex differs across contexts (art and abstract vs popular film) and audiences. While adults might not find flashbacks challenging (Sevenants and d’Ydewalle, 2011), empirical studies such as those we review in the following section reveal that young children struggle to comprehend non-linear event sequencing in film narratives. Nevertheless, studies of narrative complexity in film have so far ignored productions for children.
Although definitions of narrative complexity vary, there is wide agreement that time, space and multimodality are crucial in the reception of complex film narratives. Time is a defining resource for the representation of events in storytelling, particularly in film. Time is also abstract, and its representation and perception therefore rely on the embodied actions and appearance of characters alongside other visual and aural cues, and their interaction (Coëgnarts and Kravanja, 2012; Gordejuela, 2021; Ros and Kiss, 2018; Willemsen and Kiss, 2020).
In this study, we examine narrative complexity that involves multimodal storytelling structure and techniques that go beyond canonical, linear event sequencing. Specifically, we focus on children’s comprehension and interpretation of flashback scenes. Turim (1989: 2) defines the flashback as ‘a representation of the past that intervenes within the present flow of film narrative’ and underscores its power to engage the audience cognitively and emotionally. Flashbacks achieve that by: allowing a narrative to start from a climactic event and represent the events that led to it afterwards; revealing a character’s memories and thereby their perspectives on past events or motivation for their past or future actions; and inviting reflection on the same event through the perspectives/memories of different characters (Bordwell et al., 2023; Gordejuela, 2021).
In examining children’s reception of flashbacks as a form of narrative complexity in film, we distinguish between two interrelated processes:
Our study makes a novel contribution to research on narrative complexity in film in two ways. First, it examines children’s comprehension as well as interpretation of flashbacks as a type of narrative complexity in popular children’s films. Second, it adopts a social semiotic approach to the analysis of narrative and a multimodal discourse–semantic framework for analysing temporal relations in audiovisual narratives for children and predicting the impact of their co-patterning (i.e. the ways these relations are combined) on children’s narrative comprehension and interpretation.
Studies of Children’s Film Reception
Children’s reception of audiovisual narratives has received considerable attention, especially from researchers in child development and education. Most have adopted experimental designs to examine the effect of (i) conventional film editing techniques, (ii) editing that disrupts temporal or spatial continuity, and (iii) non-continuous narrative sequences across different age groups.
Studies of the impact of conventional film editing techniques on children’s comprehension reveal that by 4–5 years of age, children can comprehend videos using first-order editing techniques such as pans (Pittorf et al., 2014), jump cuts (Munk et al., 2012) and continuity editing (Canelhas and Vicente, 2024). For example, in an experiment with 92 Portuguese children aged 3–5 years, Canelhas and Vicente (2024) used two videos in which a person introduced the vocabulary item ‘rain stick’ and explained how the instrument works – one unedited and the other edited so that its average shot length and its variation and frequency of close, medium and wide shots matched those found in educational videos on the children’s TV programs
The contribution of editing to the pacing of audiovisual narratives for children has also been studied. McCollum and Bryant (2003) developed and employed a pacing index (based on frequency of cuts, related and unrelated scene changes, auditory changes and proportion of active motion, active talking and active music) in their automated content analysis of 85 highly rated, popular US children’s television programmes. The analysis revealed that those on commercial networks had a faster pace than curriculum-based, or educational, programmes. Moving beyond examining audiovisual texts alone, Lillard et al. (2015) involved 60 four-year-old children in a study demonstrating that bottom-up stimuli, such as changes in the audio and/or visual stream, combined with top-down stimuli, such as fantastical events that do not match children’s cognitive schemas, simultaneously exert pressure on their executive function. Essex et al. (2022) analysed the videos previously identified by Lillard et al. (2015) as depleting executive function and found that these had higher flicker, edge density and situational change rate, but – contrary to expectation – longer shot duration than the non-depleting ones.
Research has also investigated the impact of editing strategies that disrupt the chronological presentation of events or the continuity of space. One example is Abelman’s (1990) seminal investigation of the comprehension of realistic, reversed and time-leap sequencing of events in an edited television programme by 215 children aged 4, 6 and 8 years. The children’s performance in a picture sequencing task showed that their cognitive development significantly supported comprehension of the realistic sequence, had a moderate impact in the reversed sequence condition but did not aid understanding of the time-leap sequence, in which real time was compressed through the removal of extraneous events. Abelman also found that children’s level of television consumption, as reported in parental diaries, contributed to but did not completely explain their performance.
Considering both spatial and temporal discontinuity, Beentjes et al. (2001) examined the ability of 45 children to comprehend three items created for their study by the producers of the Dutch version of
Rather than concentrate on distinct shot and scene transition effects alone, other researchers have examined the impact of achronological narration in general. Lowe and Durkin (1999) randomly assigned an equal number of students from three groups of 30 first-, third- and fifth-graders to watch an edited televised police drama presenting events in canonical, flashback or jumbled order, and then complete a comprehension test and a picture sequencing test. While children’s comprehension of the narrative increased across the grades for all three versions, and comprehension of central, peripheral and implied content did not vary by the order of event presentation, each age group’s sequencing task performance for the flashback version, contrary to expectations, was not better than for the jumbled one. Similarly, Munk et al.’s (2012) eye-tracking study involved 79 children (aged 4, 6 and 8 years) in viewing nine short films representing everyday experiences (two versions of each) and found that while there were no age effects for jump cuts, 6- and 8-year-olds experienced increased reaction times with reverse angle shots and all the children struggled with narrative discontinuity.
The studies reviewed in this section shed light on the challenges and gaps in research into children’s reception of audiovisual narratives. One key challenge is the need for measures and methods for testing children’s comprehension that are sensitive to and can capture developmental changes, for example, employing short videos and picture sequencing tasks or eye-tracking to suit younger children’s attention span, language and literacy skills. To establish the effects of isolated formal low-level audio-visual stimuli (e.g. jump cut vs dissolve; presence or absence of sound effects) or semantic features (e.g. chronological vs achronological sequence of events), previous studies have adopted experimental methodologies that are resource-intensive and typically start with creating videos that differ in their presence or absence of the studied features. While the findings of these studies have the power to inform the design of audiovisual materials for education, their value for evaluating and selecting from existing popular films and television or online video offerings for children is limited. Empirical studies aimed at evaluating existing audiovisual narratives need to analyse how these narratives combine different semantic features and how such co-patterning shapes children’s comprehension and interpretation of complex non-linear sequencing such as flashbacks, which may be found in different story phases and represented with different film techniques.
Social Semiotics as a Framework for the Multimodal Analysis of Narrative Film
The study presented here is grounded in social semiotic theory, which stems from Michael Halliday’s Systemic Functional Linguistics (SFL), where language is viewed as one among many semiotic resources in society and modelled as a system of choices for making meaning that develops in response to the functions it fulfils in society (Halliday, 1978; Halliday and Matthiessen, 2004). In SFL, a text is any act of communication with socially ascribed unity that simultaneously realizes three types of meaning, as it: represents aspects of experience (participants, processes and circumstances) and the logical relations between them (ideational meaning); reflects and helps negotiate social roles, relationships and attitudes (interpersonal meaning); and weaves these meanings into a cohesive and coherent unit of meaning that both reflects and contributes to shaping the situational and socio-cultural context of its production and reception (textual meaning).
The focus on meaning-making in social context has inspired two extensions of Halliday’s theory that are relevant for studies of multimodal narratives. The first is social semiotic genre theory, which focuses on how texts fulfil different social purposes through combinations of linguistic and non-linguistic semiotic choices organized into compulsory and optional stages or components (Martin and Rose, 2008; Van Leeuwen, 2005). From this perspective, story genres, which include narrative, recount and anecdote, function to entertain their audience, feature a sequence of events in time, and rely on time both for representing these sequences (ideational meaning) and for orchestrating different semiotic resources to achieve cohesion (textual meaning). All story genres open with an Orientation stage, where the characters and setting of the story are introduced, and may close with a Coda, presenting the key moral or lesson of the story. Among story genres, as Rothery and Stenglin (1997) argue, narrative is the most entertaining due to its three distinctive obligatory stages: the Complication, which presents a problem that disrupts the main characters’ lives and may lead to a crisis; the Evaluation, which reveals the characters’ inner thoughts and feelings, and the narrator’s attitudes towards the disruptive events, and thereby may contribute to character development or add suspense and make the story more engaging; and the Resolution, where the problem is resolved and a sense of equilibrium is established again.
Another valuable extension of Halliday’s social semiotic theory is the development of frameworks for modelling the meaning-making potential of non-linguistic semiotic resources such as layout, colour and sound, and analysing their use and interaction with language and other resources in multimodal texts (Van Leeuwen, 2005) such as picture books (Painter et al., 2013) and film (Tseng, 2013). Social semiotic frameworks for multimodal analysis, such as the one for analysing time in children’s narratives developed by Tseng and Djonov (2023), typically focus on semantic features, or the meanings that different modes can contribute to. This acknowledges that non-linguistic modes differ in their materiality and stratal organization from each other and from language (where phonology or graphology realize patterns of choices at the stratum of lexico-grammar, which in turn realize broader discourse-semantic choices).
Model of Narrative Comprehension and Interpretation: From Concrete Representations to Abstract Meanings
Underlying our study is a conceptual model of the comprehension and interpretation of film narrative. The model is informed by cognitive studies of the multimodal representation of time in film and shown in Figure 1. Here, time is viewed as playing a central role in the representation of events and the organization of narrative, and therefore plays a crucial role in narrative comprehension. Drawing on research showing that the understanding of time in language builds on embodied experiences and spatial cognition (Boroditsky, 2000; Thibodeau et al., 2017), our model incorporates the hypothesis that children’s comprehension of temporal relations in film is also likely to start with noticing and interpreting embodied elements such as characters’ actions and space (see Coëgnarts and Kravanja, 2012; Gordejuela, 2021; Ros and Kiss, 2018). In turn, understanding of temporal relationships is required to comprehend event causality, which is crucial for children’s ability to interpret characters’ motivation and development, and engage with the social themes and values that many narratives convey (see Bordwell, 1989; Bordwell et al., 2023; Eder, 2010; Smith, 2022[1995]). Based on this model, children’s ability to understand the co-patterning of time relations involved in a flashback depends on observing changes in embodied features represented on the screen. In this study, we will examine the relationship between children’s ability to comprehend a temporally complex narrative scene and to interpret its role in event causality and character development.

Figure 1. Comprehension and interpretation of film narrative: A conceptual model.
Analysing Time in Narratives for Children
The exploratory empirical study of children’s comprehension and interpretation of narrative film that we present in this article employed a framework developed to support the analysis of temporal relations in narratives for children and empirical studies of how their co-patterning, or combination, affects the comprehensibility of complex narrative scenes (Tseng and Djonov, 2023). The framework is informed by the SFL understanding that temporal relations are semantic in nature, operate at different levels in verbal discourse (realized by circumstances within clauses or conjunctions connecting clauses or larger units of text) and may be implicit, reliant on inferencing, rather than explicitly signalled through words such as ‘then’ or ‘simultaneously’ (Halliday and Matthiessen, 2004). Their semantic nature has also motivated multimodal studies of temporal relations in picture books (Painter et al., 2013), visuals in history textbooks (Derewianka and Coffin, 2008), film (Van Leeuwen, 1985, 1991) and, most recently, English language learning apps for preschool children (Tan et al., 2024). The framework also incorporates insights about different kinds of temporal relations from narratology (Genette, 1980) and empirical studies of children’s comprehension of multimodal narratives (Essex et al., 2022; Hoodless, 2002; Lillard et al., 2015; McCollum and Bryant, 2003; Solé, 2019).
In this section, we introduce Tseng and Djonov’s (2023) framework with examples from the two scenes from popular Disney movies used in our exploratory study. For the purposes of this article, we analysed the two scenes – which each contain a flashback and no speech – from a social semiotic perspective, considering each scene’s function in the structure of the narrative. The first scene, from Disney’s

Figure 2. Anton Ego’s flashback in
The second flashback scene we used in our study is from

Figure 3. Rapunzel’s flashback and identity revelation in
To analyse the temporal relations in these two segments, we employed the system network in Figure 4. System networks are a convention for modelling paradigmatic choices for making meaning in a particular context (Halliday and Matthiessen, 2004). The context for this network is ‘time in children’s narrative’ and it comprises three systems of choices for constructing and describing temporal relations,

Figure 4. A semantic system network of temporal relations in children’s narratives.
Let us now briefly define each of the options in the system and illustrate them through examples from the analyses of Anton Ego’s and Rapunzel’s flashback scenes, presented respectively in Figures 5 and 6. These analyses show simultaneous systems on the left and temporal relations from these systems above the relevant screen capture/s. Relations that are more likely to challenge children’s comprehension (e.g. achronological sequence, inexact time point, multiple layers) are shown in bold in Figures 5 and 6.

Figure 5. Temporal relations in Anton Ego’s flashback in

Figure 6. Temporal relations in Rapunzel’s flashback and subsequent identity revelation in
The subsystem of
The system of
Another subsystem concerned with the sequencing of events is
As our description of the segments from
From Comprehension of Time to Interpretation of Event Relations: An Empirical Study
In this section, we present an exploratory empirical study of the relationship between children’s comprehension and their interpretation of flashback as a type of narrative complexity in film. The study extends the one designed to illustrate the power of Tseng and Djonov’s (2023) framework for analysing time in children’s narrative to predict and test the impact of different combinations of temporal relations on children’s comprehension of audiovisual narratives. Here, we employ the same data and extend the analysis in order to test the model of the relationship between children’s comprehension and interpretation of narrative complexity, proposed in this article.
Participants
The study involved 28 children aged 7 to 10 years, recruited using convenience sampling through our personal contacts in Australia, Germany and Taiwan. The project complied with all requirements of the ethical clearance obtained through the second author’s university’s human research ethics committee. All children had parental consent to participate in the study and their names were replaced with pseudonyms.
The decision to focus on 7 to 10-year-olds was informed by research showing that children have limited capacity to compare reality with, and therefore to learn from, moving images until around 5 years of age (Barr, 2010) and find events presented in achronological order challenging to comprehend until 6–7 years of age (Hoodless, 2002), but demonstrate adult-like familiarity and skills in interpreting film editing conventions from 11–12 years (Essex et al., 2022). Considered together, these studies suggest that children aged 7–10 years can comprehend complex narrative strategies in film by drawing on discourse–semantic relations, such as those mapped in Tseng and Djonov’s (2023) framework for analysing time in children’s narrative, rather than knowledge of formal editing techniques.
Method of data collection and analysis
With a focus on children’s ability to comprehend time in audiovisual narratives, Tseng and Djonov (2023) used the two Disney film segments discussed above and a free recall test. Immediately after watching each segment, the child was asked two questions: ‘
‘She looks at the flag and she looks around the castle, and she sees the same symbol’s all over it, and she
‘Rapunzel was having a dream about the glowing light thing. She found a crown and she put it on, and she woke up and it was just a dream.’ (Eddy, 9 years, familiar with
‘She saw the shape of the clothes and then saw lots of suns on the wall. She painted those suns I think. Then she
Krippendorff’s alpha (Krippendorff, 2011) was used to measure intercoder reliability and revealed sufficiently high agreement between the two coders (α = 0.761).
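For readers unfamiliar with this reliability measure, the following is a minimal sketch of how Krippendorff’s alpha can be computed for the simplest case: two coders, nominal categories and no missing codes. These conditions are assumptions made for illustration only, as are the function name `krippendorff_alpha_nominal`, the category labels and the example codes; none of them is taken from the study’s coding scheme or data, and the sketch is not the procedure used to obtain the value reported above.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(coder_a, coder_b):
    """Krippendorff's alpha for two coders, nominal data, no missing codes.

    Hypothetical helper for illustration; not the study's own procedure.
    """
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must code the same number of responses.")

    # Coincidence matrix: each coded response contributes the ordered
    # pairs (a, b) and (b, a), so agreements land on the diagonal twice.
    coincidences = Counter()
    for a, b in zip(coder_a, coder_b):
        coincidences[(a, b)] += 1
        coincidences[(b, a)] += 1

    # Marginal totals: how often each category was used across both coders.
    totals = Counter()
    for (category, _other), count in coincidences.items():
        totals[category] += count
    n = sum(totals.values())

    # Observed disagreement: off-diagonal coincidences.
    observed = sum(
        count for (c, k), count in coincidences.items() if c != k
    )
    # Expected disagreement: products of marginals for distinct categories.
    expected = sum(totals[c] * totals[k] for c, k in permutations(totals, 2))
    if expected == 0:
        raise ValueError("Alpha is undefined when only one category is used.")

    return 1 - (n - 1) * observed / expected


# Invented codes for four responses (1 = flashback comprehended, 0 = not).
coder_1 = [1, 1, 0, 0]
coder_2 = [1, 0, 0, 0]
print(round(krippendorff_alpha_nominal(coder_1, coder_2), 3))
```

In this invented example the coders disagree on one of four responses. Alpha equals 1 when the coders agree on every response and falls towards 0 as agreement approaches what chance alone would produce.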
Tseng and Djonov’s (2023) results revealed that, of the 28 children, all except one comprehended the flashback in
For the present article, we revisited the data set presented in Tseng and Djonov (2023) and extended our coding and analysis to test the following hypothesis:
To test this hypothesis, we adopted only the responses about the clip from
Results
The results shown in Table 1 support our hypothesis that the comprehension of the flashback segment in
Table 1. Number of children who comprehended the flashback and interpreted Rapunzel’s revelation in
Discussion
Overall, the study presented here builds on Tseng and Djonov (2023) by proposing a model of the relationship between children’s comprehension and their interpretation of narrative. Combining this model with Tseng and Djonov’s framework for the multimodal discourse–semantic analysis of time in narratives for children, we moved beyond identifying and comparing the co-patterning of semantic relations in audiovisual narratives for children, and beyond predicting and testing their impact on children’s comprehension of audiovisual narratives. Extending Tseng and Djonov’s analysis of the responses of 28 children (aged 7 to 10 years) to a segment of Disney’s
The limitations of our study also reveal avenues for future empirical investigations of children’s film reception. Our study’s findings need to be interpreted with caution due to the general nature of the free recall test. It is possible, for example, that a child inferred Rapunzel’s identity revelation but did not comment on it in their response. Additionally, this method cannot disambiguate responses such as that in Example 4, which does not clearly demonstrate whether the child comprehended the flashback.
‘There was first a sun. The girl was staring at it for a while and gradually it became clear to her about her past and at the same time it made her realize that she should wear a crown. Then she lost consciousness.’ (Izzy, 8 years, not familiar with
Future research could examine children’s perception of concrete cues in film through the use of eye-tracking or other biometric measures, and complement such measures with methods such as structured interviews, think-aloud protocols or cued recall to build a more fine-grained picture of the cognitive processes at play in children’s comprehension and interpretation of narrative film. To be most effective, such methods, especially cued recall, need to be combined with a more in-depth, systematic attention to the realization of the semantic relations involved in complex narrative scenes. Although we have described how particular temporal relations are instantiated through the visual and auditory cues in the film segments analysed for this study, our focus has been on discourse–semantic relations. Examining the impact of cues such as sound design features, transitions and colour grading on children’s comprehension of narrative film would require closer mapping of the ways discourse–semantic relations are realized through the use of lower-level semiotic resources and formal filmic techniques.
Another limitation of our study is its small number of participants recruited through convenience sampling. Including more participants would increase the generalizability and applicability of the study’s findings. A larger study using purposive sampling would also support comparisons of the reception of complex narrative film strategies by children from different age groups, different cultural and linguistic backgrounds, with various levels of familiarity with the film in which these strategies are used or experience in film and TV viewing. Importantly, cross-cultural and cross-linguistic studies of children’s film reception require the development of culturally sensitive non-linguistic methods (see McCormack and Hoerl, 2017).
Larger-scale, longitudinal studies can also shed light on changes in children’s interpretation of complex narrative strategies over repeated viewings of one or more films. As children’s exposure to interactive storytelling across different media continues to increase, research could also examine whether and how such exposure impacts children’s comprehension and interpretation of non-linear and other complex narratives in film and other media formats.
Finally, this study focused on flashbacks as a kind of narrative complexity that involves semantic relations that challenge children’s temporal cognition. In future, we plan to examine whether and how the interaction of temporal and spatial relations supports children’s comprehension and interpretation of complex narrative strategies in film. (For example, Ego’s childhood memory in
Conclusion
Inspired by and building on studies of children’s reception of audiovisual narratives and research on narrative complexity in film, this study adopted a multimodal discourse–semantic approach to examine children’s ability not only to recognize flashbacks as a complex narrative strategy but also to interpret their meaning in filmic storytelling. We proposed a model of the relationship between children’s comprehension and interpretation of audiovisual narratives, and tested it by extending Tseng and Djonov’s (2023) exploratory empirical study of children’s comprehension of time in film. As in our earlier study, we focused on the
