Sage Journals: Discover world-class research

Abstract

This article explores the use of screenshots as a form of visual evidence on social media platforms. It considers their role in YouTube videos that spread misinformation and disinformation about the Notre Dame Cathedral Fire and an internet hoax, the Momo Challenge. The article draws on two social semiotic frameworks, legitimation (Van Leeuwen in ‘Legitimation in discourse and communication’, 2007) and affiliation (Knight in ‘Evaluating experience in funny ways’, 2013, and Zappavigna in ‘Searchable Talk and Social Media Metadiscourse’, 2018), to analyse how screenshots and accompanying voiceovers construe technological authority and propagate social values. Seven key forms of screenshots are identified in the dataset, alongside the key social bonds that are made visually salient in the screenshots. Overall, this research contributes to our understanding of the role of screenshots in instances of misinformation and disinformation, highlighting the importance of identifying the affiliation potential of a screenshot in order to determine its veracity.

Keywords

disinformation, legitimation, misinformation, multimodal discourse analysis, multimodality, screen captures, screenshots, YouTube

1. Introduction: Screenshots And Social Semiotics

Screenshots, also known as ‘screen captures’ or ‘screen grabs’, are still images that replicate the contents of a computer or mobile device screen. The concept of a screenshot arose in the 1960s as a visual convention which ‘allowed the experience of using an interactive computer to be described and distributed’ before the concept of digital interactivity had been naturalized and people had learnt how to read these kinds of images as meaningful (Allen, 2016: 664). This article considers a more recent development in the meaning potential of screenshots as visual images, exploring their role in social media discourse. It focuses on how screenshots are legitimated as visual evidence and used to propagate certain kinds of social values. When embedded in social media posts, screenshots can be recontextualized beyond their original setting for many purposes, such as substantiating online activity, making personal encounters public, or archiving digital events. Even though screenshots can be easily manipulated with image editing software, they are often interpreted as true copies of content (Jaynes, 2020). This has the potential to fuel misinformation and disinformation practices. Misinformation is false information that is shared without the intention to cause harm, while disinformation is false information that is intentionally shared in order to cause harm (Wardle and Derakhshan, 2017). This study considers the role of screenshots in YouTube videos that spread misinformation and disinformation by exploring how they function to legitimate certain knowledge, sources and values. Both terms are used in this research to acknowledge that false information is multi-faceted and because, given the discourse-analytic approach taken here, the intentions of the YouTubers cannot be known for certain.

Research into screenshots on social media is a relatively new area of study. Screenshots remain ‘neglected in public debate’ because we tend to ‘look through them’, rather than perceiving screenshots themselves as media objects (Frosh, 2018: 62). However, critical analysis of the meaning-making practices involved in producing and sharing screenshots is important for ‘examining the assumptions embedded in their form and function’ (Moore, 2014: 141). Ethnographic work on teenagers’ everyday use of screenshots has noted their evidentiary role in ‘reaffirming friendships and resolving or igniting conflict’, subject to ethical codes for sharing, particularly in group chats (Jaynes, 2020: 1381).

In terms of research into the manipulation of screenshots, ‘evidence collages’ in media manipulation campaigns have been examined from an ethnographic perspective (Krafft and Donovan, 2020). These collages incorporate screenshots as a form of visual evidence that function as ‘a key strategic element in the formation and spread of disinformation’ (Krafft and Donovan, 2020: 205). Screenshots have also been studied as technologies for public shaming. For example, Corry (2021) conducted a qualitative thematic analysis of news media articles about two cases: Amanda Todd, a Canadian teenager who took her own life after being cyberbullied and blackmailed with screenshots taken of her without her consent, and Anthony Weiner, a US Congressman whose political career ended when screenshots of his extramarital flirtations were revealed. The analysis highlighted how screenshots are entangled between the notions of permanence and ephemerality, and the boundary violations that occur when the private is made public. Some studies have noted the role of screenshots in the propagation and visibility of racist discourses where ‘platform collapse’ and ‘mediated spillover’ occur, as screenshots are shared across various social media platforms without users understanding their proper context (Bigman et al., 2022: 4). Whilst screenshots have some positive potential to call out social media users who engage in a ‘tweet and delete’ culture of harassment, this needs to be weighed against their potential to proliferate hateful content, for instance by amplifying racist posts (Bigman et al., 2022: 11). On the other hand, screenshots may act as a tool for sousveillance and an effective form of visual persuasion, as when organizations such as Racism Watchdog use them to draw attention to online injustices (Jenkins and Cramer, 2022).

How to theorize the social functions of screenshots is also critical to understanding their visual meaning-making potential. Cramer et al. (2022: 6) have developed a four-quadrant conceptual model of the functions of screenshots: bookmarking or containing information for an individual’s own use in either the online or offline world, or disclosing and reframing information for alternative audiences. The study suggested that screenshots fulfil interpersonal needs and are examples of ‘networked sociality’ (Papacharissi, 2010: 316), in the sense that they move easily across different social media platforms and architectures (Cramer et al., 2022). Švelch (2021) has also drawn attention to the need to develop a critical literacy of screen capture practices. In particular, they distinguish between ‘screen captures’, as any visual record of a screen, ‘photographic screen captures’, as a photographic record of the screen, and ‘screenshots’, as screen captures that are digital files created within the same device as the one that displayed the original screen (p. 559).

The multimodal approach adopted in the present study is complementary to this existing research into screenshots as it offers a way of understanding the social significance of screenshots by exploring how they make meaning and legitimate values. While studies have identified the motives of users and viewers in sharing screenshots (Jaynes, 2020) and how screenshots have been discussed in news articles (Corry, 2021), there is a lack of multimodal research that considers both their linguistic and visual use in instances of misinformation and disinformation. There is also a lack of work on the affiliative potential of the screenshot in terms of the role it can play in aligning people into shared communities who consider certain kinds of evidence to be legitimate. Instead, social media research has tended to focus on issues of polarization and misinformation (Dunaway, 2021) from media ecology and cognitive bias perspectives. There has been some previous work using legitimation to investigate political fake news (Igwebuike and Chimuanya, 2020) and delegitimation via internet memes (Ross and Rivers, 2017); however, none of this work has focused on screenshots. In order to add to this body of literature on screenshots, the key research questions guiding this study are:

R1: How are screenshots manipulated as legitimation and affiliation strategies in social media discourse?

R2: How can a multimodal semiotic framework be re-orientated to foreground these strategies discovered in social media discourse?

This article begins by detailing the dataset used in the case studies and the sampling strategy employed. The method section then introduces the social semiotic theoretical framework and explains the multimodal discourse approach to analysing screenshots using two frameworks: affiliation (Martin and White, 2005; Zappavigna, 2018), for understanding the social bonds and values at stake in the screenshots, and legitimation (Van Leeuwen, 2007), for understanding how these values are positioned as valid, to be valued, or as believable. The results section discusses the importance of technological authority in the dataset, the different types of screenshots used and the social bonds implicated in the visual content of the screenshots. Lastly, future directions and practical applications of this research in terms of the study of misinformation and disinformation are reflected upon.

2. Dataset And Sampling Strategy

The present study forms part of a larger project studying misinformation and disinformation in a dataset of 30 videos, in terms of both the visual and verbal discourse in the videos and in their respective comment feeds (see Inwood, 2021; Inwood and Zappavigna, 2021; Inwood and Zappavigna, 2022a; Inwood and Zappavigna, 2022b). This study was motivated by the high frequency of screenshots observed in the dataset, which suggested that their use was a visual pattern worthy of attention. For a manual analysis, this is a large dataset: 1,674 frames were manually analysed for their visual and verbal discourse.

Two case studies of screenshot use in YouTube videos about the 2019 Notre Dame Cathedral Fire and the Momo Challenge Hoax were selected for this study as they represent both political and non-political discourse and both received significant engagement on YouTube. As such, they offered the potential to examine different genres of ‘information disorder’ (Wardle and Derakhshan, 2017): the ‘Notre Dame Fire’ conspiracy is an instance of politically motivated hate speech, whilst the ‘Momo Challenge’ is an example of a mostly apolitical internet hoax related to moral panic about children and technology. The Notre Dame Fire videos in the dataset represent a range of conspiracies about the fire, including white supremacist discourse falsely implicating Muslims and immigrants. The Momo Challenge videos are about an internet hoax regarding a threatening figure contacting children on WhatsApp or being spliced into YouTube Kids videos.

The data collection and sampling procedure for this study involved several steps. Following a theoretical sampling approach, YouTube videos were collected using YouTube Data Tools (Rieder, 2015). Videos were selected if they were in English, had more than 10,000 views and had comments enabled; in addition, the Notre Dame Fire videos had to have been created in the 24-hour period after the fire. This resulted in a dataset of 272 Notre Dame Fire and 195 Momo Challenge videos. From this set, 15 videos were then selected for each case study. For the Notre Dame Fire, this was on the basis that they were under 15 minutes long (as some videos were livestreams that lasted several hours), featured conspiratorial or false content about the fire and had the highest number of views. Conspiratorial or false content was defined as content that stated the fire was deliberately started, in contrast to the official statement by French prosecutors that determined the fire was an electrical accident. For the Momo Challenge, the videos were selected to represent the various macro-genres observed across the dataset: news reporting, entertainment, commentary, education and clickbait. The transcripts and visual content (one frame for each second of video) were collected for each video.
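The Notre Dame Fire selection criteria described above can be expressed as a two-stage filter over video metadata. The sketch below is illustrative only and is not the study's tooling: the `VideoMeta` record and its field names are hypothetical (they are not the YouTube Data Tools export format), the judgement of whether a video is conspiratorial was made manually in the study and is represented here simply as a set of coded video IDs, and the window start in the usage example is a placeholder rather than the exact time used by the authors. The Momo Challenge selection by macro-genre is not modelled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Set


@dataclass
class VideoMeta:
    # Hypothetical metadata record; field names are illustrative and are
    # not the export schema of YouTube Data Tools.
    video_id: str
    language: str          # e.g. "en"
    views: int
    comments_enabled: bool
    published: datetime
    duration_s: int        # video length in seconds


def meets_collection_criteria(v: VideoMeta, window_start: Optional[datetime] = None) -> bool:
    """Stage 1: English, more than 10,000 views, comments enabled and, when a
    window start is given (Notre Dame Fire only), published within 24 hours of it."""
    ok = v.language == "en" and v.views > 10_000 and v.comments_enabled
    if window_start is not None:
        ok = ok and window_start <= v.published < window_start + timedelta(hours=24)
    return ok


def select_notre_dame(videos: Iterable[VideoMeta], window_start: datetime,
                      conspiratorial_ids: Set[str], k: int = 15) -> List[VideoMeta]:
    """Stage 2: keep videos under 15 minutes that were manually coded as
    conspiratorial (passed in as a set of IDs), then take the k most viewed."""
    pool = [
        v for v in videos
        if meets_collection_criteria(v, window_start)
        and v.duration_s < 15 * 60
        and v.video_id in conspiratorial_ids
    ]
    pool.sort(key=lambda v: v.views, reverse=True)
    return pool[:k]


# Usage with made-up records (hypothetical window start and values).
start = datetime(2019, 4, 15, 16, 0)
videos = [
    VideoMeta("a", "en", 50_000, True, start + timedelta(hours=2), 600),
    VideoMeta("b", "en", 9_000, True, start + timedelta(hours=3), 300),          # too few views
    VideoMeta("c", "en", 80_000, True, start + timedelta(hours=30), 400),        # outside window
    VideoMeta("d", "en", 20_000, True, start + timedelta(hours=5), 4 * 60 * 60),  # livestream
    VideoMeta("e", "en", 30_000, True, start + timedelta(hours=1), 700),
]
print([v.video_id for v in select_notre_dame(videos, start, {"a", "b", "c", "d", "e"})])  # ['a', 'e']
```

Keeping the manual coding outside the function (as a set of IDs) reflects that only the metadata criteria are mechanical; the judgement about conspiratorial content is not.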

3. Method

This study adopts a multimodal social semiotic approach concerned with how language and other modes of communication make meaning when used in social contexts (Halliday, 1978). The analytical method integrates two social semiotic frameworks: affiliation (developed within Systemic Functional Linguistics) and legitimation. The affiliation framework is used to understand the key social bonds enacted through the use of screenshots together with the spoken verbal text of the YouTubers. It considers the values that the YouTuber projects onto the screenshots in order to forge alignment with the ambient audience of the video. The legitimation framework is used to understand how these bonds are legitimated. This involves exploring how credibility is verbally and visually constructed in the videos. The following subsections explain the affiliation and legitimation analysis undertaken in the study.

3.1 Affiliation and social bonding analysis

According to the SFL affiliation framework, social bonds are realized in discourse as values, instantiated as evaluations targeted at entities, phenomena and activities (Martin and White, 2005; Zappavigna, 2018). The present study draws on the Appraisal framework (Martin and White, 2005), a discourse semantic framework for analysing evaluative language, to systematically analyse the evaluative language used in the video transcripts. The focus was on the attitude system, which describes three main regions of meaning: affect (feelings and reactions to behaviour), judgement (ethically assessing a person or behaviour) and appreciation (valuing an object or phenomenon). The affiliation analysis involved exploring the construction of ideation–attitude ‘couplings’ in the dataset. These are theorized in the framework as the discursive realization of the social ‘bonds’ (Knight, 2013) that align personae into communities of values (Zappavigna, 2018). Ideation–attitude couplings are represented throughout this article in square brackets, using the following annotation strategy, with the ideation (what is being evaluated) underlined and the attitude (how it is being evaluated) presented in bold font:

This video was meaningless

[ideation: video / attitude: negative valuation]

Technical appraisal terms are displayed in small caps to distinguish them from their common-sense meanings. The presentation of ideation–attitude couplings is important for this study because it shows at a technical level how the evaluative language used relates to broader social meanings (i.e. the social bonds), rather than purely ideological strategies. This analysis allows us to explore the semiotic practices of users, as evidenced by the texts in this dataset, rather than their intentions, which would require different methods of data collection.
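For readers who record codings in a spreadsheet or script, the annotation convention above maps naturally onto a small record type. The following Python sketch is a hypothetical illustration, not the tooling used in the study (the analysis was manual): a `Coupling` holds the ideation and the attitude, renders itself in the square-bracket notation used in this article, and couplings are grouped under whatever bond labels an analyst assigns. The class, field and function names are invented for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Coupling:
    """One ideation-attitude coupling (hypothetical record for illustration)."""
    ideation: str          # what is being evaluated, e.g. "video"
    polarity: str          # "positive" or "negative"
    attitude: str          # appraisal category, e.g. "valuation", "propriety"
    invoked: bool = False  # True when the attitude is invoked rather than inscribed

    def annotate(self) -> str:
        """Render the coupling in the square-bracket notation used in this article."""
        mode = "invoked " if self.invoked else ""
        return f"[ideation: {self.ideation} / attitude: {mode}{self.polarity} {self.attitude}]"


def group_by_bond(coded):
    """Group (bond_label, Coupling) pairs, as assigned by an analyst, by bond label."""
    bonds = defaultdict(list)
    for label, coupling in coded:
        bonds[label].append(coupling)
    return dict(bonds)


# Two couplings discussed in the Results section, coded under their bonds.
example = [
    ("Dangerous Momo", Coupling("Momo Challenge", "negative", "valuation")),
    ("Evil Islam", Coupling("Islam", "negative", "propriety", invoked=True)),
]
for bond, couplings in group_by_bond(example).items():
    print(bond, [c.annotate() for c in couplings])
```

The bond label is deliberately kept separate from the coupling: the same coupling type can contribute to different bonds depending on context, which is an interpretive decision rather than something derivable from the coupling alone.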

The resource of visual salience was important in order to understand the relation of the visual elements in the screenshots with the social bonds realized by the values identified in the transcripts. From a multimodal perspective, visual salience refers to the elements in an image/video that are depicted as the worthiest of attention, formed, according to Kress and Van Leeuwen (2006: 202), through one or more of the following dimensions:

– Size: larger elements rather than smaller elements

– Sharpness of focus: sharper rather than blurrier elements

– Tonal contrast: elements with high tonal contrast

– Colour contrast: strongly saturated elements

– Placement in the visual field: elements in the centre or top of the image

– Perspective: foregrounded elements

– Overlapping elements: the element that is overlapping other elements

– Cultural significance: the famous person in the image

– Personal significance: an element that is more significant due to the viewer’s personal experience

In this study, visual salience is adapted to the affiliation framework in order to understand how YouTubers situate themselves in relation to screenshots, either via implicit membership categorizations (shared understandings of particular social media platforms) or by distancing themselves from the screenshot (a focus on subjectivization). As we will see, many of the salient visual elements observed in the video frames can be interpreted as ‘bonding icons’, that is, symbols that embody particular values which people rally around (Stenglin, 2008; Tann, 2012; Zappavigna, 2014a; Zappavigna, 2014b). These dimensions of visual salience and affiliation will be unpacked further in the Results section.

3.2 Legitimation analysis

Legitimation refers to how discourses establish authority and credibility through ‘specific linguistic resources and configurations of linguistic resources’ (Van Leeuwen, 2007: 92) and via particular multimodal resources (Van Leeuwen, 2008). In this article, we consider both legitimation and de-legitimation as construed in language and images, drawing on the work of Van Leeuwen (2008). Broadly, (de)legitimation in Van Leeuwen’s original framework can be divided into four categories with further sub-categories:

Authorization: Legitimation according to custom (traditions or conformity), authority (personal or impersonal authority) or commendation (experts or role models).

Moral Evaluation: Legitimation of value systems via evaluation, abstraction, or comparison.

Rationalization: Legitimation via institutionalized social action or the knowledge that society has constructed, in terms of theoretical legitimation (grounding knowledge in experience or science, or in definition, explanation or prediction) or instrumental legitimation (focusing on institutionalized social action via the means, goals or effects of actions).

Mythopoesis: Legitimation conveyed through narratives and future projections. For example, via moral tales, cautionary tales, single determinations (that represent stories in a straightforward way) or overdeterminations (that represent stories via inversion or symbolisation).

One theoretical contribution of this article is its extension of the legitimation framework via the impersonal authority sub-category. The YouTubers in the dataset employed screenshots, video clips and news articles as a form of evidence to legitimate their claims and, thus, we have refined the legitimation category of impersonal authority as follows, based on the empirical data from this study:

– Marketing: Credibility established via company logos.

– Laws, Rules and Regulations: This sub-category incorporates the original definition of impersonal authority by Van Leeuwen (2007), that is how references to laws, rules and regulations construct authority.

– Technological authority: Credibility established via technological means, e.g. screenshots, online articles and references to video links or Google searches.

Technological authority is also refined to include:

– Online Media: Legitimation formed by referring to social media or online news articles as evidence.

– Traditional Media: Legitimation formed by referring to newspapers and television as evidence.

– Technologies: Legitimation formed by referring to technologies (in the case of this study the only technologies referred to were aircraft, e.g. drones and UFOs). This can be expanded and refined in future studies to encompass new technologies.

These refinements to the legitimation framework are shown in Figure 1. It should be noted that these refinements were created inductively in relation to the dataset for this study. Technological authority is deliberately used as a broad term so that this framework can be adapted and expanded in future studies as new forms and institutions in relation to technology emerge. In addition, we acknowledge that distinctions between online media and traditional media can be problematic. Whilst these distinctions were clear in our dataset, our framework (illustrated in Figure 1 with a different bracket) allows multiple layers of legitimation to be considered simultaneously in future studies that might contain more complex examples of media. Practically, the significance of this refined legitimation framework is that technology (broadly conceived) is foregrounded as one of the key legitimation strategies in social media discourse, altering how researchers might develop methodologies and research questions, particularly in relation to information disorders and hate speech. Technological authority is not foregrounded in the original legitimation framework, hence the need for the revised framework proposed in this study.

Figure 1.

Authorization sub-system of the legitimation framework (adapted from Van Leeuwen, 2007).

3.3 Coding strategy

The affiliation and legitimation analyses for this study are presented according to conventions shown in Figure 2. The video frame is presented at the top of the diagram, with the salient elements in the frame annotated with a red rectangle (which will be further explained in the analysis sections). Salient visual semiosis contributing to the realization of social bonds is shown via circular callouts, with corresponding visual legitimation strategies identified underneath the frame. A downward arrow symbol is used to indicate instantiated features. The transcript text (the YouTuber’s verbal text when the screenshot appears) is presented in a speech bubble. Underneath the transcript are the key bonds realized by the ideation-attitude couplings in the transcript (ideation underlined and attitude in bold). Underneath this are any legitimation strategies, also highlighted in the transcript text. In order to maintain ethical research standards, the faces of people who are not public figures have been anonymized with a black circle. This usually occurred in videos where the YouTuber speaks directly to the camera, as per the conventions of a vlog.

Figure 2.

Convention for presenting the multimodal analysis of affiliation and legitimation.

4. Results

The results of this study are discussed according to the key types of screenshots and screen recordings that emerged from this qualitative research. Across the 30 videos, every example of a screenshot mentioned either verbally or visually was first identified. These were then categorized according to their verbal or visual attributes, also taking into consideration the broader context of each screenshot and its interaction with other elements in the video. A summary of the types of screenshots, categorizations and frequencies across both case studies is shown in Tables 1 and 2. As Table 1 shows, the most frequent screenshot visual structure in the Momo Challenge dataset was the split screen (30% of frames), followed by unaltered (16%) and emulated (15%) screenshots, whereas unaltered screenshots dominated the Notre Dame Fire dataset (58% of frames). The sections which follow explore each of the screenshot structures and types in terms of their key legitimation and affiliation patterns, with quotations reproduced verbatim.

Table 1.

Frequencies and categorizations of screenshots in datasets.

Screenshot visual structure	Momo Challenge: frequency	Momo Challenge: type	Notre Dame Fire: frequency	Notre Dame Fire: type
Unaltered screenshots	130 (16%)	Social media, online articles	490 (58%)	Social media, online articles
Split screen	254 (30%)	Social media, online articles; split screen within split screen	120 (14%)	Social media, live stream recordings
Evidence collage	9 (1%)	Social media	2 (0.2%)	Social media
Emulated screenshot	127 (15%)	Social media, other technologies	0 (0%)	N/A
Annotated screenshot	22 (3%)	Social media	38 (5%)	Social media
Verbal reference to screenshots (number of times spoken about)	54	Social media, online articles, other technologies	82	Social media, online articles, live streams
Frames that did not contain screenshots	291 (35%)		191 (23%)
Total frames analysed	833		841
Total transcript words	14,426		18,306

Table 2.

Frequencies of verbal references to screenshots in datasets.

Screenshot structure	Momo Challenge: frequency	Momo Challenge: type	Notre Dame Fire: frequency	Notre Dame Fire: type
Verbal reference to screenshots (number of times spoken about)	54	Social media, online articles, other technologies	82	Social media, online articles, live streams
Total transcript words	14,426		18,306
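The percentages in Table 1 are shares of the total frames analysed in each dataset. As a quick arithmetic check (not part of the study's method), the short script below recomputes them from the frame counts in the table. The five visual structures plus the frames without screenshots sum exactly to 833 and 841 frames, which suggests that each frame was assigned to a single category. The dictionary and function names are invented for this illustration.

```python
# Frame counts per visual structure, copied from Table 1.
MOMO = {
    "Unaltered screenshots": 130,
    "Split screen": 254,
    "Evidence collage": 9,
    "Emulated screenshot": 127,
    "Annotated screenshot": 22,
    "No screenshot": 291,
}
NOTRE_DAME = {
    "Unaltered screenshots": 490,
    "Split screen": 120,
    "Evidence collage": 2,
    "Emulated screenshot": 0,
    "Annotated screenshot": 38,
    "No screenshot": 191,
}


def shares(counts: dict[str, int]) -> tuple[int, dict[str, float]]:
    """Return the total frame count and each category's share as a percentage (1 d.p.)."""
    total = sum(counts.values())
    return total, {k: round(100 * v / total, 1) for k, v in counts.items()}


def combined(*datasets: dict[str, int]) -> dict[str, int]:
    """Sum frame counts per category across datasets."""
    out: dict[str, int] = {}
    for d in datasets:
        for k, v in d.items():
            out[k] = out.get(k, 0) + v
    return out


if __name__ == "__main__":
    for name, data in (("Momo Challenge", MOMO), ("Notre Dame Fire", NOTRE_DAME)):
        total, pct = shares(data)
        print(name, total, pct)
    print(combined(MOMO, NOTRE_DAME))
```

Rounded to whole numbers, the recomputed shares match Table 1, with the two Notre Dame Fire evidence collages at about 0.2% of frames. Summed across both datasets, unaltered screenshots (620 frames) outnumber split screens (374 frames).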

4.1 Unaltered screenshots of online and traditional media

The most common forms of technological authority in the dataset were unaltered screenshots or screen recordings of social media posts or online articles used as evidence to support claims. These screenshots presented the whole screen of a device, including its interface elements. The interface elements increase the perceived veracity of the YouTube videos, as they bond viewers who are familiar with the platforms represented and create a sense of transparency: the viewer can see directly how the information was gathered via the act of taking a screenshot. An example is Figure 3 from the Notre Dame Fire case study, featuring a screen recording of an unfolding Facebook feed of reactions to the fire. This type of screen recording was used in three videos in the dataset to deceptively claim that Muslims are ‘evil’ and unable to integrate into Western society due to a lack of respect for Western culture and traditions. These claims reflect a white supremacist ideology, where Western values are centred as superior and Muslims are portrayed as a ‘racialized other’ by the YouTuber (Daniels, 2009). For instance, the salient image of the Notre Dame Cathedral in Figure 3 is a ‘bonding icon’ (Stenglin, 2008; Tann, 2012) representing Western history and culture. The representation of its destruction by fire and the accompanying repetition of ‘laugh reactions’ in the unfolding comment feed depicted in the screen recording is in this sense iconoclastic.

Figure 3.

Screenshots of social media posts from the Notre Dame Fire dataset.

The dynamism of the scrolling video also contributes to technological authority, creating the sense that the YouTuber is showing evidence in real-time that Muslims do not align with Western society. The screen recording scrolls through a list of laugh reactions with Arabic-sounding names (a description made by the YouTuber) that the YouTuber associates with Muslims. The negative propriety in the voice-over aligns with the values depicted in the visual content: ‘people’ (i.e. Muslims) are described as ‘reacting with smiley faces as Notre Dame burns’. The coupling of this negative judgement with people identified as Muslims realizes an ‘Evil Muslims bond’ that resonates throughout the video. As the annotation in Figure 3 suggests, legitimation of technological authority also occurs verbally, through particular semiotic entities such as social media posts and screenshots (‘the video’; ‘the same video I tweeted’; ‘this screenshot from the original live at Notre Dame video’). These entities are positioned as confirming the YouTuber’s accusation that Muslims were reacting with impropriety to the burning of Notre Dame Cathedral. Classifying the video as ‘original live’ also emphasizes the legitimacy of the screenshot through positive appreciation invoking veracity. Thus, in this example, we see an alignment between the values depicted both verbally and visually, and a coordination of visual and verbal legitimation strategies centring on technological authority.

In the Momo Challenge dataset, screenshots of social media posts also served a legitimation function, drawing on technological authority as evidence that the Momo Challenge is a real, ongoing and tangible threat. Dynamic screen recordings of social media feeds are again used for legitimation (see Figure 4). Visually, what is striking about these screen recordings is the salient image of ‘Momo’, who is a recognizably threatening figure associated with negative emotions such as fear. The voice-over negatively appreciates the challenge as ‘nothing new’ ([ideation: Momo Challenge / attitude: negative valuation], Figure 4), invoking negative judgement of people who do not recognize its ongoing history. This is also part of enacting a ‘Dangerous Momo’ bond that is shared by people spreading the misinformation. The legitimation strategies here again draw on technological authority, with the YouTuber referring to practices of using technology (‘scrolled through Facebook, Instagram’). The visual salience of the Momo image in the screen recording strengthens the powerful bond that the Momo Challenge is a real and dangerous phenomenon.

Figure 4.

Screenshots of social media posts from the Momo Challenge dataset.

In terms of the source of information, screenshots of online articles were also employed in the dataset as evidence to support deceptive and conspiratorial claims. Typically, these screenshots were cropped to focus upon the particular headline of the article and were often edited so that several could be shown at once, mimicking the material characteristics of newspaper cut-outs. This type of editing fosters social bonding around an interest in forensic media production and consumption. For instance, the Notre Dame Fire dataset included a video featuring an article from the Daily Star website, with the salient headline ‘Mystery 200-year-old letter revealed World War 3 plans – and final battle against Islam’ (Figure 5). This headline promotes an ‘Evil Islam bond’ via the coupling [ideation: Islam / attitude: invoked negative propriety] because Islam is depicted as an opponent in a battle, thus evoking a negative moral evaluation of Islam. The visual salience of the Daily Star logo and the structure of a news website add technological authority by co-opting the authority of traditional media. Whilst, in the voiceover, the paper is initially negatively evaluated as something the YouTuber would not usually buy, its claims are evaluated positively, contributing to a ‘Truthful Story bond’ that supports the co-occurring ‘Evil Islam’ bond.

Figure 5.

Online articles from the Notre Dame Fire dataset.

In the Momo Challenge case study, screenshots of online articles are used as evidence for the claim that the Challenge has resulted in the deaths of children. These articles are often from local or non-mainstream media outlets, or are cropped so as to highlight only the title, without further context. In Figure 6, Momo is again the salient image drawing the viewer’s attention. The voice-over realizes a ‘Dangerous Momo bond’ by negatively valuing the challenge, linking it to the suicides of a 12-year-old girl and a 16-year-old boy. The source entity, ‘local media reported’, evokes impersonal authority. The lack of detail about the specific local media organization mirrors the scant detail in the screenshot of the online article. This is an example of the way that these sorts of screenshots can deceptively distort news stories.

Figure 6.

Online articles from the Momo Challenge dataset.

4.2 Enacting technological authority with split screen videos

Screenshots were frequently incorporated into split screen videos, adding further layers of technological authority to the YouTuber’s video, with the screenshot referring to other media and also acting as a quasi-indexical snapshot of a real screen. These videos were typically structured with a smaller video frame in a corner (usually of the YouTuber) and the other video frame taking up most of the visual field (usually a screenshot or recording). An example is Figure 7, which incorporates a screenshot of the YouTuber’s video about Macron (that the viewer is directed to watch later), with the YouTuber represented in a smaller shot in the foreground of the bottom left corner.

Figure 7.

Split screen video from the Notre Dame Fire dataset.

The screenshot functions to emphasize two key bonds construed by the linguistic couplings shown in Figure 7: an ‘Evil Macron bond’, realized in the heading ‘The Shocking Truth about Macron’, and a ‘Truthful Information bond’, realized by the title of the YouTube video ‘The Shocking Truth About Emmanuel Macron: What You Need to Know’. The voice-over features multiple negative evaluations targeted at Macron, contributing to the ‘Evil Macron bond’. The direction to ‘check out my video’ supports the visual content, legitimating the screenshot as a source of evidence. The screenshot also acts as a catalyst for further evaluations, with the voiceover promoting these bonds. For example, appreciations invoking positive veracity (e.g. ‘documented provable facts’) are used to heighten the YouTuber’s apparent credibility, in contrast with Macron’s lack of credibility (‘this man cannot be trusted’). The choice of the vocative, ‘folks’, is an attempt to convoke the ambient audience around these values by constructing the ambient audience as peers.

Split screen screenshots had a similar legitimating function in the Momo Challenge dataset. This dataset also contained examples of an inverse split screen structure, with the YouTuber in the dominant visual position and a screenshot of a supposed interaction with Momo in the top right corner (Figure 8). Whilst the YouTuber and their reaction to the content of the screenshot take prominence, the screenshot remains a bonding icon promoting a ‘Truthful Information bond’. In the voice-over, Momo is the target of negative propriety for insulting and manipulating people, and images (most likely of Momo) are the target of negative reaction. Even though the YouTuber claims that these images cannot be shown to this audience, pointing to their existence itself serves as a form of technological authority, with the self-censorship heightening their supposed significance. The additional legitimation strategy of mythopoesis is realized when the YouTuber begins a recount of what will happen if ‘you can get Momo to interact with you’. Taken together with the previous example from the Notre Dame dataset, this suggests that the split screen has two main functions: it either privileges the screenshot as an essential piece of information, or privileges the speaking YouTuber as a legitimate voice of knowledge, with the screenshot adopting a more auxiliary evidentiary role.

Figure 8.

Split screen video from the Momo Challenge dataset.

4.3 Screenshots as evidence collages

Across both datasets, there were examples of ‘evidence collages’, that is, ‘image files that aggregate positive evidence’, employed as direct proof of an issue or fact (in contrast with circumstantial proof, which refers to evidence not drawn from direct observation) (Krafft and Donovan, 2020: 205). These collages displayed multiple screenshots on the screen simultaneously. In the Notre Dame Fire dataset, evidence collages typically consisted of several visually adjacent tweets, used to suggest that multiple sources had provided the same evidence. For example, in Figure 9, two adjacent tweets both state that churches have been attacked and include images of damaged churches. These images function as bonding icons, evoking negative interpersonal meanings about the destruction of French culture. They contribute to a ‘Destroyed Western History bond’ because the Notre Dame Cathedral and churches in France have been co-opted by far-right YouTubers as symbols of Western cultural superiority. The reference to ‘multicultural France’ links this bond to anti-immigration stances. Including multiple screenshots as sources is itself a legitimation strategy, and the voice-over’s reference to ‘links’ contributes to a ‘Truthful Information bond’ via technological authority. The implication is that, because there is more than one link to ‘information’, these must be legitimate sources. This reasoning also functions as rationalization in legitimation terms, reinforced by the instruction to ‘double check it’.

Figure 9.

Evidence collage from the Notre Dame Fire dataset.

In the Momo Challenge dataset, mobile phone screenshots in particular were used as evidence that the YouTuber had engaged with Momo. For example, in Figure 10, three screenshots of Momo (two of calls and one of an exchange of messages with Momo) are presented side by side, documenting people’s supposed interactions with Momo. Again, the succession of images creates a stronger sense of technological authority, implying that there are multiple instances of people interacting with Momo. The image of Momo is a salient feature, placed in the centre of the first two screenshots. The voice-over for these images negatively evaluates the Momo Challenge as a ‘massive disturbance’, tabling a ‘Dangerous Momo bond’. There is also some technological authority in the reference to ‘social media’ and a ‘messaging app’. Again, the visual content reinforces the values legitimated in the voice-over.

Figure 10.

Evidence collage from the Momo Challenge dataset.

4.4 Emulated screenshots

Emulated screenshots are not genuine captures of a screen but imitations of screenshots, either heavily edited or created entirely from scratch. They functioned to provide technological authority in scenarios where the YouTuber was unable to access or publish primary source texts. Emulated screenshots were only present in the Momo Challenge dataset, which contains two striking examples of different types. The first example emulates the layout of a tweet in the background (Figure 11). The foreground was initially presented as a blank tweet template that was gradually populated with changing screenshots and PowerPoint-like animations of changing text. The second example consists of screenshots that have been unrealistically superimposed over a typical stock photo (Figure 12). Here the screenshot is not realistic but illustrative: the words ‘suicidal thoughts’ are repeated multiple times in a font not associated with mobile phone texting interfaces.

Figure 11.

Emulated screenshot 1 from the Momo Challenge dataset.

Figure 12.

Emulated screenshot 2 from the Momo Challenge dataset.

The video voice-overs provide further insight into the meanings being made with these emulated screenshots. In the first example (Figure 11), the Momo Challenge is negatively evaluated, realizing an ‘Evil Momo challenge bond’. This matches the visual content, in which the salient image of Momo signals danger. The reference to leaving behind a ‘video on her phone’ also aligns with the emulated screenshot’s attempt to recreate that evidence as a form of technological authority. However, the voice-over also adds meanings of its own. Firstly, the Momo Challenge in the US is the target of invoked positive reaction (because no deaths had been reported in the US at the time), forming a ‘Safe US bond’. Secondly, there is some commendation, with the reference to law enforcement agencies legitimating the claim that Momo is dangerous. Overall, the function of the emulated screenshot appears to be to illustrate claims that cannot be shown as positive proof.

In the second example (Figure 12), the voice-over negatively evaluates parents for not watching over their children, tabling a ‘Careless Parents bond’. The Momo Challenge is also negatively evaluated for being associated with the child’s death, contributing to an ‘Evil Momo bond’. The voice-over’s claim that the parents ‘read text messages from his classmates exchanging suicidal thoughts’ again presents the semiotic entity ‘text messages’ as a legitimate source via the legitimation strategy of technological authority. Since the YouTuber does not actually have access to the original text messages, the emulated screenshot, with its repetition of the phrase ‘suicidal thoughts’, is employed instead to bolster their claims. Visually, what is also striking about this emulated screenshot is that the device itself and the human hand holding it are included in the frame. This presents the screen at a distance, held in a human hand, so the image makes meaning differently from a typical screenshot automatically generated by a computer.

4.5 Annotated screenshots

In the Notre Dame Fire dataset, there were examples of YouTubers editing screenshots and screen recordings in order to cast doubt on the recordings and align the footage with the conspiracy theory they were attempting to promulgate. In one example, the video consists of recorded footage from a CBSN live stream annotated by the YouTuber (Figure 13). The annotations consist of hand-drawn red arrows pointing to the Cathedral, with commentary that questions visual features of the image (e.g. ‘One fireman with one hose?’) and casts doubt on the reporting. The live stream is also used as technological authority, with the YouTuber stating that they are noticing details in the footage in real time that the news organization is not picking up on. In this example, the officers and correspondents are negatively evaluated, enacting ‘Lying Authorities’ and ‘Lying Media’ bonds. In terms of legitimation strategies, the officers and correspondents are deauthorized on the basis of this negative evaluation. A technological authority legitimation strategy is also present, as the ‘CBS clip’ is used as evidence that these authorities and journalists are behaving suspiciously (‘walking like nothing happened’). Overall, annotated screenshots have the rhetorical function of persuading the viewer to see the screenshot from the YouTuber’s perspective, creating new meanings through the act of annotating the primary text as it unfolds.

Figure 13.

Annotated screenshot from the Notre Dame Fire dataset.

4.6 Absence of visual screenshots

Across the entire dataset, three of the 30 videos did not feature visual screenshots but instead invoked the presence of screens in the linguistic verbal text. All three videos are from the Notre Dame Fire dataset and employ the vlog macrogenre (i.e. a YouTuber speaking directly to the camera). In Figure 14, the YouTuber refers to content on a screen (an article discussing churches being destroyed across France) as a form of technological authority. The YouTuber expresses negative reaction to the fire, contributing to a ‘Suspicious Fire bond’. In terms of legitimation strategies, the YouTuber refers to an ‘article’ and, although there is no corresponding visual content, is simultaneously looking at a screen (presumably displaying the article).

Figure 14.

Absence of visual screenshots: Example 1 from the Notre Dame Fire dataset.

In another video employing technological authority, YouTube and its recommendation algorithm are evaluated negatively for banning people who tried to upload videos spreading misinformation about a connection between the Notre Dame Fire and September 11. This supposed connection stemmed from YouTube’s recommendation algorithm initially directing people who had watched videos about the Notre Dame Cathedral towards videos about September 11. In this case, the YouTuber uses this knowledge of the recommendation algorithm (‘an algorithm that will shut you down . . .’) as technological authority supporting their insinuation that the fire is suspicious.

5. Discussion And Conclusion

YouTube videos can spread conspiratorial and hateful content in sophisticated ways, and screenshots contribute to the believability and virality of misinformation and disinformation. This study has demonstrated how a social semiotic approach can illuminate the affiliative and legitimating functions of screenshots in deceptive YouTube videos. Adding technological authority to the legitimation framework highlights the importance of technology as evidence in social media discourse. Through its focus on affiliation, this study also connects to broader research on screenshots that encourages researchers to look at the screenshot itself as a media object or a kind of document, rather than looking through it and ignoring the considerable importance that screenshots now have in everyday life (Frosh, 2018).

As part of the inductively developed methodological approach of this research, six ways in which screenshots can be visually structured have been conceptualized: evidence collages, split screens, emulated screenshots, annotated screenshots, and unaltered screenshots of online and of traditional media. Screenshots can also be referenced in the linguistic verbal text. All of these options can function to legitimate certain social bonds via technological authority and to spread misinformation. Whilst there was a strong correspondence between the legitimation strategies used visually and those used verbally, the verbal content often added further social bonds, contextualizing the significance of the visual content. For example, in the Momo Challenge case study, the verbal text contributed additional evaluations, and therefore social bonds, that were critical of parents, and in the Notre Dame Fire case study, the verbal text contributed additional social bonds that cast suspicion on particular media organizations. Thus, it was important to analyse both the verbal and the visual content in order to understand this wide array of meanings.

Overall, the empirical findings from the two case studies offer insights into how legitimation strategies work both visually and linguistically. In the Momo Challenge dataset, the videos focused on news stories and experts, used other YouTube videos and social media content as evidence for claims about the Momo Challenge, and spread moral panic by warning parents and criticizing YouTube. While incorporating expert opinion and using evidence to back up claims is considered good journalistic practice, the videos analysed often used these practices deceptively, for example by manipulating content to make it appear to come from a trusted source. In the Notre Dame Fire dataset, the analysis of the transcripts and visual content revealed how affiliation and legitimation strategies work in tandem through social bonding and shared ideational targets in verbiage and visual content. The social bonds adopted by the YouTubers were used to legitimize or delegitimize people and ideas. Across this dataset, technological authority was used to justify conspiratorial discourse about authorities, with the Notre Dame Cathedral itself as the key bonding icon, manipulated through screenshots and video clips to lend legitimacy to the sense that evidence was being shared in real time.

In terms of future research, the issues raised in this article might be expanded to consider different digital artefacts and different case studies across the broad spectrum of false information, from misinformation to disinformation. This research could also be complemented by studies that interview users who engage with screenshots about their motives for creating and sharing them, particularly in breaking news contexts. Jaynes (2020) shows how interviewing teenagers who engage with screenshots can yield rich findings. Such insights could reveal further details about how social bonding occurs in relation to screenshots and how the cultural meanings of screenshots change over time. Whilst the current study has remained exploratory, focusing on new methodological insights rather than providing frequency information about the affiliation and legitimation strategies across large datasets, future studies might attempt to quantify these patterns across visual and verbal modes and consider the extent to which visual and verbal meanings coordinate with each other.

Footnotes

Data Disclosure Statement

The data that support the findings of this study are available from the corresponding author, Olivia Inwood, upon reasonable request.

Declaration Of Conflicting Interests

During some of this research, Olivia Inwood received an Australian Government Research Training Program Scholarship and additional funding support from the Commonwealth of Australia.

ORCID iDs

Olivia Inwood

Michele Zappavigna

Biographical Notes

OLIVIA INWOOD is an Academic Literacy Coordinator at Western Sydney University. Her PhD was officially conferred by UNSW Sydney in May 2023. Her thesis developed a combined systemic functional linguistics (SFL) and multimodal discourse analysis (MDA) framework to understand how online communities on YouTube, via language and visual resources, construct and legitimate social bonds that propagate information disorders. Her research articles written with Associate Professor Michele Zappavigna have recently been published in Discourse & Communication, Social Media + Society, The Communication Review and Social Semiotics.

Address: Level 5, Western Sydney University Penrith Campus Library, Building T, Second Ave, Kingswood, 2747, Australia [email: o.inwood@westernsydney.edu.au]

MICHELE ZAPPAVIGNA is Associate Professor in the School of the Arts and Media at the University of New South Wales. Her major research interest is exploring ambient affiliation in the discourse of social media using social semiotic, multimodal and corpus-based methods. She is a co-editor of the journal Visual Communication. Key books include Searchable Talk: Hashtags and Social Media Metadiscourse (Bloomsbury Academic, 2018) and Discourse of Twitter and Social Media (Continuum, 2012). Recent co-authored books include Researching the Language of Social Media (Routledge, 2014, 2022) and Modelling Paralanguage Using Systemic Functional Semiotics (Bloomsbury Academic, 2021).

Address: School of the Arts and Media, University of New South Wales (UNSW), Robert Webster Building, Sydney, NSW, 2052, Australia [email: m.zappavigna@unsw.edu.au]

References

Allen (2016) Representing computer-aided design: Screenshots and the interactive computer circa 1960. Perspectives on Science 24(6): 637–668.

Bigman et al. (2022) ‘There will be screen caps’: The role of digital documentation and platform collapse in propagation and visibility of racial discourses. Information, Communication & Society: 1–18. DOI: 10.1080/1369118X.2022.2041698.

Corry (2021) Screenshot, save, share, shame: Making sense of new media through screenshots and public shame. First Monday 26(4). DOI: 10.5210/fm.v26i4.11649.

Cramer, Jenkins and Sang (2022) What’s behind that screenshot? Digital windows and capturing data on screen. Convergence. DOI: 10.1177/13548565221089211.

Daniels (2009) Cyber Racism: White Supremacy Online and the New Attack on Civil Rights. Lanham, MD: Rowman & Littlefield.

Dunaway (2021) Polarisation and misinformation. In: Tumber and Waisbord (eds) The Routledge Companion to Media Disinformation and Populism. London: Routledge, 131–141.

Frosh (2018) The Poetics of Digital Media. Hoboken, NJ: John Wiley & Sons.

Halliday MAK (1978) Language as Social Semiotic: The Social Interpretation of Language and Meaning. London: Hodder Arnold.

Igwebuike and Chimuanya (2020) Legitimating falsehood in social media: A discourse analysis of political fake news. Discourse & Communication 15(1): 42–58.

Inwood (2021) White Supremacists Deceptively Using Screenshots As Evidence: A Social Semiotic Approach To Analysing Conspiratorial YouTube Videos. Paper presented at AoIR 2021: The 22nd Annual Conference of the Association of Internet Researchers. Virtual Event: AoIR. Retrieved from http://spir.aoir.org.

Inwood and Zappavigna (2021) Ambient affiliation, deceptive communication, and moral panic: Negotiating social bonds in a YouTube internet hoax. Discourse & Communication 15(3): 281–307.

Inwood and Zappavigna (2022a) The ID2020 Conspiracy Theory in YouTube Video Comments during COVID-19: Bonding around Religious, Political, and Technological Discourses. In: Demata, Zorzi and Zottola (eds) Conspiracy Theory Discourses. Amsterdam: John Benjamins, 241–270.

Inwood and Zappavigna (2022b) A Systemic Functional Linguistics Approach to Analysing White Supremacist and Conspiratorial Discourse on YouTube. The Communication Review 25(3–4): 204–234.

Jaynes (2020) The social life of screenshots: The power of visibility in teen friendship groups. New Media & Society 22(8): 1378–1393.

Jenkins and Cramer (2022) Capturing injustice: The screenshot as a tool for sousveillance. Howard Journal of Communications. DOI: 10.1080/10646175.2022.2032884.

Knight (2013) Evaluating experience in funny ways: How friends bond through conversational humour. Text & Talk 33(4/5): 553–574.

Krafft and Donovan (2020) Disinformation by design: The use of evidence collages and platform filtering in a media manipulation campaign. Political Communication 37(2): 194–214.

Kress and Van Leeuwen (2006) Reading Images: The Grammar of Visual Design, 2nd edn. London: Routledge.

Martin JR and White PRR (2005) The Language of Evaluation: Appraisal in English. New York, NY: Palgrave Macmillan.

Moore (2014) Screenshots as virtual photography: Cybernetics, remediation, and affect. In: Arthur P and Bode K (eds) Advancing Digital Humanities: Research, Methods, Theories. London: Palgrave Macmillan UK, 141–160.

Papacharissi (2010) A Networked Self: Identity, Community, and Culture on Social Network Sites. London: Routledge.

Rieder (2015) YouTube Data Tools (Version 1.23) [Software]. Available at: https://tools.digitalmethods.net/netvizz/youtube/ (accessed 14 June 2022).

Ross and Rivers (2017) Digital cultures of political participation: Internet memes and the discursive delegitimization of the 2016 US Presidential candidates. Discourse, Context & Media 16: 1–11.

Stenglin (2008) Interpersonal meaning in 3D space: How a bonding icon gets its ‘charge’. In: Unsworth (ed.) Multimodal Semiotics: Functional Analysis in Contexts of Education. London: Continuum, 50–66.

Švelch (2021) Redefining screenshots: Toward critical literacy of screen capture practices. Convergence 27(2): 554–569.

Tann (2012) The language of identity discourse: Introducing a systemic functional framework for iconography. Linguistics & the Human Sciences 8(3): 361–391.

Van Leeuwen (2007) Legitimation in discourse and communication. Discourse & Communication 1(1): 91–112.

Van Leeuwen (2008) Discourse and Practice: New Tools for Critical Discourse Analysis. Oxford: Oxford University Press.

Wardle and Derakhshan (2017) Information Disorder: Toward an Interdisciplinary Framework for Research and Policy Making. Report for Council of Europe, Strasbourg, October.

Zappavigna (2014a) Coffeetweets: Bonding around the bean on Twitter. In: Seargeant and Tagg (eds) The Language of Social Media: Communication and Community on the Internet. London: Palgrave, 139–160.

Zappavigna (2014b) Enjoy your snags Australia… oh and the voting thing too #ausvotes #auspol: Iconisation and affiliation in electoral microblogging. Global Media Journal: Australian Edition 8(2).

Zappavigna (2018) Searchable Talk: Hashtags and Social Media Metadiscourse. London: Bloomsbury Publishing.

The legitimation of screenshots as visual evidence in social media: YouTube videos spreading misinformation and disinformation
