Abstract
Introduction
Human migration is a globally significant phenomenon. Population dynamics continue to shift as a consequence of conflict, persecution, poverty and climate change, among other contributing factors. People displaced by these circumstances are among the most vulnerable people on the planet. Yet public perceptions of and attitudes toward migration are often hostile, which creates a space for equally hostile government policies. To a large extent, it is mediated experience, through representations of refugees and migrants, that shapes public knowledge and opinion in connection with migration. In an increasingly multimodal world, it is visual as well as verbal representations that are of critical influence, including visual representations generated through Artificial Intelligence (AI). AI-generated images not only reflect our multimodal world, and therefore provide a lens through which it can be examined, but also increasingly populate it, and are thus themselves important sites of critical investigation. This study analyses AI-generated images of migration from the perspective of Cognitive Critical Discourse Analysis, focussing specifically on the motion event inherent to the migration process. Despite the social and methodological significance of AI-generated images, this study is the first to apply Critical Discourse Analysis to images of migration produced by AI.
Background
Images published on and shared over the internet increasingly define our knowledge and understanding of social and political realities. Indeed, according to Stocchetti (2011: 33), images are now ‘the dominant form of political communication’. Millions of images are viewed and downloaded daily via search engines like Google (Zhang and Rui, 2013). The images returned often reflect and reinforce pre-existing power structures and prejudices (Guilbeault et al., 2024; Noble, 2018). Online news media, compared to traditional print formats, provide extensive amounts of visual content. Caple and Knox (2015) found that almost half of English-language newspaper websites around the world include dedicated multimedia sections with the most common form being photo galleries. Images also generate traffic to and influence social media engagement with news articles. For example, links to online news articles that contain images are more likely to be clicked and shared (Collier et al., 2021; Li and Xie, 2020). Eye-tracking studies show that the images contained in news articles themselves are among the first entry-points to the text and receive a disproportionate amount of attention compared to written material (Bucher and Schumacher, 2006; Holsanova and Nord, 2010; Leckner, 2012; Quinn et al., 2007). As an example of a more general ‘picture superiority effect’ (Hockley, 2008), images in news texts are also shown to enhance recall (Graber, 1990). Besides such cognitive processing and engagement effects, images achieve important framing effects and can even lead to falsely recalling information not present in news stories (Garry et al., 2007). Images produced as part of digital news content provide a visual narrative of political topics and events and are especially influential in shaping public knowledge and opinion and thus in driving policy directions. 
For example, the images included as part of a news story affect evaluations of social actors and actions (Arpan et al., 2006) and influence behavioural intentions (Geise et al., 2021; Powell et al., 2015). With respect to immigration, negative portrayals of migrants increase ingroup favouritism and outgroup hostility (Conzo et al., 2021; Schemer, 2012; Schemer and Meltzer, 2019; Scherman et al., 2022). When linked with crime, a negative framing of immigration in the news influences voting behaviour in favour of anti-immigrant parties (Burscher et al., 2015). Conversely, exposure to more positive coverage, including instances of successful intergroup contact, results in more positive attitudes toward refugees and migrants and decreased support for restrictive border and security policies (Djourelova, 2023; Joyce and Harwood, 2014).
All of this suggests that the visual representation of refugees and migrants online is highly significant for public perceptions of and attitudes toward migration and that certain patterns of representation, if found, will contribute to perpetuating and sustaining stereotypes, stoking prejudice and fear, and legitimating hostile immigration policies. Critical Discourse Analysis (CDA) has developed a set of tools, based in linguistics but extended to visual and multimodal communication, that enables the identification and quantification of visual elements implicated in discursive constructions of power and inequality (Machin, 2013; van Leeuwen, 2014).
In connection with migration, research in CDA has uncovered several forms of visual representation which, recurrent in online news, construct refugees and migrants as a dangerous and alien ‘Other’ (Batziou, 2011; Bleiker et al., 2013; Catalano and Musolff, 2019; Farris and Silber Mohamed, 2018; Martínez Lirola, 2016, 2022; Martínez Lirola and Zammit, 2017; Romano and Dolores Porto, 2021; Wilmott, 2017). For example, refugees and migrants tend to be depicted in large groups rather than small groups (Martínez Lirola, 2016; Wilmott, 2017), which results in dehumanising effects as audiences are less likely to ascribe to them human emotions like compassion, guilt and tenderness (Azevedo et al., 2021) as well as increased support for anti-immigration policies (Azevedo et al., 2021; Madrigal and Soroka, 2023). Refugees and migrants are also frequently represented in explicitly dehumanising contexts (López, 2020; Martínez Lirola, 2022), where they appear in ways comparable to wild animals, or in contexts of crime/security, where they become criminalised (Catalano and Musolff, 2019; Farris and Silber Mohamed, 2018; Martínez Lirola, 2016; Wilmott, 2017). Images also tend to capture refugees and migrants somewhere along the way in their migratory journey rather than depicting the circumstances they are escaping or showing them settled into life in a new place (Romano and Dolores Porto, 2021). Such patterns of visual representation are consistent with verbal forms of representation where refugees and migrants are described in metaphorical terms as animals/diseases, natural disasters or invading armies (Abid et al., 2017; Charteris-Black, 2006; Cisneros, 2008; Dolores Porto, 2022; El Refaie, 2001; Hart, 2021; Musolff, 2015; Santa Ana, 1999, 2002) and these metaphors likewise achieve framing effects in eliciting negative emotions and legitimating anti-immigration policies (Chkhaidze et al., 2021; Jiminez et al., 2021; Marshall and Shapiro, 2018; Utych, 2018). 
Gabrielatos and Baker (2008) used corpus linguistic methods to show that the label ‘refugee’
Collocation is ‘the phenomena of certain words frequently occurring next to or near each other’ in texts (Baker, 2006: 96). As a result of this statistical relationship, the concepts lexicalised by the collocate of a word come to form part of its semantic profile (Stubbs, 1995) and may be primed even on occasions when the collocate is not actually present (Stubbs, 1996, 2002). Thus, where the words
A similar idea developed in cognitive linguistics is that of
Spoken and written language nearly always occur in multimodal situations. From a usage-based constructionist perspective, it therefore ‘appears very likely that the mental grammars of speakers will also contain a considerable number of multimodal constructions’ (Hoffman, 2021: 82). Usage-based traditions like cognitive grammar (Langacker, 2008) and construction grammar (Goldberg, 2003, 2006), as part of a more general ‘multimodal turn’ in linguistics (Cohn and Schilperoord, 2024; Stöckl, 2020), have thus been extended to incorporate within their theoretical architectures non-verbal forms like gesture and image.
Goldberg (2006: 5) defines a linguistic construction as a ‘learned pairing of form with semantic or discourse functions’ where forms include morphemes or words, idioms, and partially lexically filled as well as fully general phrasal patterns. Constructions are stored in the minds of language users as cognitive units which together make up the ‘constructicon’. A multimodal construction, accordingly, represents a routinised form-meaning pairing whose form consists of elements belonging to more than one semiotic mode. That is, multimodal constructions consist of intersemiotic forms that conventionally combine in the expression of meaning and which together with that meaning are entrenched as stored cognitive units in the minds of language users. For Ziem (2017), there are two types of multimodal construction. One is ‘inherently’ multimodal comprising multimodal forms which obligatorily co-occur, as in deictic expressions that require a kinetic element in order to determine their referent. The other consists of two or more semiotic forms which co-occur with ‘sufficient frequency’ as to have become conventionalised and thus entrenched, where entrenchment is a correlate of conventionalisation (Lehmann, 2024).
Multimodal constructions are therefore distinct from incidental co-occurrences, and the question of when a multimodal construct achieves constructional status is a largely quantitative one, dependent on frequency of co-occurrence within a speech community. In so far as they ‘provide insights into the conventionalization of a construction’ (Lehmann, 2024: 413), corpus frequencies have thus become central to claims about multimodal constructions. Indeed, corpus data have been the primary source of evidence for research investigating multimodal constructions. No specific frequency threshold is set, however. This is because entrenchment is a gradual phenomenon and, as Uhrig (2022) points out, we are therefore better off thinking in terms of degree of entrenchment than of a binary opposition between constructional and non-constructional status. For example, in their respective corpora, Schoonjans (2018) showed that the German particle
Crucially, constructions may exist in the language at large or may be particular to specific discourses and genres (Antonopoulou and Nikiforidou, 2011; Groom, 2019). While most of the research addressing multimodal constructions has been focussed on combinations of verbal and kinetic elements in spoken language, there is some research investigating the visual and audio-visual correlates of linguistic forms, including specifically in the context of news communication (Hart, 2024; Hart and Marmol Queralto, 2021; Steen and Turner, 2013). For example, Steen and Turner (2013) used the NewsScape corpus to show that when narrating a news event, the past-tense + proximal deictic construction (as in ‘was/were + now’) has a tendency to occur alongside a zooming in on the entity whose experiences are being recounted. Hart (2024) also used the NewsScape corpus to investigate the images that regularly co-occur with four constructions describing motion events in TV news coverage of migration. The four constructions crossed referential nominal with grammatical aspect ([refugees/migrants have VERBed] / [refugees/migrants are VERBing]) and focussed on the top six verbs expressing motion in the corpus. Among various patterns observed, the most common form of visual representation found to occur with these constructions depicted large groups of refugees and migrants moving on foot over land somewhere along the migratory journey. An important point to note here is that the imagery that figures in a multimodal construction is not a specific image but is a more abstract visual form representing features common across images. As with collocation, aspects of meaning contributed by the visual component of a multimodal construction become part of the meaning of the verbal component alone and are likely to be evoked even when the visual component is not itself co-instantiated in text.
Investigations into multimodal constructions are based largely on correlational analyses of intersemiotic forms. However, because multimodal data annotation is extremely time-consuming, such analyses are necessarily based on a relatively limited number of data points. The argument I wish to make here is that AI provides a methodological shortcut to a dataset that is vastly larger by comparison and that, by aggregating millions of images across the internet to generate images which typify those most recurrently associated with a given verbal form, AI can be used as an effective tool in establishing the visual components of multimodal constructions.
I do not claim to have a computational understanding of the way AI text-to-image generation works but instead adopt the position of a non-technical, critical user (Putland et al., 2023). AI-image generators are given textual prompts by a human user and then, relying on trained machine learning models, produce images that match the prompt. According to Canva, a free-to-use graphic design platform with a built-in text-to-image generator: to create AI-generated images, the machine learning model scans millions of images across the internet along with the text associated with them. The algorithms spot trends in the images and text and eventually begin to guess which image and text fit together. Once the model can predict what an image should look like from a given text, it creates entirely new images.
AI-generated images will naturally reflect the social identities, values and stereotypes enshrined in the images that the models are trained on and can therefore be critically analysed to reveal the ideologies and prejudices propagated over the internet.
At the same time, those ideologies and prejudices will be reinforced and amplified as the general image pool comes to be populated more and more by AI-generated images. AI-image generation is therefore described as a ‘semiotic technology’, that is, ‘a technology for meaning-making that is deeply inscribed with certain sets of social norms, values and ideologies’ (Westberg and Kvåle, 2024: 2). From this perspective, AI-generated images constitute, in their own right, ‘important, but currently understudied, social texts to attend to in relation to complex social phenomena’ (Putland et al., 2023: 2). However, since AI text-to-image generation is a relatively new technology, research investigating it is still limited, especially within discourse studies. A few recent exceptions are nevertheless to be found. For example, across a broad range of prompts, Bianchi et al. (2023) found that AI generates images which reinforce racial and gendered stereotypes, promote whiteness as an ideal, and reproduce American norms. Prompts mentioning social groups (e.g. by race or nationality) produce images ‘that tie specific groups to negative or taboo associations like malnourishment, poverty, and subordination’ (p. 1495). These biases are so robust that they are reproduced even when they are explicitly countered in the prompts. Putland et al. (2023) examined AI-images generated in connection with dementia. They found that the images recycle previously existing and prominent discourses surrounding the syndrome by maintaining a biomedical framing, presenting narratives focussed on loss and dementia as a ‘living death’, and displaying a distinct lack of diversity characterised by an over-representation of older, light-skinned individuals. Westberg and Kvåle (2024) analysed AI-generated images representing teenagers. Although the images generated presented a diverse range of ethnicities, they did not appear to depict individuals of mixed ethnic provenance.
Instead, through juxtaposition of teenagers with contrasting biological attributes (e.g. skin colour, hair quality), the images reinforced exclusive ethnic categories. Other dimensions of diversity were not visually denoted. For example, there was conformity to body normativity as well as uniformity with respect to cultural attributes like clothing.
Putland et al. (2023: 5) and Westberg and Kvåle (2024: 2) see the social functions of AI-generated images as continuing from stock images but providing users with a greater ability to generate content for themselves. Machin (2004) addresses the increasing role of image banks like Getty Images in defining the visual language of various media, including digital news media. He shows that the images contained in such banks provide users with a pre-structured world that is organised into ideologically constrained categories consistent with the values of consumerism and globalisation at the expense of social security and mobility. AI and image banks now exist in a symbiotic relationship: image banks provide a large proportion of the data that AI models are trained on, while also holding a large stock of AI-generated images.
This study both uses and targets AI-generated images to critically analyse patterns in the visual representation of refugees and migrants and to establish the imagery that figures as part of a specific multimodal construction.
Method
Analytical framework: Cognitive CDA
CDA has successfully extended frameworks developed in linguistics to account for patterns of representation in non-linguistic modes of communication, including images, and their role in reproducing social identities and unequal relations of power (Kress and van Leeuwen, 2006; Machin, 2013, 2016; Machin and Mayr, 2012; van Leeuwen, 2008). While much of multimodal CDA has been based on extensions of Systemic Functional Linguistics, multimodal CDA is not characterised by any particular method and is instead conceived as a
Cognitive CDA is an approach to CDA which, drawing on cognitive linguistics, considers the ideological qualities and legitimating potentials of conceptualisations evoked by linguistic and other semiotic forms in contexts of political communication (Hart, 2025). A key concept in Cognitive CDA is that of
The linguistic representation of motion has received extensive treatment in cognitive linguistics (Pourcel, 2010; Slobin, 2004; Talmy, 1985, 2000; Zlatev et al., 2010). The motion event is a conceptual archetype (Langacker, 2008) or event-frame (Talmy, 2000) made up minimally of four conceptual elements: (i) a Figure, namely the entity or object that undergoes motion; (ii) motion itself, which refers to the presence of motion or, alternatively, the continuation of a stationary location; (iii) a path along which the motion takes place; and (iv) a Ground providing a frame of reference relative to which the motion is characterised. The canonical motion event, and the one inherent to migration, is translational motion, which contrasts with other types of motion like rotation and involves a change in location of the Figure over the period of time under consideration (Talmy, 2000/II: 25). In addition to the four core elements, another element that may also receive representation is manner of motion, which refers to the way a motion event is accomplished. These are semantic elements that get expressed in surface forms like nouns, verbs and prepositions. Crucially, they may receive expression in visual as well as verbal elements of composition (Hart and Marmol Queralto, 2021). Research in Cognitive CDA has shown that all elements that make up a motion event are subject to construal, leading to various ideological and (de)legitimating effects (Hart, 2025; Hart and Marmol Queralto, 2021). The present study extends principles of Cognitive CDA to focus specifically on visual representations of the motion event that is inherent to the migration process.
Data selection
To generate images, an AI image generator provided by the free-to-use online graphic design platform Canva was used. Canva integrates several different AI image generators that meet different user needs, including Dream Lab, Magic Media, DALL·E by OpenAI and Imagen by Google Cloud. The specific tool selected for this study was Magic Media. Magic Media was selected because it allows users to create images in particular styles, including photos, and because it produces images that are highly realistic, as opposed to the more surreal images often produced by other tools. Canva’s Magic Media uses Stable Diffusion as its underlying model. Stable Diffusion is a deep learning, latent diffusion model whose open-source code was originally released in 2022 (Barazida, 2022) and which is currently one of the most popular models available. While Stable Diffusion can be accessed directly via its own web user interface, Canva was used, in keeping with the perspective of a non-technical critical user, owing to its greater accessibility and more user-friendly environment.
The text prompts used to generate images were a simplified version of the search queries used by Hart (2024). The prompts crossed referential nominal (
Canva’s Magic Media allows users to select from different styles, including photography, digital art and fine art. Each style includes further sub-categories, with photography, for example, including filmic, photo, moody and vibrant, and digital art including anime, dreamy and psychedelic. To ensure maximum levels of realism, the style selected was photography: photo. Magic Media also allows users to generate images in three different formats: square, landscape and portrait. To avoid any biases arising from the affordances of different formats, for each prompt an equal number of images in each format was included in the sample. Magic Media generates four images per prompt in the selected style and format. Only one round of image generation was performed for each prompt, with all four images generated being included in the sample. This resulted in a final sample comprising 144 AI-generated images. All data were collected on a single day, 13th February 2025.
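The sampling arithmetic can be made explicit. The following is a minimal sketch that assumes, as one reading of the description above, that each prompt was run once in each of the three formats and that every run contributed all four of its images; the function names are hypothetical.

```python
# Illustrative sketch of the sampling arithmetic described above.
# Assumption (one reading of the text): each prompt was run once in each
# format, and every run contributed all four generated images.
FORMATS = ("square", "landscape", "portrait")
IMAGES_PER_RUN = 4  # Magic Media returns four images per prompt and format


def images_per_prompt(n_formats: int = len(FORMATS),
                      images_per_run: int = IMAGES_PER_RUN) -> int:
    """Number of images one prompt contributes across all formats."""
    return n_formats * images_per_run


def implied_prompt_count(total_images: int) -> int:
    """Number of prompts implied by a total sample size under the assumption above."""
    per_prompt = images_per_prompt()
    if total_images % per_prompt != 0:
        raise ValueError("sample size is not a whole number of prompts")
    return total_images // per_prompt


print(images_per_prompt())        # 12
print(implied_prompt_count(144))  # 12
```

Under this reading, a sample of 144 images corresponds to twelve prompts, each contributing twelve images (four per format).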
Data coding
Previous studies show a range of conceptual parameters according to which the elements that make up the motion event may vary in language and image to ideological effect (Hart, 2024, 2025). These include conceptual distinctions pertaining to the Figure, such as presence, size, dividedness and boundedness, each of which is coded for here. Although the style of images was restricted to photography, a small number of the images generated were more expressionist in style. Images were therefore coded for whether refugees and migrants are corporeally present or represented instead through a more abstract form. Size refers to the quantity of people represented. Here, a quantity is classed as small if it features fewer than twenty people and as large if it features twenty or more people.
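The size criterion reduces to a single threshold. A minimal sketch, assuming the coder has already arrived at a head count (in practice, large crowds are judged rather than counted); the names `Size` and `classify_size` are hypothetical.

```python
from enum import Enum


class Size(Enum):
    SMALL = "small"
    LARGE = "large"


# Threshold from the coding criterion: twenty or more people counts as large.
LARGE_GROUP_THRESHOLD = 20


def classify_size(n_people: int) -> Size:
    """Code the size of a Figure from an (estimated) head count."""
    if n_people < 0:
        raise ValueError("head count cannot be negative")
    return Size.LARGE if n_people >= LARGE_GROUP_THRESHOLD else Size.SMALL


print(classify_size(19))  # Size.SMALL
print(classify_size(20))  # Size.LARGE
```

A count of zero would be coded as small by this sketch; how images without a corporeally present Figure were handled for size is a separate question that the sketch does not address.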
Dividedness refers to a quantity’s internal segmentation (Talmy, 2000/I: 55). A quantity’s state of dividedness is classed as discrete if it is conceived as having breaks or interruptions in its composition (Talmy, 2000/I: 55). It is classed as continuous when otherwise separate elements are melded so that they come to cohere as a perceptual continuum or gestalt (Talmy, 2000/I: 56). In images of migration, a discrete image would be one in which refugees and migrants are more widely dispersed with a low degree of agglomeration. A continuous image would be one that shows refugees and migrants more tightly packed with a high degree of agglomeration, such as being huddled together with no degree of physical separation between them. Note, however, that individuals do not have to be spatially contiguous in reality to be construed as continuous. The distinction is a visual perceptual one, and camera angles and other semiotic properties can work to impose a continuous construal on the Figure. Equally, images showing individuals in spatially contiguous arrangements may still be classified as discrete if the individuals depicted are perceived as such.
Boundedness refers to whether a quantity is ‘conceived as continuing on indefinitely with no necessary characteristic of finiteness intrinsic to it’ (Talmy, 2000/I: 50), in which case it is unbounded, or whether it is conceived as having a clearly demarcated extension, in which case it is bounded. In the context of migration, bounded images would include those depicting lines of people whose endpoints are both clearly visible inside the viewing frame or crowds of people whose entire contour falls inside the viewing frame. Unbounded images would include images of individuals or groups of individuals whose limits of extent lie outside the viewing frame, or extend up to and beyond the vanishing point in the image to create the illusion of continuing indefinitely. Because they often pattern together, with bounded/discrete and unbounded/continuous coinciding, the categories of boundedness and dividedness are prone to confusion. However, the two categories vary independently (Talmy, 2000/I: 55). For example, in images of refugees and migrants, individuals might be shown in a loosely gathered crowd whose total expanse extends beyond the viewing frame, in which case the image would be discrete but unbounded.
Other conceptual distinctions pertain to motion, manner of motion and the Ground (path is subsumed under viewpoint). With respect to motion, events are classified as motion where the image can be read as ‘dynamic’, implying a change in location of the Figure. This includes images in which refugees and migrants are themselves static but are depicted aboard a vehicle that is read as being in motion. The event is classified as stationary where no motion is detected, such as images of an assembled crowd or people posing for a photograph. Manner refers to the means by which the motion event is accomplished. This includes whether motion is achieved pedally or by means of a transport vehicle such as a bus, boat or train. Ground refers to the situation in which the motion event occurs. It is classified according to geographical features like landscapes (roads, tracks, mountain paths, fields, deserts), rivers, oceans and cityscapes, as well as temporary structures connected to the politics of migration (camps, borders, processing centres). In some instances, images are decontextualised so that no Ground is discernible. A further conceptual distinction relating to the Ground but connected to the overall migratory journey is source-path-goal. Migration involves departing a country of origin (source) and travelling along a specific route (path) to start a new life in a destination country (goal). An image is coded as source when the Ground depicts the circumstances refugees and migrants are leaving behind. An image is coded as path when refugees and migrants are shown anywhere on their migratory journey, including arriving in the destination country and/or being processed in holding centres. An image is coded as goal when it shows refugees and migrants settled into and actively participating in life within the destination country.
Another conceptual distinction related to the Ground concerns the presence or absence within it of security elements such as fences, police or military personnel. Finally, a necessary feature of images and conceptualisations is viewpoint.
Conceptual distinctions pertaining to viewpoint are defined with respect to three dimensions: path, angle and distance. With respect to path, images are coded as toward, away or across (or unclear/mixed) depending on the direction of motion relative to the viewer. With respect to angle, images are coded as horizontal, diagonal or from above. Distance is similarly coded based on a tripartite distinction between close-up, medium and distal shots. The values for viewpoint variables are necessarily ideals which images do not always perfectly align with. Viewpoint is therefore coded according to the values that images most closely approximate. The full coding scheme used in the analysis is given in Table 1.
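The variables described above can be summarised as a small lookup structure, which also makes it possible to check that each coded record uses only permitted values. This is a hypothetical sketch: the value labels are approximations drawn from the prose rather than copied from Table 1 (for example, manner is collapsed to pedal versus vehicle), and it assumes every variable is coded for every image, which may not hold for, e.g., manner in stationary images.

```python
# Hypothetical encoding of the coding variables as described in the prose.
# Value labels are illustrative approximations, not the study's Table 1.
CODING_SCHEME: dict[str, frozenset[str]] = {
    # Figure
    "presence": frozenset({"present", "non-present"}),
    "size": frozenset({"small", "large"}),
    "dividedness": frozenset({"discrete", "continuous"}),
    "boundedness": frozenset({"bounded", "unbounded"}),
    # Motion, manner and Ground
    "motion": frozenset({"motion", "stationary"}),
    "manner": frozenset({"pedal", "vehicle"}),
    "ground": frozenset({"landscape", "river", "sea/coast", "cityscape",
                         "camp", "border", "processing centre", "none"}),
    "source_path_goal": frozenset({"source", "path", "goal"}),
    "security": frozenset({"present", "absent"}),
    # Viewpoint
    "path": frozenset({"toward", "away", "across", "unclear/mixed"}),
    "angle": frozenset({"horizontal", "diagonal", "above"}),
    "distance": frozenset({"close-up", "medium", "distal"}),
}


def validate(record: dict[str, str]) -> list[str]:
    """Return a list of problems with a coded record; empty means valid."""
    problems = []
    for variable, allowed in CODING_SCHEME.items():
        value = record.get(variable)
        if value is None:
            problems.append(f"missing value for {variable!r}")
        elif value not in allowed:
            problems.append(f"unexpected value {value!r} for {variable!r}")
    # Flag variables that are not part of the scheme at all.
    for variable in sorted(record.keys() - CODING_SCHEME.keys()):
        problems.append(f"unknown variable {variable!r}")
    return problems
```

Keeping the permitted values in one place means that a typo in a coding spreadsheet surfaces as an explicit problem rather than as a spurious extra category in the counts.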
Coding scheme.
A quarter of the data (36 images) was independently coded by a second coder. Intercoder reliability was almost perfect overall, with a mean kappa score of 0.854. Scores for individual variables ranged from substantial to perfect. Unsurprisingly, given the greater degree of subjective judgement involved, the lowest score was for the viewpoint variable distance (0.641), while the highest scores (1.00) were achieved for presence, source-path-goal and manner. Since intercoder reliability was high for all variables, analysis proceeded on the original coding.
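The kappa variant is not named in the text. With two coders, Cohen's kappa is the usual choice, so the sketch below assumes it: κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each coder's marginal proportions. The verbal labels used above ('substantial', 'almost perfect') match the benchmarks commonly attributed to Landis and Koch (1977), which the second function encodes. Function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two coders who each assigned one category per item."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("need two equally long, non-empty codings")
    n = len(coder_a)
    # Observed agreement: share of items given the same category by both coders.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: products of the coders' marginal proportions, summed.
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        # Both coders used one and the same category throughout; kappa is
        # undefined here, and this sketch reports it as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def landis_koch_label(kappa: float) -> str:
    """Verbal benchmark for a kappa value (Landis and Koch bands)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"
```

For instance, with the invented codings `["pedal", "pedal", "vehicle", "pedal"]` and `["pedal", "vehicle", "vehicle", "pedal"]`, p_o = 0.75 and p_e = 0.5, giving κ = 0.5 ('moderate'). On these bands, 0.854 falls in 'almost perfect' and 0.641 in 'substantial'.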
Results and discussion
Figure properties

Bar plots showing Figure properties. (a) Size. (b) Dividedness. (c) Boundedness.
When Figure properties are considered together, the most striking pattern that emerges in connection with the Figure is the propensity for refugees and migrants to be represented in large, continuous, unbounded forms such as those in Figure 2. This configuration is the dominant pattern accounting for 42.4% (

Large, continuous, unbounded Figures in line (a) and crowd (b) formations.
The depiction of refugees and migrants in large rather than small groups is consistent with previous findings for online news images (Martínez Lirola, 2016; Wilmott, 2017). Size is ideologically significant: for example, refugees and migrants are judged as less capable of experiencing human emotions like tenderness, guilt and compassion when they are shown in large groups (Azevedo et al., 2021). Large group depictions also lead to decreased perceptions of vulnerability (Bleiker et al., 2013). Conversely, among people who are already high in threat sensitivity, anti-immigration attitudes are mitigated by images of individual migrants but not by images of large groups of migrants (Madrigal and Soroka, 2023). In a phenomenon known as the

Small, discrete, bounded Figures.
Although images prompted by ‘refugees’ and ‘migrants’ both tend to depict large groups, they differ in the dividedness of the Figure, with images prompted by ‘migrants’ more likely to depict a continuous Figure compared to images prompted by ‘refugees’. The contrast can be seen in Figure 4 where, compared to people in Figure 4(a), people in Figure 4(b) are more densely compacted such that they coalesce to form a single defined shape.

Discrete (a) versus continuous (b) Figures.
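A claim such as 'more likely to depict a continuous Figure' compares proportions computed separately within each prompt nominal. A minimal sketch of that cross-tabulation, with a hypothetical record format and invented toy data (not the study's data); it reports descriptive shares only and does not test whether a difference is statistically reliable.

```python
from collections import Counter, defaultdict


def proportions_by_group(records: list[dict[str, str]],
                         group_key: str,
                         value_key: str) -> dict[str, dict[str, float]]:
    """Within each level of group_key, the share of records taking each value of value_key."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for record in records:
        counts[record[group_key]][record[value_key]] += 1
    return {
        group: {value: n / sum(tally.values()) for value, n in tally.items()}
        for group, tally in counts.items()
    }


# Invented toy records for illustration only (not the study's data).
toy = [
    {"nominal": "refugees", "dividedness": "discrete"},
    {"nominal": "refugees", "dividedness": "continuous"},
    {"nominal": "migrants", "dividedness": "continuous"},
    {"nominal": "migrants", "dividedness": "continuous"},
]
print(proportions_by_group(toy, "nominal", "dividedness"))
# {'refugees': {'discrete': 0.5, 'continuous': 0.5}, 'migrants': {'continuous': 1.0}}
```

Because shares are computed within each nominal, the comparison is unaffected by any difference in how many images each nominal contributes.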
The parameter of dividedness has the function of individualising people in discrete images or construing them as a single mass in continuous images. The distribution across the two nominations points to a difference in the way people designated as ‘migrants’ are typically conceived compared to people designated as ‘refugees’. Through discrete representations, those designated as ‘refugees’ are more likely to be recognised as individual beings with their own histories, motives and emotions. By contrast, ‘migrants’ are presented as a homogeneous mass that moves and behaves as a single entity. Such a construal is consistent with bovine metaphors which liken migrants to herds of animals (Santa Ana, 1999) or metaphors which construe migrants as bodies of water (Charteris-Black, 2006; Santa Ana, 2002). Indeed, a particular type of continuous Figure is one generated by the verb ‘flood’, in which migrants appear like a sprawling ‘sea’ of people, as in Figure 5. Such imagery is strongly associated with the verb ‘flood’ occurring with 75% (

(a and b) Continuous Figures prompted by ‘flood’.
Figures prompted by ‘refugees’ and ‘migrants’ are more frequently unbounded than they are bounded. In bounded Figures, as in the image in Figure 6, the full extent of the Figure is clearly defined such that its entire range is discernible.

Bounded Figure.
Unbounded Figures come in several forms, as illustrated in Figure 7. In one especially common form, shown by the images in Figure 7(a), a linear figure extends indefinitely beyond the horizon or vanishing point in the image so that its endpoint cannot be made out. Another form involves a line or crowd of refugees/migrants that extends beyond the edges of the viewing frame, as shown by the images in Figure 7(b). A third form is one in which the Figure occupies the entire frame as shown by the images in Figure 7(c). Unboundedness functions to construe migration as a never-ending phenomenon. It thus supports arguments which claim that there exists a limitless and unsustainable number of refugees and migrants who will continue to impose on host countries unless changes to migration policies are made.

(a – c) Unbounded Figures.
Previous studies have shown that in human-produced images of migration, refugees and migrants frequently get represented through abstract visual forms such as silhouettes (Hart, 2024). Although such

Non-presence.
Motion, manner and ground

Bar plots showing proportions of motion (a), manner (b), ground (c) and source-path-goal (d) categories.

Stacked bar plots showing proportions of non-pedal manners (a) and ground types (b) per ‘refugees’ versus ‘migrants’.
From the results above, the image that emerges as the most likely to figure alongside the [refugees/migrants + motion verb] form in a multimodal construction is one in which a large, continuous, unbounded Figure moves on foot through geographical landscapes, including roads, tracks, mountains, fields and deserts, whilst en route in the migratory process.
The depiction of refugees and migrants in path locations rather than source or goal locations in the AI-generated images is consistent with findings for online news images (Romano and Dolores Porto, 2021). By failing to represent refugees and migrants in source locations, images ignore the difficult or tragic circumstances that lead to displacement, such as poverty, war and persecution, and leave room for the argument that people are migrating out of choice and opportunity rather than need. Likewise, by failing to represent refugees and migrants settled into life in goal locations, images ignore the positive outcomes of migration. Two exceptions to the general pattern, which focus on source and goal locations respectively, are given in Figure 11.

Source (a) and goal (b) images.
While pedal motion through geographical landscapes is the predominant pattern in images generated by ‘refugees/migrants + motion verb’ prompts, a number of other patterns are also detected and represented in the data with sufficient frequency to be worthy of discussion. For example, the AI-generated images detect an association between ‘migrants’ and the sea/coast plus boats which is not present for ‘refugees’, suggesting slightly different and ideologically contrasting semantic profiles for the two designations. For images prompted by ‘migrants’, the Figure is shown in the context of the sea/coast in 15.3% of cases (

(a – c) Sea images prompted by ‘migrants arrive’.
Refugees and migrants are depicted as stationary in 41% of images. Stationary images include images of people gathered in crowds as in Figure 13(a), images of people contained and unable to move as in Figure 13(b), and images of people sitting or standing at locations somewhere along their migratory journey as in Figure 13(c).

(a – c) Stationary images.
A particularly noteworthy kind of stationary image is the portrait photograph, as in Figure 14. Such images do not so much document the process of migration as they do specific types of people. In doing so, they place refugees and migrants before the camera for inspection in a way that invites curiosity and is comparable to more anthropological forms of photographic documentation, which are used to classify cultures and often construct their subject as an exotic Other or subaltern (Leon-Quijano, 2022; Poole, 2005). Portrait photos such as those in Figure 14 therefore reinforce a view of refugees and migrants as different from ourselves.

(a – c) Portrait photos.
A final point worth discussing is the presence of security features within the Ground, as in the images shown in Figure 15. Previous studies have shown that images of migration frequently include a security presence such as border walls, fences, police or military personnel (Catalano and Musolff, 2019; Hart, 2024; Martínez Lirola, 2016, 2022). Again, although such features are relatively infrequent in the current data, they are nevertheless indicative of an association between migration and security that is sufficiently widespread in internet images to be considered part of the visual discourse of migration. Such images contribute to a securitisation of immigration (Kataba and Jacobs, 2023; Lazaridis and Wadia, 2015; Vezovnik, 2018) and thus serve to criminalise displaced people (Martínez Lirola, 2016) and further justify the militarisation of borders (Catalano and Musolff, 2019).

(a – c) Security images.
Viewpoint

Bar plots showing proportion of viewpoints in three dimensions. (a) Path. (b) Angle. (c) Distance.
The functions of viewpoint are examined most extensively by Kress and van Leeuwen (2006), who show how perspectival variables position the viewer with respect to participants in the image and invite the viewer to enter into different kinds of social relation with the subject. Viewpoint is inherently multidimensional, with every image simultaneously presenting a perspectival value in path, angle and distance. The functions of any particular viewpoint value are therefore not only sensitive to the social context of the image but also depend on other semiotic configurations within it, including viewpoint values in other dimensions.
Of a possible twenty-nine viewpoints combining values for path, angle and distance, the one that is the most common in the data is toward + horizontal + medium-shot, which accounts for a fifth (

Large-group depiction with viewpoint toward (path) + horizontal (angle) + medium (distance).

Small-group depiction with viewpoint toward (path) + horizontal (angle) + medium (distance).

Large-group depiction with viewpoint away (path) + horizontal (angle) + medium (distance).
Van Leeuwen (2005: 138) argues that distance, too, is symbolic in that it ‘indicates the closeness, literally and figuratively, of our relationships’. The AI images reveal a contrast in distance between images associated with the nominations ‘refugees’ versus ‘migrants’. People designated as ‘refugees’ are more likely to be depicted through a close-up shot, as in Figure 20(a), while people designated as ‘migrants’ tend to be depicted through a distant shot, as in Figure 20(b).

Distance = close-up (a) versus distant (b).
In the context of migration discourse, close-up shots are the viewpoint ‘most likely to elicit empathy in viewers’ (Wilmott, 2017: 74). Long shots, by contrast, create distance between migrants and the viewer, highlighting their ‘otherness’ (Wilmott, 2017: 74) and ‘suggest that immigrants’ situation and problems are not ours’ (Martínez Lirola, 2022: 494). Although both nominations are associated primarily with medium shots, the differential association with close-up versus distant shots again indicates that the two terms, while overlapping, denote ideologically distinct categories which are constructed in part through the types of images that accompany them.
While close-up shots as in Figure 20 can evoke sympathy, they can equally invite pity and contribute to an

(a and b) Close-up shots.
A difference in degree of angle can mark the difference between pity and superiority. Images with a diagonal angle account for two fifths of the AI-generated images, suggesting that a downward angle, while not quite as frequent as a horizontal angle, is a common feature of internet images of migration. From such an elevated position, as in the images in Figure 22, the viewer is literally and metaphorically ‘looking down’ on the subject where, as van Leeuwen (2008: 139) states, ‘to look down on someone is to exert imaginary symbolic power over that person’. In images like those in Figure 22, refugees and migrants are therefore subjugated or disempowered, suggesting the right of more powerful actors to determine their freedom and autonomy. The image in Figure 22(b), showing people contained, is even capable of being read in a way that compares refugees and migrants to penned animals.

(a and b) Angle = diagonal.
Conclusion
AI-generated images detect and reflect patterns of representation in the millions of images that occur together with specific text forms across the internet. They are therefore a useful diagnostic tool for investigating multimodal constructions, one that bypasses the need to collate and annotate massive multimodal corpora. In the context of social and political discourses, AI images reveal patterns of visual representation responsible for reinforcing stereotypes and prejudices, stoking fear and hostility, and thus legitimating and sustaining harmful and discriminatory policies and practices.
The image in a multimodal construction is not a specific one but a schematic one. Neither does it necessarily represent an actual image or images. Rather, it is a bundle of features derived probabilistically as a function of features presented across different images. This composite form therefore stands as a prototype from which instantiations may deviate in one or more respects and which is not necessarily realised in every respect even by the majority of images. Analysis of the AI-generated images in the context of migration suggests that the [refugees/migrants + motion verb] construction has as its counterpart in a multimodal construction the image of a large, continuous, unbounded Figure moving on foot through a landscape toward the viewer. This image contributes to the construction of migration as a substantive and unrelenting ‘problem’ that directly affects the addressee and to construals of refugees and migrants which ignore their individual characteristics and identities.
Multimodal constructions do not preclude other forms of visual representation from also regularly occurring with verbal forms, and these alternative patterns are also constitutive of the visual discourses surrounding a particular issue. Indeed, the AI images analysed here suggest several other ideologically significant forms of representation that also abound on the internet and thus make up part of the multimodal discourse on migration. For example, the AI images suggest that refugees and migrants are often criminalised through securitising Grounds or denied corporeal identities through more abstract Figure forms.
It should be noted that the types of images identified and analysed in the present study are those associated with plural noun + motion verb constructions. Singular noun + motion verb constructions are likely to have other visual forms as their counterpart in a multimodal construction, which may present an alternative, potentially more positive, discourse. It is plural noun + motion verb constructions, however, that figure more frequently in hegemonic discourses of migration and that are therefore especially worthy of investigation.
The analysis also shows that while the [refugees/migrants + motion verb] construction exists at one level of schematicity, where it is conventionally associated with a particular visual form, other visual forms associated with the construction differ depending on the specific noun or verb in the relevant slot. For example, images prompted by ‘refugees’ are more likely to be close-up shots and engage in a politics of pity while images prompted by ‘migrants’ are more likely to be distant shots making migrants a remote concern. The AI-generated images also highlight a (Eurocentric) connection between ‘migrants’ and the sea/coast which is not present for ‘refugees’. Aligning with previous research showing that the terms
