The dawn of the artificial intelligence (AI) revolution has marked an unprecedented societal shift (Xie, 2023). Prominent in this shift is the generation of realistic humanlike AI faces, twinned with public concern that AI might distort the perception of truth (Devlin, 2023). AI-generated faces are now widely available (e.g., this-person-does-not-exist.com) and are being used for both prosocial and nefarious purposes, from finding missing children (Chandaliya & Nain, 2022) to transmitting political misinformation via fake social media accounts (e.g., Hatmaker, 2020). AI faces are now so realistic that people often fail to detect they are not human (e.g., Nightingale & Farid, 2022). However, because this technology has advanced so rapidly (Hao, 2021), there have been few empirical tests of this ability. Here we argue that AI faces are not just indistinguishable from human faces but that, in fact, they may be perceived as more "human" than real human faces. We term this striking and counterintuitive phenomenon AI hyperrealism.
Psychology offers decades of theoretical and empirical work with potential to explain AI hyperrealism. For example, the influential face-space theory of human face perception offers one potential explanation (Fig. 1).

Schematic illustration of face-space theory: A potential explanation for AI hyperrealism. Orange dots show sample distribution of human faces; purple dots show hypothesized distribution of AI faces. We focus on relevant abstract principles of face-space theory (e.g., relating to single images of faces in human perception). For more nuanced discussions, see Burton et al. (2016), O’Toole et al. (2018), and Valentine et al. (2016). For psychophysics-related work, see Abudarham and Yovel (2016) and Rhodes and Jeffery (2006).
Statement of Relevance
Artificial intelligence, or AI, can now generate faces that are indistinguishable from human faces. However, AI algorithms tend to be trained using a disproportionate number of White faces. As a result, AI faces may appear especially realistic when they are White. Here, we show that White (but not non-White) AI faces are, remarkably, judged as human more often than pictures of actual humans. We pinpoint the perceptual qualities of faces that contribute to this hyperrealism phenomenon, including facial proportions, familiarity, and memorability. Problematically, the people who were most likely to be fooled by AI faces were the least likely to detect that they were being fooled. Our results explain why AI hyperrealism occurs and show that not all AI faces appear equally realistic, with implications for proliferating social bias and for public misidentification of AI.
A psychological analysis of AI representativeness can also help with understanding a puzzle arising from the handful of studies that have investigated people’s ability to detect AI faces: Although one recent study found that people were unable to distinguish AI from human faces (Nightingale & Farid, 2022), two others go further to suggest that people may overidentify AI faces as human (Shen et al., 2021; Tucciarelli et al., 2022). How can we explain this puzzle? All three studies used the StyleGAN2 algorithm but varied in the race of the faces they tested. These demographic differentials are critical because StyleGAN2 was trained on primarily White faces (~69% White, ~31% for all other races combined; see Supplemental File S1 in the Supplemental Material available online), potentially biasing the algorithm toward the statistical regularities of White faces. This bias may lead to White AI faces that appear especially average (indicated in Fig. 1) and therefore, potentially, especially realistic. Consistent with this theory, Shen et al. (2021) and Tucciarelli et al. (2022) found preliminary evidence of AI hyperrealism to the extent they tested White faces, although Tucciarelli et al.’s (2022) stimuli were also preselected to be particularly realistic, biasing them toward this finding. Intriguingly, Nightingale and Farid (2022) also reported more errors for White than non-White AI face detection. However, they did not pursue this question further. If AI faces do appear more realistic for White faces than other groups, their use will confound perceptions of race with perceptions of being “human.” Thus, the use of popular StyleGAN2 faces may risk misleading scientific conclusions (Dawel et al., 2022) and may even perpetuate social biases in real-world outcomes, from influencing elections to finding missing children (Chandaliya & Nain, 2022; Hatmaker, 2020).
The Present Research
Here we aimed to investigate the potential for AI hyperrealism and provide the first test of whether people have insight into their AI detection errors. If people mistake AI faces as human but have low confidence in their judgment, they may respond more cautiously (e.g., investigating an online profile). However, if they are convinced their judgment is correct, their errors may be more consequential (e.g., falling for a fraudulent profile). Although Tucciarelli et al. (2022) found confidence was higher for judgments of AI than for human faces overall, it is currently unknown whether people are aware of their AI detection errors. Errors are associated with lower confidence for other face judgments (e.g., face identity recognition, Palermo et al., 2017; eyewitness identification, Wixted & Wells, 2017). Thus, we predicted that poorer AI detection would be associated with lower confidence.
We also aimed to identify visual attributes that distinguish AI from human faces and address the critical unanswered question of why people fail to detect AI faces. Our theorizing suggests that the emergent perceptual attributes of face-space—such as facial averageness, memorability, attractiveness, and familiarity—may play a role, given their importance for human face perception (Valentine et al., 2016; Vokey & Read, 1992). Because little is known about which cues people use for AI detection, we augmented this theoretical perspective with a data-driven approach by asking participants what information they used to guide their judgments.
Reanalysis of Nightingale and Farid (2022)
We started with a proof-of-principle by reanalyzing data from a prominent recent study that included information about face race (Nightingale & Farid, 2022, Experiment 1) to investigate the potential for AI hyperrealism. Analyses showed clear evidence of AI hyperrealism for White faces, but not for non-White faces. Figure 2a shows that White AI faces were judged as human significantly more often than White human faces.

Reanalysis of data from Experiment 1 of Nightingale and Farid (2022) and results for current Experiment 1. Error bars represent 95% confidence intervals. N&F E1 = data from Nightingale and Farid (2022), Experiment 1; n.s. = nonsignificant.
Experiment 1
To investigate whether people have insight into their AI hyperrealism errors and uncover what causes this somewhat counterintuitive phenomenon, we asked a new set of participants to report how confident they felt, and what information they used, when attempting to distinguish AI from human faces. Focusing our new empirical work on the White AI faces from Nightingale and Farid (2022) enabled us to test the robustness of AI hyperrealism with a new set of participants.
Open practices statement
We report all measures and exclusions (see Supplemental File S2), along with power analyses justifying our sample sizes (see Supplemental File S3). Data, analysis scripts, and materials are available on the Open Science Framework at osf.io/sz2fe/. Stimuli are available at osf.io/ru36d/. Data were analyzed using R version 4.2.1 (R Core Team, 2021) and JASP (JASP Team, 2023).
Method and participants
The final data were from 124 adults (61 men, 62 women, 1 preferred another term) recruited from Prolific (www.prolific.co). Participants were White U.S. residents, aged 18 to 50 years.
Stimulus materials
We used the 100 AI and 100 human White faces (half male, half female) from Nightingale and Farid (2022; see osf.io/ru36d/). The AI faces were generated using StyleGAN2. The human faces were selected from the Flickr-Faces-HQ Dataset (Karras et al., 2021, used to develop StyleGAN2) to match each of the AI faces as closely as possible (e.g., same gender, posture, and expression). All stimuli had blurred or mostly plain backgrounds, and AI faces were screened to ensure they had no obvious rendering artifacts (e.g., no extra faces in background). Screening for artifacts mimics how real-world users screen AI faces, either as scientists (Peterson et al., 2022) or for public use (Satter, 2021), and therefore captures the type and range of stimuli that appear online. Participants were asked to resize their screen so that stimuli had a visual angle of 12° wide × 12° high at ~50 cm viewing distance.
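As a quick sanity check on this viewing setup, the on-screen size implied by a given visual angle follows from size = 2 × distance × tan(angle / 2); at ~50 cm, a 12° stimulus spans roughly 10.5 cm. A minimal sketch (the function name is ours, not from the study materials):

```python
import math

def stimulus_size_cm(visual_angle_deg: float, viewing_distance_cm: float) -> float:
    """On-screen size (cm) subtending `visual_angle_deg` at `viewing_distance_cm`."""
    return 2 * viewing_distance_cm * math.tan(math.radians(visual_angle_deg) / 2)

# A 12-degree-wide stimulus viewed from ~50 cm spans about 10.5 cm on screen.
print(round(stimulus_size_cm(12, 50), 1))  # → 10.5
```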
Participants were assigned, counterbalanced by participant gender, to view either all the male faces or all the female faces (50 AI + 50 human faces = 100 trials in total), so that approximately equal numbers of men and women were assigned to each face sex. Faces were shown individually until a response was made, with order randomized across participants.
Procedure
Participants were told that they would see approximately 100 faces with the task of deciding whether each face depicted a real human or was computer-generated (AI). We defined "human" as people who exist in the real world and "computer-generated" as pictures that have been made by AI technology for generating highly realistic images of people who do not exist in the real world. After deciding whether a face was AI or human, participants rated their confidence in that judgment on each trial.
Results
Analytic strategy
First, we calculated the percentage of stimuli judged as human, the error percentages, and the mean confidence ratings for each participant (for AI and human faces separately; Supplemental File S5). Complementary stimulus-level analyses are reported in Supplemental File S6. We also calculated participant-level signal detection measures of sensitivity and response bias.
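The participant-level signal detection computation can be sketched as follows, treating "responded human to a human face" as a hit and "responded human to an AI face" as a false alarm. The helper name and the log-linear correction for extreme rates are our assumptions, not necessarily the authors' exact pipeline:

```python
from statistics import NormalDist

def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """d-prime and criterion, with a log-linear correction to avoid 0/1 rates."""
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion

# Chance-level performance (equal hit and false-alarm rates) yields d' = 0.
d, c = sdt_measures(hits=25, misses=25, false_alarms=25, correct_rejections=25)
print(round(d, 3))  # → 0.0
```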
AI hyperrealism is robust
Figure 2a shows that the hyperrealism found for White AI faces in our reanalysis of Nightingale and Farid (2022) was fully replicated in our new sample, indicating that this effect is robust. White AI faces were judged as human significantly more often than White human faces.

Faces judged most often as (a) human and (b) AI. The stimulus type (AI or human; male or female), the stimulus ID (Nightingale & Farid, 2022), and the percentage of participants who judged the face as (a) human or (b) AI are listed below each face.
Do people have insight into their AI detection errors?
Concerningly, we found that participants who were the worst at detecting AI faces had the poorest insight into their abilities, against our prediction from the face identification literature. However, the accuracy-confidence relationship differed by face type: Although lower error rates for classification of human faces were associated with higher confidence as predicted, for AI faces it was the participants making the most detection errors who reported the greatest confidence.
To investigate participants' insight into their performance, free from bias in confidence ratings (e.g., reporting high confidence for all judgments), we used meta-d′, a measure of metacognitive sensitivity that is robust to such response biases.
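Meta-d′ itself requires fitting a full type-2 signal detection model. As a simpler illustration of bias-resistant metacognitive sensitivity, the nonparametric type-2 AUROC below estimates the probability that a randomly chosen correct trial carries higher confidence than a randomly chosen error trial (0.5 indicates no insight). This is our sketch of the general idea, not the authors' analysis:

```python
def type2_auroc(confidence, correct):
    """P(confidence on a correct trial > confidence on an error trial),
    counting ties as 0.5. Values near 0.5 indicate no metacognitive insight."""
    hit_confs = [c for c, ok in zip(confidence, correct) if ok]
    err_confs = [c for c, ok in zip(confidence, correct) if not ok]
    if not hit_confs or not err_confs:
        raise ValueError("need both correct and incorrect trials")
    wins = sum((h > e) + 0.5 * (h == e) for h in hit_confs for e in err_confs)
    return wins / (len(hit_confs) * len(err_confs))

# Perfect insight: every correct trial is held with higher confidence.
print(type2_auroc([90, 80, 20, 10], [True, True, False, False]))  # → 1.0
```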
What visual attributes do participants report using to judge faces as AI versus human?
Figure 4 presents the qualitative coding framework capturing the attributes that participants reported using when they judged whether faces were AI or human. The size of each segment indicates the percentage of total codes captured by each theme. The framework is composed of 21 main themes with 20 subthemes (e.g., “eyes” is a subtheme of the specific facial features theme). Responses could be coded into multiple themes, and thus each response was coded into an average of 2.29 themes. For instance, the response, “If the faces were overly symmetrical and if they [sic] eyes looked fake” was coded into the “symmetry,” “eyes,” and “artificial” themes. A total of 546 codes were applied to the 239 responses. Supplemental File S9 includes example quotes for each theme.

Qualitative responses from Experiment 1: percentage of codes captured by each theme.
Experiment 2
The phenomenon of AI hyperrealism implies there must be some visual differences between AI and human faces, which people misinterpret. Very little is known about what these differences might be. Tucciarelli et al. (2022) found a partial negative contribution of attractiveness, which aligns with our predictions based on face-space, because faces at the core of face-space (more average faces) tend to be more attractive, all else being equal (Rhodes, 2006). Shen et al. (2021) also found that removing background scenery made AI and human faces indistinguishable; however, background information was matched for our stimuli, rendering this latter explanation unlikely here.
Thus, in Experiment 2 we investigated the capacity of 14 attributes derived from face-space and Experiment 1 qualitative reports to explain AI hyperrealism. We also tested for the first time whether human-perceivable information can be used to accurately classify AI and human faces, using machine learning. If, as we hypothesize, StyleGAN2 is biased to produce faces toward the center of face-space, AI faces should be perceived as more average, familiar, and attractive, but as less memorable than human faces.
Method
Participants
The final data were from 610 participants (290 men, 312 women, 8 preferred another term).
Procedure
In total, 14 attributes were rated (Table 1). In addition to the four attributes derived from face-space theory (distinctiveness/averageness, memorability, familiarity, attractiveness), we included nine attributes commonly mentioned in Experiment 1's qualitative reports. We also included perceived age because we wanted to isolate the contributions of other related attributes, such as attractiveness and skin smoothness. Supplemental File S10 provides a detailed rationale. Each condition had five attention checks that asked for specific numeric ratings. Experimental stimuli and procedure were otherwise identical to Experiment 1.
Experiment 2 Visual-Attribute Rating Conditions
Note: See also the full experimental surveys on the Open Science Framework (osf.io/sz2fe/). (a) "Alive in the eyes" combines the "eyes" and "uncanny valley" themes from Experiment 1's qualitative framework. (b) "Proportional" combines the "features work as a whole" (are in proportion with one another) and "proportional" themes. (c) "Smooth-skinned" derives from the "skin or wrinkles" and "perfectness" themes.
Results
Analytic strategy
We calculated the stimulus-level mean rating for each face for each of the 14 attributes separately. Then, using our data from Experiment 1, we calculated the percentage of participants who judged each face as human. Higher percentage values indicate that more participants judged the face as human. Stimulus type (i.e., AI or human faces) was dummy-coded (0 = AI faces and 1 = human faces).
Which visual attributes contribute to faces being judged as human?
To determine what attributes made faces look real (even if they were AI-generated), we constructed a multiple linear regression model predicting the percentage of participants who judged each stimulus as human from the 14 stimulus-level attribute means.
All variables were standardized prior to model entry. The model explained the majority of observed variance (62%) in how often faces were judged as human.
Standardized Coefficients for Each Attribute (Ordered by β Weight) in Our Linear Regression Model Predicting Experiment 1 Stimulus-Level Percentage Judged as Human
Note: CI = confidence interval; boldface type indicates
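The standardize-then-regress step above can be sketched as follows; the authors' analysis was run in R, so this scikit-learn version, with synthetic stand-in data and illustrative variable names, is only a structural sketch:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Toy stand-in: 200 faces x 14 stimulus-level attribute means, predicting
# the percentage of participants who judged each face as human.
X = rng.normal(size=(200, 14))
y = 50 + 30 * X[:, 0] - 20 * X[:, 1] + rng.normal(scale=5, size=200)

# Standardizing both predictors and outcome makes the coefficients
# comparable beta weights, as in the reported regression table.
Xz = StandardScaler().fit_transform(X)
yz = (y - y.mean()) / y.std()

model = LinearRegression().fit(Xz, yz)
r2 = model.score(Xz, yz)
print(f"R^2 = {r2:.2f}")  # high for this toy data by construction
```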
Which attributes contribute to AI hyperrealism?
Here, we take a novel approach by applying a Brunswikian lens model (Brunswik, 1956; Hall et al., 2019) to reveal how each of the 14 attributes contributed to faces being (mis)judged as human (Fig. 5). Constructing a stimulus-level lens model allowed us to quantify, for each attribute, both its objective usefulness for distinguishing AI from human faces and the degree to which observers actually utilized it.

Lens model testing contributions of each attribute to (mis)judgment of faces as human (ordered by indirect effect size). Red boxes show significant negative indirect effects—attributes that were utilized in the wrong direction to judge AI/human status. Green boxes show significant positive indirect effects (attributes that contributed to accurate AI/human judgments). Gray boxes show attributes that are useful for detecting AI faces but were not utilized by human observers. Dashed lines indicate nonsignificant effects (see Table S13 in the Supplemental Material).
Critically, in line with our face-space theory prediction that AI faces would be more average than human ones, AI faces were significantly more average (less distinctive), familiar, and attractive, and less memorable than human faces. Overall, AI hyperrealism was explained by larger cumulative effects for the attributes that were utilized in the wrong direction—facial proportions, familiarity, and memorability (in red, Fig. 5; β = −0.67, 95% CI = [−.88, −.46]).
Can human-perceived attributes be used to accurately classify AI and human faces?
Given that humans are unable to detect current AI faces, society needs tools that can accurately identify AI imposters. Present AI detection algorithms are limited to specific databases (e.g., the popular Google Chrome extension, V7 Fake Profile Detector, works only for StyleGAN faces). Human perception may be useful for improving algorithmic generalizability, as integrating additional parameters into algorithms has proved useful in other domains (J. W. Miller et al., 2022). We therefore provide the first investigation of whether machine learning can leverage human-perceived attributes to accurately classify AI and human faces.
Using 10-fold cross-validation, we constructed a random forest classification model (mtry = 4; the square root of the number of predictors, rounded to the nearest whole number) predicting face type (AI vs. human) from the 14 attributes identified in Experiment 2. The model classified face type with 94% accuracy, 95% CI = [91%, 97%].
Confusion Matrix of Correct and Incorrect Machine Classifications
Note: Correct classifications are in boldface.
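The classification step above can be sketched with scikit-learn, where `max_features="sqrt"` corresponds to the mtry = 4 setting (the square root of 14 predictors). The data here are synthetic stand-ins, not the study's ratings, so the printed accuracy is illustrative only:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)

# Toy stand-in: 200 faces x 14 rated attributes; label 0 = AI, 1 = human.
X = rng.normal(size=(200, 14))
y = rng.integers(0, 2, size=200)
X[y == 0, :3] += 1.5  # shift AI faces on three attributes to make them separable

clf = RandomForestClassifier(n_estimators=500, max_features="sqrt", random_state=0)
scores = cross_val_score(clf, X, y, cv=10)  # 10-fold cross-validation
print(f"mean accuracy = {scores.mean():.2f}")
```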
General Discussion
We find that White AI faces are perceived as hyperreal and that observers are overconfident in their ability to detect them. By combining psychological theory with a novel data-driven approach and machine learning, our study significantly advances understanding of why AI hyperrealism occurs. Specifically, we were able to pinpoint perceptual attributes that accurately distinguish AI from human faces and model how people misuse this information, explaining a significant majority of the variance in humans' AI judgments. The identification of these attributes provides a critical foundation in the future for detailed psychophysics work aiming to map AI face-space. Importantly, the present findings are generalizable to the types of images used online, because AI faces are screened for image artifacts as they are selected for real-world use (e.g., when committing fraud; Satter, 2021). Also, artifact screening cannot explain the White specificity of hyperrealism in our reanalysis of Nightingale and Farid's (2022) data, as the same screening criteria were applied across face race.
Our study highlights two separate, and critical, biases. First, generative adversarial networks (GANs) are biased toward the statistical regularities of their most common inputs, which we argue produces AI hyperrealism. Although we demonstrate this point in the context of AI faces, the foundational idea may generalize to other important types of AI outputs, including text and artwork from ChatGPT and DALL-E. Here, as this argument predicts, we found AI faces appeared more average than their human counterparts (see Supplemental Fig. S14). Notably, participants failed to utilize facial distinctiveness/averageness for AI detection and inappropriately utilized several other associated cues (facial proportions, memorability, familiarity), producing hyperrealism. Attractiveness was correctly used as a distinguishing feature, confirming Tucciarelli et al.’s (2022) initial finding. The minority of variance left to be explained suggests that other cues, such as those mentioned less often in Experiment 1 (e.g., “ears,” “glasses”), may also play a small but cumulative role.
Second, we found evidence of White racial bias in algorithmic training that produces racial differentials in the presence of AI hyperrealism, with significant implications for the use of AI faces online and in science. Previously, less realistic computer-generated faces have been used as stand-ins for human faces when it was inappropriate to do so (Dawel et al., 2022; E. J. Miller et al., 2023), and there is concern that the same will happen with AI faces, with implications for inequality. We recommend that studies using AI faces should verify that they are perceived as equally natural across races. On a related note, a pressing question is how to address racial differentials in GANs. It is unclear in face-space theory whether there is one face-space or separate spaces for different demographic groups (e.g., Valentine et al., 2016). Future research could fruitfully test these theoretical questions by comparing a GAN trained on equal numbers of faces of each race with GANs trained separately on different demographic groups.
Importantly, and in contrast to standard AI detection algorithms (which are “black boxes”), the present work makes known the perceptual attributes that lead to accurate AI detection in machine learning. Human accuracy may also be improved by training people to utilize attributes appropriately, though this strategy risks exacerbating overconfidence as technologies progress and certain attributes become outdated. Currently, most algorithms produce only single images of each identity, but soon multiple images of AI products are likely to be available (Chan et al., 2023). We likewise drew on a theoretical account of face-space that focuses on variation between single images; when multiple within-identity AI images are commonplace, future work could apply more nuanced face-space theories (e.g., Burton et al., 2016; O’Toole et al., 2018). Regardless, because AI technology is advancing so rapidly (Bond, 2023), training focused on metacognition and education may be more helpful. For example, Szpitalak et al. (2021) found that people who were advised about the unreliability of human memory were more resistant to misinformation than naive individuals. Educating people about the perceived realism of AI faces could likewise reduce risks by making the public appropriately skeptical.
We also found individual differences in the accuracy of AI face detection (Fig. 2c), opening new lines of research. Participants were selected to have normal-range face perception, yet the best performer achieved only 80% accuracy. However, people with exceptional face recognition abilities (super recognizers; Ramon et al., 2019) may possess superior AI detection skills. A further intriguing question is whether individual differences in the utilization of specific attributes can shed light on why certain individuals are more vulnerable to deception by AI faces.
Conclusion
The present study demonstrates a robust AI hyperrealism effect: Remarkably, White AI faces can convincingly pass as more real than human faces—and people do not realize they are being fooled. We believe psychology has a critical role to play in holding AI technologies accountable to the public good. Society has faced many large-scale, seemingly unsolvable challenges that have subsequently become a normal, and manageable, part of life (e.g., automobile safety). We remain hopeful that social and regulatory responses will reduce potential risks as society adjusts to the inevitable presence of AI in our world.
Supplemental Material
sj-pdf-1-pss-10.1177_09567976231207095 – Supplemental material for "AI Hyperrealism: Why AI Faces Are Perceived as More Real Than Human Ones" by Elizabeth J. Miller, Ben A. Steward, Zak Witkower, Clare A. M. Sutherland, Eva G. Krumhuber, and Amy Dawel, Psychological Science.
sj-docx-2-pss-10.1177_09567976231207095 – Supplemental material for the same article.
