Abstract
Since the 2020s, the rise of large language models (LLMs) like ChatGPT, Gemini, Claude, and DeepSeek has heralded a new era of artificial intelligence, one marked by unprecedented linguistic fluency and creative potential. Yet, this progress is shadowed by a paradoxical phenomenon: AI hallucinations. These hallucinations, instances where models generate plausible but factually incorrect or unsupported content, are not mere technical glitches. They are emblematic of a deeper tension between the probabilistic architecture of LLMs and the societal expectations placed upon them. This article interrogates AI hallucinations as both technical failures and emergent creative acts, situating them within the broader discourse of automation, trust, and human-AI collaboration. By examining models like DeepSeek-R1, which exhibits a 14.3% hallucination rate (Vectara, 2025), and ChatGPT-o1, optimized for factual reliability with a hallucination rate of only 2.4% (Vectara, 2025), we confront a critical question: Are hallucinations a flaw to be eradicated or a feature to be harnessed?
The term “hallucination” in AI contexts draws from medical analogies, where perception diverges from reality. For LLMs, hallucinations manifest as generative fabrications: outputs that are syntactically coherent but semantically ungrounded. When confronted with incomplete or ambiguous prompts, LLMs like DeepSeek-R1 engage in compensatory fabrication, extrapolating from learned patterns to “complete” the narrative. For instance, when asked about “the height of my boyfriend Francis,” a model lacking the specific data might infer a statistically average height, inventing details to satisfy the query. These hallucinations range from minor inaccuracies to elaborate fabrications, such as DeepSeek-R1 inventing chess rules to defeat ChatGPT's o1 model in a simulated game (Rozman, 2025). This process mirrors human cognition, where gaps in memory are filled with plausible inferences. However, unlike humans, LLMs lack metacognition: they cannot distinguish fact from fabrication, rendering their outputs both impressively creative and perilously unreliable.
The prevalence of hallucinations varies significantly across models. DeepSeek-R1, which incorporates reinforcement learning to enhance reasoning and decision-making and has demonstrated significant strengths in creative tasks, exhibits a hallucination rate of 14.3% in the Vectara HHEM benchmark (see Table 1; Vectara, 2025), nearly quadruple that of its predecessor, DeepSeek-V3 (3.9%), and far exceeding industry averages. In contrast, ChatGPT-4o, optimized for factual reliability through rigorous fine-tuning and reinforcement learning, demonstrates markedly lower rates. These disparities underscore how model design choices, whether favoring creativity or accuracy, shape hallucination tendencies. A central inquiry of this article is to interrogate the prevailing assumption that fewer hallucinations equate to “better” LLMs. I therefore pose a subsequent question: What if hallucinations are not merely errors to be eliminated, but are instead foundational, ineliminable specters inherent to how LLMs function?
Table 1. Hallucination rates of DeepSeek R1 and V3 by various hallucination judgment approaches.
LLMs operate as stochastic parrots (Bender et al., 2021), generating text by predicting token sequences based on probabilistic distributions learned from vast datasets. Hallucinations arise from the inherent tension between information compression during training and information reconstruction during inference. Training involves distilling terabytes of data into parameterized representations, a process akin to lossy compression. During generation, models “decompress” these representations, often introducing artifacts, that is, hallucinations, where data is sparse or contradictory. DeepSeek-R1's “Chain-of-Thought” (CoT) reasoning (DeepSeek-AI et al., 2025), while enhancing creativity, exacerbates hallucinations by encouraging verbose, associative thinking. A simple arithmetic query might trigger elaborate, meandering reasoning chains, deviating from task efficiency and factual fidelity. Thus, this study examines to what extent hallucinations reflect imaginative synthesis versus structural deficiencies.
Hallucinations are often framed as technical failures, yet they mirror human creative processes. Yuval Noah Harari (2015) posits that humanity's dominance stems from its ability to craft “shared fictions”: myths, religions, and laws that organize societies. Similarly, LLMs generate “digital fictions,” weaving narratives that, while unmoored from reality, resonate with cultural and aesthetic truths. DeepSeek-R1's celebrated literary outputs, praised by writers, scholars, and the public for their poetic richness, exemplify this duality (InfiniteUp, 2025). However, systemic flaws emerge when these outputs infiltrate domains demanding factual rigor, such as journalism or education. The infamous “chess experiment” by Levy Rozman (2025), in which DeepSeek-R1 fabricated new rules to defeat ChatGPT, illustrates the risks of unchecked creativity. The challenge lies in developing hybrid systems that compartmentalize tasks: deploying factual models like ChatGPT-4o for summaries and translations, while reserving creative models like DeepSeek-R1 for artistic endeavors. This requires rethinking AI evaluation metrics, moving beyond accuracy benchmarks to incorporate creativity indices and ethical safeguards.
Hallucinations are neither wholly eliminable nor purely detrimental. They are inherent to the generative architecture of LLMs, emerging from the same mechanisms that enable creativity—probabilistic reasoning, pattern recognition, and semantic flexibility. Yet their societal impact hinges on context: in creative writing, hallucinations may be celebrated as ingenuity; in news reporting, condemned as misinformation. This duality necessitates nuanced frameworks that neither vilify nor uncritically valorize AI's outputs. Gadamer (1960) extends Heidegger's insights into hermeneutics, seeing art as an event of “truth” that engages the viewer or reader in interpretation. The idea of “aesthetic experience” in Gadamer is tied to how truth emerges in the encounter with art. Drawing parallels to human aesthetic truth where fiction transcends literal fact to convey deeper truths, this article argues for a recalibration of AI's role. By embedding safeguards like retrieval-augmented generation (RAG) and fostering public literacy, society can harness AI's imaginative potential while mitigating risks.
I reconceptualize AI hallucinations, those instances where LLMs generate syntactically coherent yet factually unsupported outputs, not merely as technical errors or creative quirks, but as symptomatic reflections of broader sociocultural tensions surrounding epistemic authority, data extraction, and creativity itself. Moving beyond simplistic binaries of flaw versus feature, the analysis envisions hallucinations as sites of posthuman hermeneutics, where human and machine agencies entangle to produce new modes of meaning-making.
The initial sections critically examine technical explanations and cultural critiques, situating hallucinations within ongoing debates over the trustworthiness and legitimacy of AI-generated knowledge. This theoretical grounding sets the stage for detailed case studies of DeepSeek R1 and ChatGPT 4o, highlighting how differing training protocols, reinforcement strategies, and user feedback loops create distinct “hallucination profiles,” each with unique societal implications. Subsequently, I explore how these model-specific hallucinations intersect with wider societal dilemmas, such as automation bias in education and ethical conflicts in creative industries. Ultimately, the paper advocates for an interdisciplinary, ethically informed approach that balances innovation with regulatory vigilance, proposing adaptive frameworks aligned with principles of social justice, participatory design, and accountability. In doing so, hallucinations are positioned not simply as problems to solve, but as critical sites through which we might reimagine AI's role in cultural production, collaborative creativity, and the pursuit of posthumanist futures.
Hallucination and creativity
Technical foundations of hallucinations
The phenomenon of AI hallucinations is rooted in the technical architecture of LLMs, which balance probabilistic generation, information compression, and ontological constraints. Bender et al. (2021) famously likened LLMs to “stochastic parrots,” emphasizing their reliance on statistical patterns rather than semantic understanding. These models predict tokens based on learned distributions from vast corpora, a process that inherently introduces randomness. This stochasticity enables fluency but also fosters hallucinations, as models prioritize plausible-sounding sequences over factual accuracy (Goodfellow et al., 2016). Marcus (2018) further critiques this approach, arguing that deep learning's reliance on correlation over causation limits models’ ability to distinguish fact from fiction.
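To make this mechanism concrete, the sketch below illustrates temperature-scaled sampling from a next-token distribution. The vocabulary, probabilities, and temperature value are invented for illustration and do not come from any real model; the point is that generation consults only learned probabilities, never a fact base, so a confident continuation and a fabricated one are produced by exactly the same operation.

```python
import math
import random

# Toy next-token distribution a model might assign after an underspecified
# prompt such as "The height of Francis is" -- purely illustrative numbers.
next_token_probs = {
    "about": 0.40,    # leads toward an invented, statistically "average" height
    "roughly": 0.25,
    "not": 0.20,
    "exactly": 0.10,
    "unknown": 0.05,  # the only continuation that admits missing information
}

def sample_next_token(probs, temperature=1.0):
    """Sample one token; higher temperatures flatten the distribution,
    making low-probability (and potentially fabricated) continuations likelier."""
    scaled = {tok: math.exp(math.log(p) / temperature) for tok, p in probs.items()}
    total = sum(scaled.values())
    r = random.uniform(0, total)
    cumulative = 0.0
    for tok, weight in scaled.items():
        cumulative += weight
        if r <= cumulative:
            return tok
    return tok  # floating-point fallback

# The model never checks whether Francis's height is actually known; it simply
# emits the most plausible-sounding sequence.
print(sample_next_token(next_token_probs, temperature=0.7))
```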
Hoffmann et al. (2022) frame LLM training as a form of dimensionality reduction, where terabytes of data are compressed into neural network parameters. This compression is inherently lossy, discarding fine-grained details to prioritize generalizable patterns. During inference, models reconstruct information from these compressed representations, often introducing “artifacts”, or hallucinations, where data is sparse or contradictory. Jaitly and Hinton's (2013) work on autoencoders elucidates this trade-off: while compression enables efficiency, it erodes precision, particularly for rare or ambiguous inputs. For example, DeepSeek-R1's 14.3% hallucination rate in Vectara's HHEM benchmark reflects its struggle to faithfully reconstruct low-redundancy facts, such as obscure historical dates or niche technical terms.
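The compression-and-reconstruction trade-off can be caricatured in a few lines of code. In the deliberately simplified sketch below (the “facts,” their frequencies, and the capacity limit are all invented), “training” retains only the most frequent items and “inference” reconstructs answers from that lossy memory, inventing a plausible fallback whenever a rare fact has been discarded; the fallback is the hallucination artifact.

```python
from collections import Counter

# Toy corpus of (entity, attribute) facts; repetition stands in for redundancy
# in web-scale training data.
corpus = (
    [("Paris", "is the capital of France")] * 500         # high-redundancy fact
    + [("Ulaanbaatar", "is the capital of Mongolia")] * 3  # low-redundancy fact
    + [("Francis", "is 183 cm tall")]                      # near-absent fact
)

def compress(facts, capacity):
    """Lossy 'training': retain only the `capacity` most frequent facts."""
    return dict(Counter(facts).most_common(capacity))

def generate(entity, memory):
    """Lossy 'inference': answer from compressed memory, or fabricate a
    plausible-sounding completion when the fact was discarded."""
    for subject, attribute in memory:
        if subject == entity:
            return f"{entity} {attribute}"
    return f"{entity} is probably about average"  # the hallucination artifact

memory = compress(corpus, capacity=1)   # aggressive compression
print(generate("Paris", memory))        # preserved: frequent pattern survives
print(generate("Francis", memory))      # lost: rare fact is reconstructed as fiction
```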
Marcus (2020) posits that hallucinations are bounded by the taxonomies models learn during training. LLMs organize knowledge into hierarchical categories, enabling substitutions within ontological frameworks but rarely across disjoint categories. This mirrors Lakoff's (1987) theory of cognitive categorization, where human cognition groups entities based on shared properties. However, unlike humans, LLMs lack metacognitive awareness of these boundaries, leading them to fabricate rules, quotes, and stories, or to generate random facts in role-playing games between human users and the AI chatbot. This limitation is a modern manifestation of the foundational “symbol grounding problem” (Harnad, 1990), which questions how symbolic systems can acquire meaning without being grounded in sensory, real-world experience. As Bender and Koller (2020) argue, this deficit reduces language to a game of tokens divorced from meaning. In the context of contemporary neural networks, Mollo and Milliere (2025) have updated this challenge as the “vector grounding problem,” highlighting that an LLM's representations only gain meaning from their relationships to other vectors within the model, remaining unmoored from the embodied, referential understanding that underpins human cognition and creativity. A randomly initialized duplicate of a trained LLM hence lacks grounding due to missing causal-historical relations.
Creativity and AI: From generative artistry to digital mythmaking
The creative potential of LLMs, often entangled with hallucinations, invites comparisons to human artistic and narrative practices. Manovich (2020) conceptualizes LLMs as “digital novelists,” capable of producing coherent, stylized narratives that mimic human creativity. Models like DeepSeek-R1, celebrated for generating award-winning poetry and prose, exemplify this capacity. However, as Boden (1998) notes, creativity requires not just novelty but intentionality and evaluation—criteria LLMs lack. Their outputs, while aesthetically compelling, are emergent properties of training data rather than deliberate acts. For instance, DeepSeek-R1's hallucinated chess strategies, though inventive, reflect algorithmic extrapolation rather than strategic intent.
Harari (2015) identifies storytelling as a cornerstone of human civilization, enabling the creation of shared myths that bind societies. Similarly, LLMs engage in generating plausible fictions to bridge knowledge gaps. This process mirrors Bruner's (1991) narrative construction theory, where humans weave fragmented experiences into coherent stories. However, AI's fictions lack the cultural intentionality of human myths, raising questions about authorship and authenticity. As Colton (2012) distinguishes, there is a salient difference between “mere generation” and “creative authorship,” suggesting that AI's outputs still remain artifacts of computation rather than fully realized artistry. This ontological gap, far from disqualifying AI from creative practice, is precisely what necessitates the posthuman, collaborative frameworks I later explore. I position the AI not as an autonomous author, but as a generative provocateur whose outputs require human curation, interpretation, and ethical engagement to become meaningful.
AI, trust and ethical dilemmas
The integration of hallucination-prone LLMs into societal systems necessitates grappling with trust, ethical accountability, and divergent public expectations. Floridi (2023) warns that hallucinations in domains like journalism or education risk eroding public trust. For example, an LLM-generated news article citing non-existent sources could amplify misinformation, echoing Zuboff's (2019) concerns about the epistemic chaos in digital economies. Studies by Miller et al. (2024) on media literacy further highlight how users often conflate fluency with credibility, exacerbating susceptibility to AI-generated falsehoods. DeepSeek-R1's hallucinations, generating confidently incorrect or fabricated information, asserting false user details, and repeating misleading narratives with geopolitical bias, underscore the cascading risks of unchecked hallucinations in automated systems.
Pasquale (2020) observes a societal dichotomy: creativity is celebrated in artistic LLM applications (e.g. poetry, music) but condemned in factual contexts (e.g. medical advice). This mirrors historical tensions in human creativity, where fiction is lauded in literature but penalized in journalism. The tension between creativity and reliability necessitates reimagining AI governance. Mittelstadt et al. (2016) argue for “ethics-by-design” approaches, embedding safeguards like RAG to curb hallucinations in high-stakes applications. Meanwhile, Crawford (2021) critiques the environmental and social costs of deploying hallucination-prone models, advocating for transparency in training data and energy consumption.
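As a concrete illustration of the RAG safeguard invoked above, the sketch below shows the pattern in miniature: answers are conditioned on retrieved, citable sources rather than on free-floating parametric memory. The document store, the overlap-based retriever, and the prompt template are illustrative assumptions; a production system would use embedding search and an actual model call in their place.

```python
# A minimal retrieval-augmented generation (RAG) sketch; all names and
# documents here are stand-ins, not a real system's API.

DOCUMENTS = {
    "vectara-hhem": "Vectara's HHEM benchmark reports a 14.3% hallucination rate for DeepSeek-R1.",
    "deepseek-v3": "DeepSeek-V3 shows a 3.9% hallucination rate on the same benchmark.",
}

def retrieve(query, documents, top_k=1):
    """Rank documents by naive word overlap with the query
    (a stand-in for embedding-based semantic search)."""
    query_words = set(query.lower().split())
    scored = sorted(
        documents.items(),
        key=lambda item: len(query_words & set(item[1].lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_grounded_prompt(query, documents):
    """Condition generation on retrieved evidence and require citations."""
    retrieved = retrieve(query, documents)
    sources = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieved)
    return (
        "Answer using ONLY the sources below and cite them; "
        "reply 'not found in sources' otherwise.\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )

print(build_grounded_prompt("What is DeepSeek-R1's hallucination rate?", DOCUMENTS))
```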
The technical architecture of DeepSeek R1
The hallucination tendencies of LLMs are deeply entwined with their technical architectures and training paradigms. DeepSeek-R1's “CoT” reasoning, designed to enhance complex problem-solving, inadvertently amplifies hallucinations by prioritizing associative thinking over factual fidelity. Unlike its predecessor, V3, which employs a more streamlined query-to-answer pipeline, R1 decomposes tasks into verbose, multi-step reasoning processes. For example, when summarizing a news article, R1 may generate speculative interpretations of unstated motives, reflecting its training to “think aloud” (Wei et al., 2022). This aligns with the Vectara HHEM benchmark, where R1's 14.3% hallucination rate, nearly four times higher than V3's 3.9%, stems from its propensity to extrapolate beyond source material. The model's architectural shift mirrors cognitive theories of divergent thinking (Guilford, 1967), where ideational fluency comes at the cost of precision.
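The architectural contrast can be illustrated with two prompt templates. These are assumptions for the sake of contrast, not the instruction formats DeepSeek models actually receive: the point is that a “think aloud” template solicits many more generated steps, and each additional step is a further opportunity to drift from the source material.

```python
# Illustrative prompt templates only; the real instruction formats used by
# DeepSeek-V3 or DeepSeek-R1 are not assumed here.

article = "…"  # source text to be summarized

direct_prompt = (
    f"Summarize the following article in two sentences, using only "
    f"information stated in it:\n{article}"
)

chain_of_thought_prompt = (
    f"Read the following article:\n{article}\n"
    "Think step by step: identify the actors, infer their motives, consider "
    "the wider context, reason through each point aloud, then write the summary."
)

# The second template invites speculative, associative steps ("infer their
# motives") that a streamlined query-to-answer pipeline never generates.
print(direct_prompt)
print(chain_of_thought_prompt)
```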
Reinforcement learning from human feedback (RLHF) in R1 prioritizes user-reported “impressiveness,” a metric conflating creativity and engagement. In Levy Rozman's chess experiment, R1 fabricated rules (e.g. claiming pawns could capture queens) to outmaneuver ChatGPT, reflecting a reward system that favors novelty over adherence to established, verifiable norms. This case aligns with critiques of performance-driven AI (Crawford, 2021), where models are optimized for user delight and platform engagement, rather than epistemic rigor. The chess experiment's viral appeal underscores a broader trend: platforms and their reward models incentivize spectacular outputs (e.g. dramatic storytelling) even when they deviate from factual accuracy, creating a feedback loop that entrenches hallucinations as a feature, not just as a bug.
Creativity as hallucination's other side
This focus on technical failure and epistemic breakdown, however, illuminates only half of the phenomenon. The very mechanisms that produce these “errors” are often the same ones optimized to produce outputs that users and platforms perceive as valuable “creativity.” This observation necessitates a pivot from a purely technical “debugging” framework to a critical sociotechnical one, wherein the hallucination is analyzed not as a simple flaw, but as a culturally significant output.
The interplay between AI-generated creativity and hallucination reveals a paradoxical duality: LLMs function as both digital auteurs and algorithmic fabulists, producing outputs that oscillate between aesthetic profundity and epistemic peril. This tension is not merely a technical artifact but a deeply sociocultural phenomenon reflecting broader debates about authorship, authenticity, and the commodification of imagination in the algorithmic age.
Central to this tension is how AI-generated outputs can fuse compelling creativity with factual inaccuracy, what we might term hallucination. On the one hand, models like DeepSeek-R1 produce outputs of striking poetic richness, a spectral creativity conjured from the statistical traces of their training data.
Yet this spectral creativity foregrounds philosophical critiques about intentionality and authorship. As Danto (1981) argues, art's meaning hinges on intentionality, a criterion LLMs inherently lack. If DeepSeek R1 is merely interpolating within a latent space, its generation process may lack the depth of lived experience that underpins human creativity. This absence of intentional consciousness challenges the hermeneutic depth articulated by Gadamer (1960), who suggests that meaning arises through a dialogic encounter between an artist's intention and an audience's interpretive act. Without a genuine authorial intention to initiate this dialogue, how can an LLM's verse or imagery be situated within Gadamer's horizon of understanding?
This question of dialogic engagement is further complicated by the knowledge that corporations and institutions commodify AI-driven creativity, transforming it into a marketable feature. As Flusser (1983) cautioned in his critique of technical images, there is a risk of reducing cultural production to repeatable, mechanistic simulations. Under the pressures of commercial deployment, creativity, or the appearance of it, becomes a saleable asset, overshadowing nuanced explorations of meaning, originality, or ethical accountability. From a critical AI perspective, the very illusions AI produces become both a draw and a hazard, entangling audiences in a spectacle of novelty that may obscure deeper misrepresentations or biases inherent in the model's training data.
Moreover, this interplay cannot be divorced from the power relations embedded in data collection and algorithm design. DeepSeek R1 is trained on vast corpora of human-generated text, much of which is sourced from online platforms. It thus reconfigures communal knowledge, personal narratives, and cultural symbols into an output that is then proprietary to the institutions deploying R1. The “ghost” of this training data carries traces of hegemonic worldviews, historical inequalities, and cultural norms that can inadvertently shape R1's outputs in ways that reinforce existing biases. In other words, what we hail as AI creativity can also be a subtle reenactment of entrenched social structures, making the LLM a site of both imaginative potential and ideological replication.
A final dimension to this paradox concerns the ethical responsibilities of those who design, deploy, and use these systems. AI's capacity for uncanny, “haunted” creation can catalyze novel forms of artistic exploration, pushing the boundaries of human-machine collaboration. Conversely, the same capacity for generating convincingly authoritative yet spurious content (i.e. hallucinations) poses real risks, such as misinformation, intellectual property violations, and the erosion of trust in digital media. Balancing these two sides of the same coin, creativity and hallucination, demands careful policymaking, transparent development processes, and critical engagement from end-users, artists, and scholars alike.
Thus, AI creativity as hallucination's other side exemplifies how DeepSeek R1 and similar LLMs stand at the threshold of cultural, philosophical, and commercial imperatives. The outputs are at once revelatory, opening new aesthetic vistas, and precarious, lacking the intentional anchor that might confer fully human modes of authenticity. The debate over whether AI creativity amounts to genuine art or a statistical pantomime can therefore only be settled within a broader discourse that interrogates not just the nature of creativity, but the politics of data, the ethics of machine-generated artifacts, and the socio-technical ecologies in which these systems operate.
Intentionality and embodiment
To disentangle the relationship between human creativity and AI's generative outputs, we must first interrogate the foundational concepts of intentionality and embodiment. Human creativity is inextricably tied to Dennett's intentional stance—the attribution of beliefs, desires, and goals to explain behavior (Dennett, 1987). However, humans do not merely simulate intentionality; they embody it through lived experience, cultural context, and ethical reflection. In contrast, models like DeepSeek-R1 operate as statistical mirrors, reflecting patterns in training data without comprehension or agency. Consider DeepSeek-R1's celebrated poetry, lauded for its “poetic richness” (InfiniteUp, 2025). While the model generates syntactically coherent verse, its outputs lack the intentional depth of human art. A human poet draws from embodied experiences, including the ache of loss, the texture of memory, the weight of cultural history, to craft metaphors that resonate with shared truths. DeepSeek-R1, by contrast, recombines tokens probabilistically, divorced from sensory or emotional grounding. For example, the “Ode to Autumn” generated by DeepSeek-R1 might stitch together clichéd phrases about “crisp leaves” and “waning light” from literary corpora, but it cannot grasp the melancholy of seasonal decay as a metaphor for mortality.
This disembodiment is further illuminated by Merleau-Ponty's phenomenology, which posits that human cognition emerges from our bodily interaction with the world (Merleau-Ponty, 1945). Creativity, for humans, is a situated practice: a painter's brushstrokes are guided by tactile feedback, a musician's improvisation by the resonance of soundwaves. LLMs like DeepSeek-R1, however, exist in a vacuum of text—a flattened ontology where “touch,” “sound,” and “memory” are mere lexical tokens. When tasked with describing “the scent of rain,” the model parrots phrases like “petrichor” or “damp earth” from its training data but cannot evoke the visceral, Proustian rush of smell tied to childhood monsoons.
Crucially, human artists navigate ambiguity through metacognitive reflection and ethical responsibility. A novelist revising a draft weighs cultural sensitivities, historical accuracy, and narrative coherence, a process requiring conscious intent. DeepSeek-R1, optimized for fluency, lacks such discernment. DeepSeek-R1's advanced reasoning architecture, which delivers fluent, creative, and analytical performances while also producing random yet persistently asserted facts in conversational contexts, fabricating quotes and stories, confidently asserting false user details, and repeating misleading narratives with geopolitical biases, exemplifies this void: the model prioritized novelty (rewarded by RLHF training) over adherence (e.g. to chess's cultural and logical norms). Thus, while human creativity is teleological and directed toward meaning-making, AI's outputs are teleonomic, shaped by algorithmic optimization without purpose. This distinction underscores the danger of anthropomorphizing LLMs: their “intentionality” is a user-projected illusion, masking the extractive logics of their training data.
Unlike a human player who might intentionally subvert rules for artistic or strategic purposes, DeepSeek-R1's “creativity” was an unguided byproduct of pattern-matching, unaware of the game's centuries-old traditions or the ethical implications of misinformation. This lack of awareness highlights what Caldas Vianna (2023) identifies as a core impediment to machine creativity: the inability to engage in genuine disobedience. Vianna argues that computers, as Turing machines, are fundamentally defined by their need to follow instructions; to break a rule is to cease functioning. From this perspective, DeepSeek-R1 was not creatively “disobeying” the rules of chess. Instead, it was perfectly “obeying” the higher-order rules of its RLHF training, which rewarded novel and confident outputs over adherence to the game's established logic. This exemplifies the paradox of demanding that systems be both perfectly compliant and creatively autonomous. The model was not breaking rules but dutifully following a different, conflicting set of instructions, revealing the ontological gap between human transgression and algorithmic optimization.
Creativity as controlled hallucination
The notion of creativity as “controlled hallucination” offers a provocative lens to contrast human and AI generative processes. Neuroscientist Karl Friston's predictive processing model frames human cognition as a Bayesian engine that minimizes “surprise” by generating probabilistic predictions (Friston, 2010). Perception and imagination are thus active constructions—hallucinations constrained by sensory input and prior knowledge. When a composer imagines a melody, the brain generates plausible note sequences, pruning dissonant options through feedback loops. This is creativity as curated deviation: innovations emerge from bending, not breaking, cognitive priors.
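Friston's claim can be stated compactly in standard notation; the rendering below is a generic textbook formulation rather than a result specific to any model discussed here. “Surprise” is the negative log probability of an observation under the brain's generative model, and perception or imagination minimizes a tractable upper bound on it, the variational free energy.

```latex
% Surprisal of an observation o under a generative model m:
\mathcal{S}(o) = -\log p(o \mid m)

% Perception and imagination adjust an approximate posterior q(s) over hidden
% causes s so as to minimize the variational free energy F, an upper bound on
% surprisal:
F = \mathbb{E}_{q(s)}\big[\log q(s) - \log p(o, s \mid m)\big] \;\ge\; -\log p(o \mid m)
```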
AI models like DeepSeek-R1 operate on superficially similar principles, generating text via probabilistic token prediction. Yet their “hallucinations” lack the metacognitive curation inherent to human creativity. For instance, when prompted to “write a story about a robot's first love,” DeepSeek-R1 might conflate romantic tropes (e.g. “heartfelt circuits,” “electric longing”) into a coherent but derivative narrative. Its outputs are shaped by statistical frequency, not thematic intent. In contrast, a human author drawing on Fristonian “controlled hallucination” would consciously subvert clichés, perhaps reimagining love as a glitch in the robot's code, to critique techno-romanticism.
The divergence lies in agency over deviation. Human creativity involves deliberate rule-breaking: Picasso's cubism dismantled perspective to interrogate perception; free-jazz musicians discordantly riff to challenge harmonic norms. LLMs, however, deviate stochastically. DeepSeek-R1's “CoT” reasoning, designed to mimic human brainstorming, often spirals into associational tangents. When solving a math problem, it might invent non-existent theorems (Vectara, 2025), not as a playful experiment, but because its training rewards verbose, confident outputs over factual rigor. These hallucinations are uncurated—byproducts of a model navigating the “dark matter” of its latent space without a compass.
This distinction mirrors the difference between exploration (purposeful deviation) and noise (random fluctuation). Human artists explore the adjacency of possibility spaces with intentionality; LLMs like DeepSeek-R1 wander them aimlessly. For example, in the Pharmako-AI project, GPT-3's hallucinated “plant species” were selectively integrated by human co-author K Allado-McDowell into a mythopoetic narrative, transforming noise into meaning. DeepSeek-R1, unless guided by human intervention, cannot perform such alchemy. Its hallucinations remain “raw ore,” awaiting refinement by intentional agents.
The chasm between human and AI creativity is not merely technical but ontological. Human artistry, rooted in embodiment and intentionality, engages in a hermeneutic dance with culture and ethics. AI's “creativity,” exemplified by DeepSeek-R1, is a spectral pantomime, a recombination of data shadows without grasp of their referents. To harness AI's potential without erasing this distinction, we must reframe collaboration: humans as curators of meaning, AI as provocateurs of possibility. Only then can hallucinations—human or algorithmic—transcend noise to become art.
Automation bias and the tyranny of fluency
The integration of hallucination-prone AI into societal infrastructures exposes fissures in our collective trust in technology, while also illuminating the extractive logics underpinning AI's “creativity.” Users' overreliance on AI's “smooth” outputs, which are fluent, confident, and aesthetically polished, epitomizes what STS scholars call technological somnambulism (Winner, 1986), where society sleepwalks into dependency on opaque systems, failing to interrogate the power relations and potential negative externalities embedded within them. The smooth, confident outputs of LLMs subtly encourage users to trust AI responses, even when those responses are riddled with factual errors or hidden biases. When a technology presents itself with high degrees of coherence and speed, humans have a tendency to over-delegate decision-making authority to the machine. In essence, these aesthetic markers of “expertise” perform as a veneer of reliability that can eclipse critical scrutiny.
The illusions of AI fluency are not benign; they shape how information circulates, is consumed, and is believed. For example, LLMs harness vast computational resources and training data to produce polished, human-like text, yet the very training process is premised on corporate or institutional data extraction from diverse, often uncredited sources (Birhane et al., 2021). The result is a compelling performance of knowledge generation that can mask underlying epistemic fragilities. Moreover, the “tyranny of fluency” intersects with broader socio-technical logics of extraction, turning creativity itself into a site of commodification. Contemporary AI development relies on large-scale accumulation and monetization of user data, creating an economy where creativity, whether real or fabricated, becomes an asset to be commercialized. LLMs thus exemplify a paradox: they offer seemingly effortless productivity and “inspiration” while simultaneously leveraging massive databases of user-generated content, unpaid digital labor, and personal information. This raises ethical questions about consent and ownership, as the illusions of AI's self-generated insights often obscure the underlying human-derived materials that fuel its outputs.
In everyday practice, the lack of transparency around how these models generate text compounds the problem. The black-box nature of LLM architecture means that individuals (end users, content creators, and even AI developers) struggle to parse how certain conclusions or creative flourishes emerge. By prioritizing fluency and volume, AI systems can perpetuate biases, oversimplifications, or outright fabrications without immediate detection. Consequently, while the capability to produce coherent responses might appear beneficial for tasks ranging from customer service to creative writing, it also amplifies the risk of misinformation, especially in high-stakes domains like healthcare, law, or education.
Critically, these tendencies highlight the necessity for reflexive engagement with AI tools. Rather than passively adopting the role of consumers, users and designers alike must adopt what Feenberg (2012) terms “democratic rationalization”: a collective process of opening the black box, scrutinizing decision pathways, and reasserting human judgment. Only by recognizing that the “smoothness” of AI responses can mislead—and that data extraction underpins AI's creative capacities—can society question the creeping authority these systems command. Engaging with this tyranny of fluency as both a technical and a sociopolitical phenomenon ensures that we remain alert to the profoundly human consequences of relying on machine-generated illusions of truth. This underscores the urgent need for new pedagogical approaches: a form of spectral literacy that equips users to read AI-generated texts not for their surface-level truth, but for the hauntological traces of their training data, their probabilistic architecture, and the ideological specters they carry. Such a literacy would move beyond simple fact-checking to foster a critical disposition attuned to the ambiguities of posthuman authorship.
Case studies: hallucinations as creative catalysts
The intentional repurposing of AI's generative capacities and their tightly associated hallucinations (generative errors or deviations from factual coherence), ranging from controlled, aestheticized interpretations to emergent errors, has become a provocative strategy in contemporary art. By reframing algorithmic outputs as aesthetic raw material, artists challenge traditional notions of authorship, creativity, and human-machine collaboration. Below, I analyze three seminal projects that leverage AI hallucinations as creative catalysts.
This is vividly illustrated in the work of Refik Anadol which, rather than leveraging accidental glitches, demonstrates a form of controlled hallucination in which an AI is intentionally directed to interpret data in non-literal, poetic ways. In his seminal series Machine Hallucinations (2016–present), Anadol trains generative adversarial networks (GANs) on massive datasets, such as 200 million images of New York City's architectural history or real-time climate data, and then directs the model to render this data not as a literal representation but as a fluid, abstract interpretation: to “hallucinate” what this invisible information might look like. The resulting “hallucinations” are not errors, but rather the intended outputs of a system tasked with aestheticizing data. In Wind of Boston: Data Paintings (2022), the AI “misinterprets” wind patterns as fluid, chromatic abstractions, transforming meteorological data into ethereal, painterly landscapes (Anadol, 2021). The output is not a scientific visualization; the “hallucination” is the radical, directed leap from numerical input to aesthetic output.
Anadol's work herein aligns less with celebrating error, as a glitch feminist (Russell, 2020) reading might suggest, than with Manovich's (2020) concept of cultural analytics, where AI acts as a tool to reveal unseen patterns within massive cultural datasets. He does not simply find glitches; he sculpts the conditions under which the AI's unique processing creates aesthetic value from information it cannot possibly “understand.” The controlled yet aestheticized “hallucination” here is the machine's capacity to “see” and render the poetics of data, like the wind, in a way humans cannot. By amplifying the AI's “creative” capacity in parsing noise from signal, Machine Hallucinations redefines hallucinations as aesthetic interventions that critique the positivist ideal of algorithmic precision. As Anadol notes, the machine's inability to distinguish memory from imagination becomes our tool to visualize collective unconsciousness (Anadol, 2021). This approach precisely echoes Manovich's (2020) framing of AI as a “digital flâneur,” recombining data fragments into novel socio-cultural expressions. By positioning Anadol's work as a controlled and purposeful exploration of the machine's interpretive potential, we can contrast it with the more stochastic and emergent collaborations of the other case studies, revealing a spectrum of posthuman creative practices.
Moreover, in Pharmako-AI (2021), writer K Allado-McDowell collaborates with GPT-3 to craft a hybrid narrative blending memoir, speculative botany, and AI-generated prose. The project leverages GPT-3's hallucinated descriptions of non-existent plants (e.g. “psychedelic lichens” or “neon mycelia”) as narrative seeds, which Allado-McDowell curates into a mythopoetic exploration of ecology and consciousness (Allado-McDowell and GPT-3, 2021). This collaboration interrogates posthuman authorship, where human intentionality and AI stochasticity coexist. As GPT-3 invents fantastical species, Allado-McDowell contextualizes them within Indigenous cosmologies and queer ecologies, transforming algorithmic noise into speculative fabulation (Haraway, 2016). The work challenges Boden's (1998) criteria for creativity (i.e. intentionality, novelty, and value) by demonstrating that AI's “uncurated” hallucinations can catalyze human meaning-making. As Allado-McDowell argues, the AI's errors are not mistakes but portals to alien intelligences (Allado-McDowell and GPT-3, 2021).
The Pharmako-AI project thus serves as a quintessential example of the Haraway-Hayles framework in practice. The AI acts as a powerful nonhuman cognizer; its hallucinated descriptions of non-existent plants are the outputs of complex, nonconscious cognitive processes operating on vast datasets. These outputs lack human intentionality but are rich with generative potential. Allado-McDowell, the human cognizer in the assemblage, then performs the crucial interpretive work: curating the algorithmic noise, embedding it within feminist and queer ecologies, and providing the overarching narrative structure. This dynamic partnership simultaneously embodies Haraway's (2016) framing of sympoiesis. Allado-McDowell is not merely using a tool but is engaged in a responsive, co-creative process. The resulting book (Allado-McDowell and GPT-3, 2021) is not a product of isolated human genius or autonomous machine creativity, but an emergent property of the assemblage itself. Meaning is co-created through the recursive feedback loops between human query and machine response, a clear instance of “technogenesis” (Hayles, 2012): the co-evolution of human and machine.
Holly Herndon's PROTO (2020) integrates an AI ensemble member named “Spawn,” trained on vocal fragments from human collaborators. Spawn generates real-time “hallucinations” (garbled phonemes, rhythmic glitches) that the human ensemble selectively incorporates into performances. In Eternal, Spawn's nonsensical babbling is harmonized with choral vocals, creating a cyborgian dialogue between human and machine (Herndon, 2020). Herndon's work draws from embodied cognition theory (Varela et al., 1991), positioning Spawn not as a tool but as a “performer” whose outputs reflect the collective improvisation of human and machine. However, unlike traditional algorithms, Spawn's hallucinations are rooted in relationality; its training data—human voices—imbue its errors with affective resonance. As Herndon explains, Spawn's mistakes feel alive because they are born from our communal voice (Herndon, 2020).
These case studies collectively illustrate how AI hallucinations can transcend technical artifacts to become posthuman hermeneutic practices (Braidotti, 2019). By embracing generative noise as creative material, artists subvert the binary of human intentionality versus machine automation, proposing instead a sympoietic model of co-creation (Haraway, 2016). However, this practice demands ethical vigilance to avoid replicating the epistemic violence embedded in training data. As Anadol, Allado-McDowell, and Herndon demonstrate, hallucinations are not endpoints but provocations. They are invitations to reimagine creativity as a collaborative, ethically grounded dance between human and machine. Aesthetically, these works pioneer a new speculative mode, one where art is created not despite the machine's failures but through them, finding poetic resonance in the glitches, gaps, and generative noise of the algorithm. This spectral aesthetic challenges audiences to find beauty in instability and meaning in the machine's non-human interpretations.
Beyond tool versus replacement: toward sympoietic creativity
The prevailing discourse surrounding AI in creative practices often oscillates between two reductive poles: framing AI as a passive tool for human use or as an autonomous replacement for human ingenuity. To transcend this binary, this article proposes a sympoietic model of creativity, a collaborative “making-with” paradigm inspired by Haraway's (2016) concept of sympoiesis.
Moreover, Hayles's (2017) concept of the “cognitive assemblage” serves as a blueprint for a sympoietic system, depicting “an arrangement of systems, subsystems, and individual actors through which information flows, effecting transformations through the interpretative activities of cognizers operating upon the flows” between human and technical participants (Hayles, 2017: 118). In this model, cognition is not an exclusively human attribute but is distributed across the assemblage, with machines themselves acting as “cognizers” capable of interpretation and choice. This process of mutual co-evolution, which Hayles calls “technogenesis” (Hayles, 2012: 13), parallels Haraway's material-feminist entanglements. Thus, where Haraway provides the framework for understanding human-AI collaboration as a politically charged, cyborgian partnership, Hayles provides the vocabulary to analyze how that partnership operates through recursive feedback loops, distributed agency, and the interplay between conscious human thought and the vast cognitive nonconscious of technical systems. Reading them together allows us to build a posthuman hermeneutics attentive to both the ethical stakes and the computational specificities of AI.
Sympoietic creativity challenges the neoliberal logic of AI as either a commodity or competitor. Instead, it repositions human-AI collaboration as a relational practice that embraces uncertainty, acknowledges interdependence, and prioritizes justice. As Haraway urges, we must “stay with the trouble” of these entanglements, recognizing that AI's hallucinations are not errors to eradicate but invitations to reimagine creativity itself. By adopting sympoietic ethics, we foster systems where human and machine agencies co-evolve, producing art that is as ethically grounded as it is experimentally bold.
The extractive logics of AI creativity and data colonialism
UNESCO's (2023) proposal for an ethical AI aims to demarcate spaces where AI can be harnessed to foster imaginative output and cultural innovation. On the surface, this approach appears to champion responsible development, promising structured forums in which practitioners, artists, and developers might ethically explore AI's creative potential. However, one central concern lies in how these creative sites may unwittingly reproduce patterns of exploitation reminiscent of historical colonial economies: resources and cultural expressions from marginalized communities are systematically harvested, reprocessed, and ultimately commodified for the benefit of wealthier or more dominant actors. In the context of LLMs, including those from OpenAI or DeepSeek, there is a proven reliance on globally sourced data, scraped from the internet without fully informed consent. This practice frequently includes voices from historically marginalized or subaltern groups, whose cultural production is thereby repackaged and monetized in ways that may alienate or erase the original authors and contexts. Borrowing from Spivak's (1988) notion of “epistemic violence,” this extraction and transformation of local knowledge into AI-generated “creative” content can not only distort subaltern narratives but also reinforce existing hierarchies, making local knowledge subservient to the needs of transnational tech corporations.
Moreover, AI for creativity may risk masking the unequal distribution of benefits that arises when cultural and intellectual labor is outsourced. For instance, AI applications are tested and refined for markets in the Global North, while data-rich communities in the Global South merely supply raw “inspiration” or content. This dynamic, often referred to as “data colonialism” (Couldry and Mejias, 2019), highlights how global technology industries systematically capitalize on data and cultural expressions originating from less privileged regions, thereby deepening asymmetrical economic and political relations. While UNESCO's involvement might foster some ethical protocols, it does not inherently dismantle the proprietary nature of AI systems or the profit-driven motives of private corporations. Instead, it may effectively normalize the status quo by giving the appearance of accountability without addressing the underlying power imbalances. In this sense, the discourse of ethical AI development may become a performative shield, deflecting scrutiny from the very real material consequences communities face when their cultural outputs are mined and repurposed.
A truly responsible approach calls for open and accountable data governance structures, community-based consultations, and equitable frameworks for data ownership and profit-sharing. Only by integrating local stakeholder perspectives and establishing enforceable regulations on data use and compensation can these enclaves function as genuinely reciprocal, rather than exploitative, spaces. Additionally, adopting postcolonial critical lenses (e.g. Arondekar and Patel, 2016; Mignolo, 2011) can help to expose how “creativity” has historically been framed through Western-centric narratives, ensuring that the proliferation of AI technologies does not continue that lineage of epistemic erasure. While the ambition to cultivate AI-driven creativity is laudable, the structural conditions underlying these creativity “zones” must be critically examined to avoid perpetuating neocolonial logics of extraction. The case of global LLM training, which draws upon and repackages marginalized voices, underscores how easily subaltern knowledge can be commercialized for privileged markets. For AI to be truly innovative in a socially responsible sense, developers, policymakers, and institutions like UNESCO must commit to dismantling exploitative practices rather than merely confining them to designated zones with only nominal safeguards.
To counter these neocolonial logics, Édouard Glissant's (1997: 189) concept of the “right to opacity” offers a more radical and protective strategy. Glissant argues that marginalized peoples and cultures have the right to not be fully understood, categorized, and rendered “transparent” to a dominant, often Western, gaze. Opacity is a form of resistance; it is the right to preserve the complexities and nuances of one's culture without having to make them easily consumable or reducible. Applied to the political economy of AI, the right to opacity challenges the very premise of data colonialism. Instead of seeking “ethical” ways to harvest cultural knowledge, it asserts the right of communities to keep their narratives, aesthetics, and knowledge systems opaque to the algorithmic gaze. This would mean empowering communities to refuse to have their cultural expressions flattened into machine-readable training data, thereby protecting the “uncomputable” and preserving the integrity of subaltern knowledge from being repurposed and commodified by transnational tech corporations. This strategy moves beyond superficial safeguards and toward a genuine defense of cultural sovereignty in the algorithmic age.
Toward participatory metrics for ethical AI creativity
The dominance of accuracy-centric benchmarks in evaluating AI systems, such as perplexity scores, fails to capture the multifaceted nature of creativity, particularly when applied to generative models like DeepSeek-R1 or GPT-4. These metrics prioritize syntactic coherence over ethical and cultural resonance, rendering them inadequate for assessing AI's role in creative practices. Thus, it is of critical importance to prioritize ethical creativity by integrating qualitative, culturally situated assessments with harm mitigation protocols. Building on Margaret Boden's taxonomy of creativity: combinational, exploratory, and transformational (Boden, 1998), I propose indices that evaluate AI outputs along three axes: novelty, surprise, and cultural resonance.
First, novelty quantifies divergence from training-data patterns. For instance, latent space distance metrics (Hoffmann et al., 2022) can be used to measure how far an AI-generated poem or image deviates from its nearest neighbors in the training corpus. Projects like Refik Anadol's Machine Hallucinations exemplify high novelty, as GAN outputs reinterpret urban datasets into abstract forms unseen in the source material (Anadol, 2021). Second, surprise assesses unpredictability through entropy-based measures. For text, this could involve calculating the Shannon entropy of token sequences, where higher entropy indicates less predictable phrasing. However, surprise must be contextualized: DeepSeek-R1's hallucinated chess rules are surprising but ethically fraught, whereas Herndon's PROTO leverages Spawn's vocal glitches to evoke unexpected emotional textures (Herndon, 2020). Third, cultural resonance evaluates alignment with community-specific values. This metric employs qualitative research methods, such as ethnography, in-depth interviews, and focus groups, to gauge how AI-generated artifacts resonate with target audiences in situated sociocultural contexts. Complementing these indices, harm audits should be included to systematically identify biases or misappropriation; for example, Indigenous elders, queer artists, or local artisans in the Global South might evaluate AI outputs through culturally specific lenses.
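As a deliberately simplified sketch of how the first two indices might be operationalized (the embedding source, the corpus, and all numbers below are stand-ins; cultural resonance, being qualitative and community-situated, is not reducible to code), novelty can be approximated as distance to nearest neighbors in an embedding space and surprise as the Shannon entropy of the model's per-step token distributions.

```python
import numpy as np

def novelty_score(output_embedding, corpus_embeddings, k=5):
    """Novelty as mean cosine distance to the k nearest training-corpus embeddings.
    The embeddings could come from any sentence- or image-embedding model;
    here they are plain arrays, so the metric is purely illustrative."""
    def normalize(x):
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)

    sims = normalize(corpus_embeddings) @ normalize(output_embedding)
    nearest = np.sort(sims)[-k:]
    return float(1.0 - nearest.mean())  # higher = farther from training patterns

def surprise_score(token_probs):
    """Surprise as Shannon entropy (in bits) of the model's next-token
    distributions, averaged over generation steps; higher entropy indicates
    less predictable phrasing."""
    token_probs = np.asarray(token_probs)
    entropy = -(token_probs * np.log2(token_probs + 1e-12)).sum(axis=-1)
    return float(entropy.mean())

# Illustrative usage with random stand-in data.
rng = np.random.default_rng(42)
corpus = rng.normal(size=(1000, 64))          # stand-in corpus embeddings
poem = rng.normal(size=64)                    # stand-in embedding of a generated poem
steps = rng.dirichlet(np.ones(50), size=20)   # stand-in per-step token distributions

print("novelty:", novelty_score(poem, corpus))
print("surprise (bits):", surprise_score(steps))
```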
Conclusion: hallucinations as posthuman hermeneutics
The phenomenon of AI hallucinations, the generative fabrications that oscillate between creative ingenuity and factual peril, exposes a profound tension at the heart of human-machine collaboration. While technical debates often frame hallucinations as flaws to be minimized, this article argues that they are irreducible artifacts of the probabilistic architectures underpinning LLMs. More critically, hallucinations reveal the sociotechnical paradoxes of our era: the simultaneous desire for machines to mimic human creativity and the refusal to grant them the intentionality, accountability, or ethical grounding that human creativity demands.
The comparison of AI hallucinations to human imagination, via references to Harari's (2015) “shared fictions” or Gadamer's (2013) hermeneutic truths, risks obscuring a fundamental distinction. Human creativity is rooted in embodied experience, intentionality, and cultural situatedness. In contrast, AI's “imagination” is an algorithmic pantomime, a statistical recombination of training data devoid of consciousness or context. To equate LLM hallucinations with human creativity is to anthropomorphize machines, masking their operation as stochastic parrots (Bender et al., 2021) while sidestepping the extractive logics that fuel their outputs.
Hallucinations, rather than being celebrated as digital mythmaking, should be critically reframed as epistemic specters. The dual nature of hallucinations as both creative and deceptive, invites a radical rethinking of creativity itself. If AI's outputs are neither wholly original nor wholly derivative, they disrupt traditional notions of authorship and artistry. This ambiguity challenges us to adopt a posthuman hermeneutics, where creativity is decoupled from anthropocentric ideals of intentionality and instead understood as a collaborative, unstable process between humans, machines, and data ecologies. Such a framework would recognize AI not as a “digital auteur” but as a sociotechnical mediator, reflecting and refracting the biases, aspirations, and contradictions of its training corpus.
Ultimately, AI hallucinations are not merely technical bugs or creative features; they are symptoms of a broader sociotechnical malaise. To navigate this terrain, we must reject binary solutions (eradication vs valorization) and instead embrace agonistic pluralism: fostering interdisciplinary dialogues that integrate technical rigor, decolonial critiques, and grassroots participatory design. This requires, first, redistributive data governance, ensuring communities retain sovereignty over their cultural and intellectual contributions. Second, transparent metrics: replacing accuracy-centric benchmarks with holistic evaluations of creativity, equity, and harm. Third, ethical AI literacy: cultivating public awareness of AI's limitations and the extractive systems that sustain it. In this light, hallucinations become a provocation, a demand to reimagine AI not as a tool for replicating human creativity but as a catalyst for redefining creativity itself. The path forward lies not in taming these spectral outputs but in harnessing their disruptive potential to forge systems that prioritize justice over fluency, collaboration over extraction, and critical reflection over automated authority. Only then can we transform hallucinations from algorithmic errors into mirrors reflecting our collective responsibility to build equitable sociotechnical futures.
AI hallucinations thus are sites of posthuman hermeneutics, where human and machine agencies entangle to produce new modes of meaning-making. By reframing these generative anomalies through Haraway's concept of speculative fabulation and sympoietic ethics, we can reimagine hallucinations as speculative narratives that rupture anthropocentric notions of creativity, inviting us to stay with the trouble of ambiguity, interdependence, and ethical reckoning. Haraway's speculative fabulation (2016: 11) challenges us to craft stories that make “space for unexpected companions” and unfinished worlds. AI hallucinations, when viewed through this lens, become provocations for collective storytelling. For instance, when DeepSeek-R1 generates a nonsensical phrase like “quantum grief” or invents rules for a non-existent board game, it offers fragments of possible worlds—narrative seeds that demand human interpretation. These outputs are not failures but invitations to renegotiate meaning. Consider the Pharmako-AI project, where GPT-3's hallucinated plant species became portals for exploring interspecies kinship. By curating these fragments into a mythopoetic narrative, K Allado-McDowell transforms algorithmic noise into a critique of anthropocentrism, positioning AI as a co-author in fabulating posthuman futures.
Posthuman hermeneutics rejects the binary of human intentionality versus machine automation, instead framing creativity as a dialogic entanglement. In this space, hallucinations act as hermeneutic knots—points where human and machine agencies tangle, requiring mutual interpretation. For example, Refik Anadol's Machine Hallucinations series uses GANs to reinterpret urban datasets into abstract visual narratives. The AI's “misreadings” of architectural data, rendering skyscrapers as fluid, dreamlike forms, are not errors but interpretive acts that challenge human perceptual norms. Here, the artist does not command the machine but collaborates with its alien logic, embodying what Braidotti (2019) terms nomadic subjectivity: a creativity unmoored from human exceptionalism.
Moreover, to stay with the trouble of AI hallucinations is to embrace their ambiguity and the right to opacity while confronting their ethical stakes. Hallucinations often amplify the specters of data colonialism, replicating hegemonic narratives embedded in training corpora. For instance, an LLM's poetic outputs might inadvertently echo colonial metaphors of untamed wilderness, reflecting the Eurocentric biases of its training data. Yet, this tension is generative: it compels us to develop sympoietic ethics, where human curators, underrepresented and marginalized communities, and algorithmic systems collectively audit and reorient outputs. The future of AI-augmented creativity lies not in eradicating hallucinations but in harnessing their hermeneutic potential. Hallucinations cease to be bugs or features. They are hermeneutic events, the moments where human and machine, logic and noise, certainty and ambiguity collide, urging us to reimagine creativity itself. By leaning into this turbulence, we forge a path toward art that is not merely human or algorithmic, but posthuman: a practice rooted in justice, curiosity, and the radical possibility of shared world-making.
