Abstract
Keywords
Introduction
This article unpacks the relationship between commitments of qualitative inquiry and the architecture and capabilities of Generative AI (GenAI) models to explore a promising space of possibilities. We focus on Large Language Models (LLMs) that are usable out-of-the-box without programming or fine-tuning as these offer most immediately accessible opportunities for researchers to strengthen core commitments of, and address longstanding challenges in, qualitative analysis.
At its core, qualitative analysis involves closely examining large amounts of rich data to generate insight, a notoriously time-consuming process, while LLMs offer powerful pattern recognition capabilities. An initial impetus might thus be to see how AI could speed up aspects of qualitative methods to make the process more efficient. This has been a central focus of the discourse to date (as documented in Paulus et al., 2025), with most work using LLMs to speed up thematic analysis of a dataset (e.g. De Paoli, 2024; Rientes et al., 2025; Yan et al., 2024) or more rapidly develop a codebook for deductive application (Barany et al., 2024; Gao et al., 2024).1 In both cases, important elements of interpretive judgment are ceded to AI (Paulus et al., 2025). In addition to study-specific uses of commercially available applications,2 there have also been early efforts to develop prompt frameworks (Zhang et al., 2023), analysis workflows (Bakharia et al., 2025; Rao et al., 2024), and LLM-powered tools (Lin et al., 2025) specifically tailored to qualitative research. The emphasis here is on fidelity to original sources, meaningful construction of analytic categories and assignment of data to them, completeness of analysis, and reproducibility of results.
While the majority of this work adopts a “human-in-the-loop” rather than fully automated approach,3 LLM use remains largely procedural, offering limited critical engagement with key conceptual dimensions of qualitative inquiry such as researcher positionality, relationship to the data and its generation, iterative refinement of questions as well as interpretations, and dialogue with theory. This may explain why, to date, the response of many qualitative researchers has ranged from marked reservation to outright rejection (e.g. Jowsey et al., 2025).
In this article, we offer a reframing, arguing that GenAI’s transformative potential for qualitative analysis lies not in automating tasks for speed or scale, but in supporting deeper engagement with core commitments of qualitative research and addressing persistent challenges in enacting them. We explore how fundamental characteristics of LLMs and defining principles of qualitative analysis offer productive overlaps, enabling new approaches for current practice and future inquiry. We begin by outlining some core commitments and ongoing challenges in qualitative analysis. We then provide a brief overview of LLMs and unpack key characteristics that can support efforts to meet these commitments. Finally, we provide an illustrative example of how LLMs can be mobilized to support robust qualitative analysis and outline initial expectations for conducting AI-in-the-loop analysis in ways that strengthen, rather than undermine, trustworthiness.
Qualitative Analysis: Key Commitments and Persistent Challenges
There are many variations of qualitative research; here we follow the tradition of Guba and Lincoln (1989), which is grounded in a
Qualitative analyses of this kind share a set of epistemological and ontological commitments concerned with the centrality of the human researcher as an important element of analysis
One aspect of subjectivity involves the researchers’ knowledge of and immersion in the context where data is collected. Understanding what is going on
Rather than seeing subjectivities as confounds to high-quality research, qualitative analysis emphasizes the ways they support a rich understanding of the complexity of contexts. This means it is possible to develop more than one valid understanding of the data, and that such variance can be both valuable and useful. As Madill et al. (2000) note, “the goal of triangulation is completeness not convergence….two models [resulting from analysis] demonstrate how researchers can provide complementary pictures of a phenomenon. The models are not incompatible but allow us to view the experience of participants from two different perspectives, both of which are justifiable” (p. 12). This is not to say that all analyses are seen as equally valid, or that any interpretation is reasonable. The warrants for claims, and the emphasis on interrogating one’s own subjectivity and how it influences what is seen in the data, are essential components of rigorous qualitative analysis (Greene, 2014).
To summarize, the key commitments of qualitative analysis described above are: (i) close attention to the details of the data in multiple iterations that results in layering of meaning; (ii) immersion and personal history with the data to attend to any piece of it with the larger context in mind; (iii) attention to the larger context in which the data was collected so that the meaning of utterances is understood even if not directly captured in the data; (iv) positionality with respect to researchers’ lived experiences and associated perceptions to support ‘noticing’ and conceptualizing different insights and interpretations; and (v) including multiple researchers in conversation with the data and with each other in ways that allow different interpretations to emerge.
Enacting these commitments is central to the qualitative analysis process, but not without its challenges. Completing multiple iterative rounds of analysis of the full set of data (i) is time consuming, so in practice researchers often focus on subsets of the data that seem of particular interest. Likewise, keeping
Properties of LLMs that Support Efforts to Meet These Commitments
A Brief Overview of Large Language Models (LLMs)
LLMs are initially built by learning information patterns through a pre-training process that spans trillions of tokens of text.
When used by a researcher to analyze data (e.g. by sending a message or “prompt” to the model), the model breaks down the submitted text into tokens (words, subwords and special markers5) and represents them in its working memory, generally referred to as its “context” (referred to here as “model context” for clarity; Zhao et al., 2023). The model context cumulatively includes everything the researcher submits to the model as well as all of the model’s responses. Maximum model context size, which determines how much information the model can process at once, currently ranges from roughly a hundred thousand to around a million tokens. For example, a researcher might submit a transcript of a classroom interaction and the request to identify all potential instances of scientific engagement. Each token (from the transcript
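To give a concrete sense of what tokenization means in practice, the sketch below counts the tokens in a set of study documents to check whether they would fit within a given model context window. It is a minimal Python illustration assuming OpenAI’s open-source tiktoken tokenizer; the file names are hypothetical, and other providers tokenize somewhat differently, so counts are approximate.

```python
# A minimal sketch, assuming the open-source `tiktoken` package is installed.
# File names are hypothetical; token counts vary by provider and encoding.
import tiktoken


def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Return the number of tokens the chosen encoding produces for `text`."""
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))


corpus_files = [
    "class_transcript_week1.txt",
    "teacher_interview.txt",
    "researcher_memos.txt",
]

total = 0
for path in corpus_files:
    with open(path, encoding="utf-8") as f:
        n = count_tokens(f.read())
    print(f"{path}: {n} tokens")
    total += n

# Compare against a model's advertised context window (e.g., 128,000 tokens).
print(f"Total corpus: {total} tokens; fits in a 128k context: {total < 128_000}")
```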
The positioning of each token in the embedding space occurs through an iterative process informed by its intrinsic meaning, the model’s learned knowledge from training, and its relation to all other preceding tokens through a process called “attention” (Vaswani et al., 2017). For instance, when analyzing a classroom transcript, the model would recognize that the word “challenging” needs to be positioned differently in the space depending on whether it appears in a positive context (“rewarding but challenging”) or a negative one (“too challenging”). The ability to distinguish such nuanced meanings helps the model respond to questions about the text in ways that reflect the subtle differences in how words are used in different situations.
In producing a response to the researcher’s request, the model generates tokens one at a time in an auto-regressive manner, with each new token selected based on probabilities computed over all preceding tokens in the model context.
Key LLM Properties with Relevance for Qualitative Analysis
Having briefly outlined how LLMs function in use, we now highlight key properties of LLMs that offer promise for addressing challenges and realizing opportunities for qualitative analysis.
Large-Scale Pre-Training
LLMs incorporate broad conceptual representations learned during training across trillions of tokens. Rather than losing information through approximation or averaging, the training process creates and refines a representational space capable of encoding varied information richly, including multiple languages, viewpoints, cultural understandings, and nuances of meaning.
Large-scale pre-training situates analysis processes within broader socio-cultural contexts learned by the model, supporting contextually grounded interpretation of an utterance’s meaning, even when it is not explicitly present in the data, by drawing on similar utterances encountered during training (commitment iii).6 In addition, researchers can elicit particular perspectives, stances or theories represented in the model to support surfacing different kinds of insights and interpretations (commitment iv) and put such perspectives or the insights arising from them in conversation with each other (commitment v).7
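As one illustration of eliciting different perspectives, the sketch below sends the same excerpt to a model under two different theoretical stances and prints the resulting interpretations side by side. It assumes the OpenAI Python SDK; the model name, excerpt, and stances are illustrative only, and any chat-capable LLM could be substituted.

```python
# A minimal sketch using the OpenAI Python SDK; assumes OPENAI_API_KEY is set.
# The model name, excerpt, and analytic stances are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

excerpt = "Student: But if we don't water it, how does the seed even know to grow?"

perspectives = {
    "sociocultural": (
        "Interpret this classroom excerpt through a sociocultural lens, attending "
        "to how meaning is negotiated between participants."
    ),
    "conceptual-change": (
        "Interpret this classroom excerpt through a conceptual-change lens, attending "
        "to the student's existing ideas about plant growth."
    ),
}

for name, stance in perspectives.items():
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[
            {"role": "system", "content": stance},
            {"role": "user", "content": f"Excerpt:\n{excerpt}\n\nOffer a brief interpretation."},
        ],
    )
    print(f"--- {name} ---")
    print(response.choices[0].message.content)
```

The two outputs can then be compared, or fed back to the model, as a way of putting perspectives in conversation with each other.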
Attention Mechanisms
LLMs continually reinterpret the elements of the model context (i.e. adjust the representation of tokens) in consideration of all other preceding tokens through attention processes as new information is added to the model context. In this process, all parts are recontextualized (all tokens are re-represented) based on the changes the new information brings.
This enables the model to focus on specific details within the data while maintaining attention to the broader context (commitment ii). Additionally, the model can process the data iteratively, continuously refining its representations as it integrates new information and researchers’ deepening reflections (e.g. entered as memos or used to frame prompts), allowing for the layering of meaning over time (commitment i). This is useful for surfacing subtle patterns, recurring themes, and contrasting details that can support nuanced interpretations.
Rich Embeddings
LLMs iteratively use attention processes and the model’s learned knowledge from training to represent words, sentences and documents in the embedding space in a way that captures semantic relationships as distances and directions between tokens in that space.
This provides a foundation for the analysis of themes in the data, represented as both large- and small-scale patterns in the information’s representation, which can be explored in multiple iterations (commitment i) and from multiple perspectives (commitment iv). In addition, when working in partnership with data scientists these embeddings can also be examined directly to probe the organization of the underlying data representation.
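For researchers collaborating with data scientists, the sketch below shows one way embeddings can be examined directly: computing pairwise cosine similarities between excerpts to probe which pieces of talk the representation places close together. It assumes the open-source sentence-transformers package; the model choice and excerpts are illustrative and not drawn from any particular study.

```python
# A minimal sketch, assuming the `sentence-transformers` package is installed.
# Model choice and excerpts are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

excerpts = [
    "The roots pull water up so the leaves can make food.",
    "Plants drink through their roots and use sunlight in the leaves.",
    "We measured how tall the bean plant grew each day.",
]

# Encode each excerpt and compute pairwise cosine similarities.
embeddings = model.encode(excerpts, convert_to_tensor=True)
similarities = util.cos_sim(embeddings, embeddings)

for i in range(len(excerpts)):
    for j in range(i + 1, len(excerpts)):
        score = float(similarities[i][j])
        print(f"{score:.2f}  |  {excerpts[i][:40]}...  <->  {excerpts[j][:40]}...")
```

Higher scores flag excerpts the representation treats as semantically close, which a researcher can then examine as candidate members of a theme.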
Auto-Regressive Nature
When composing a response to user input, LLMs generate tokens one at a time using a set of probabilities based on all prior tokens (i.e., the model context built up from all researcher inputs and the model’s previous responses). Model “temperature” is a parameter that controls the variability of the model’s output, adjusting between more deterministic (most probable tokens favored) and more creative (a wider range of token probabilities is sampled from) responses.8
LLMs can produce varied responses to the same input, recombining and reframing all the information in their model context to offer multiple alternative interpretations. This allows for iterative and dynamic engagement with the data, enabling varied interpretations to emerge and be put in conversation with each other (commitments i and v). Adjusting the temperature, in particular, adds flexibility to analysis; while lower settings reinforce consistency, higher settings can help surface alternative interpretations, akin to how different researchers notice distinct aspects of the data (commitment iv).
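As a small illustration of how temperature can be varied in practice, the sketch below re-runs the same interpretive prompt at a lower and a higher temperature setting. It assumes the OpenAI Python SDK; the model name, excerpt, and settings are illustrative, and the same idea applies to temperature controls exposed by other tools.

```python
# A minimal sketch using the OpenAI Python SDK; assumes OPENAI_API_KEY is set.
# Model name, prompt, and temperature values are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

prompt = (
    "Here is a short classroom excerpt:\n"
    "'Teacher: What do you think the seed needs first?'\n"
    "'Student: Maybe it just needs to be left alone in the dark.'\n\n"
    "Suggest one possible interpretation of the student's thinking."
)

for temperature in (0.2, 1.0):
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    print(f"--- temperature={temperature} ---")
    print(response.choices[0].message.content)
```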
Long-Context Capabilities
LLMs now support the analysis of multiple long documents and provide the capability for coding them interactively (in-context learning), allowing for rapid iteration of nuanced analyses. For example, at the time of writing, common model context sizes range from 128,000 tokens (OpenAI’s GPT-5 via ChatGPT) to 1 million (Google’s Gemini 3), providing the ability to hold roughly 300 to 2,500 pages of text in memory at once. Multimodal models, such as Gemini, that accept video as input can currently process about an hour of video at a time, with expectations that this capacity will be expanded.
It is possible for a LLM to hold the entire textual corpus of data collected in a study (e.g. classroom and interview transcripts, researcher memos, etc.) in its model context at once. This supports identification of potential patterns, often subtle and complex, across a large and diverse collection of data that might be challenging for humans to notice manually. It also allows researchers to probe and unpack patterns noticed in one part of the dataset while keeping the larger context of the full data set in mind (commitment ii). In addition, the flexibility to iteratively explore data through both small adjustments and substantial shifts in perspective supports repeated engagement with the entire dataset across multiple passes, helping researchers evolve, refine, and deepen their interpretations over time (commitment i).
Prompting LLMs to Meet the Commitments of Qualitative Analysis
The properties described above are leveraged when researchers engage with the models, which occurs through prompting: crafting the text submitted to the model so as to shape its behavior and responses.
Other prompting techniques are useful for encouraging close attention to the details of the data (i). For example,
While prompting can refer to any text sent to an LLM, a special kind of prompt is the system instruction: a directive given at the start of interacting with a model that sets its overall behavior, tone, or role throughout the interaction. In qualitative analysis, system instructions can guide the model to maintain a consistent analytic posture throughout the process. For example, a system instruction might direct the LLM to focus closely on linking claims to supporting evidence (e.g. “You are a rigorous qualitative researcher. When suggesting interpretations, you always ground them in direct excerpts and clearly explain how each piece of data supports the claim”). System instructions should not be used for elements of the analysis that need to remain flexible or evolve throughout the process.
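The sketch below illustrates how such a system instruction can be held constant across an ongoing analytic conversation when working through an API rather than a chat interface; the model name, excerpts, and wording are illustrative assumptions, not a prescribed setup.

```python
# A minimal sketch using the OpenAI Python SDK; assumes OPENAI_API_KEY is set.
# The system instruction, model name, and researcher turns are illustrative.
from openai import OpenAI

client = OpenAI()

# The system instruction establishes the analytic posture for the whole session.
messages = [
    {
        "role": "system",
        "content": (
            "You are a rigorous qualitative researcher. When suggesting "
            "interpretations, always ground them in direct excerpts and clearly "
            "explain how each piece of data supports the claim."
        ),
    }
]


def ask(user_text: str) -> str:
    """Append a researcher turn, get the model's reply, and keep both in the context."""
    messages.append({"role": "user", "content": user_text})
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=messages,
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply


print(ask("Here is a transcript excerpt: ... What forms of science talk do you notice?"))
print(ask("Which of those interpretations has the weakest evidentiary support, and why?"))
```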
Mobilizing LLM Properties for Robust Qualitative Analysis
There are many ways that qualitative researchers can thoughtfully make use of LLMs as part of an AI-in-the-loop analytic process, while also staying true to core commitments of qualitative analysis. We frame such activities as
Trustworthiness
Although the field of qualitative analysis encompasses a range of perspectives on indicators of analytic quality, we focus here on the criteria that are most consistent with the relativist paradigm (Guba & Lincoln, 1989; Lincoln & Guba, 1985): credibility, dependability, confirmability, transferability, and authenticity. Briefly,
Using AI to Support Trustworthy Qualitative Analysis
As a means of illustrating an LLM-supported qualitative analysis workflow, we offer a hypothetical example of studying science talk in an elementary school classroom. The primary data are transcripts of whole-class conversations from the class’s science block, which took place over two months and focused on the life cycle of plants. Data from whole-class discussions were supplemented with reflective interviews with the teacher and each of the children, as well as with researcher observational memos. The overarching research question addressed is:
The LLM-supported analysis workflow begins by establishing the model context with all relevant information available for analysis. Depending on the AI tool used, this might involve initiating a persistent chat session that can be returned to later, or creating a project space to store chats, files, and system instructions together. For research data, it is also essential to choose a tool that meets the security and privacy requirements of the study. Relevant information includes the body of data collected through a study, as described above, and potentially also additional documents for contextualization. For example, researchers’ personal history with the data might be represented in positionality statements and an overview of the data collection schedule, while information about school and classroom histories and the context in which the data were collected could include demographics, documentation of the teacher–researcher collaboration, and details about the curriculum in use (e.g., Plants in Action).
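The sketch below illustrates one way the model context might be assembled when working programmatically: concatenating the study corpus and contextual documents into a single initial prompt. The file names are hypothetical; in many tools the equivalent step is simply uploading these documents to a persistent chat or project space.

```python
# A minimal sketch of assembling a study's corpus and contextual documents.
# File names are hypothetical and stand in for the materials described above.
from pathlib import Path

context_files = [
    "positionality_statements.md",
    "data_collection_schedule.md",
    "classroom_context_and_demographics.md",
    "curriculum_overview_plants_in_action.md",
    "whole_class_discussion_transcripts.txt",
    "teacher_and_student_interviews.txt",
    "researcher_observational_memos.txt",
]

sections = []
for name in context_files:
    text = Path(name).read_text(encoding="utf-8")
    sections.append(f"===== {name} =====\n{text}")

# A single opening message that establishes the model context for later analytic prompts.
initial_prompt = (
    "The following documents are the full dataset and surrounding context for a study "
    "of science talk in an elementary classroom. Read them carefully; subsequent "
    "messages will pose analytic questions about them.\n\n" + "\n\n".join(sections)
)
print(f"Assembled context: {len(initial_prompt)} characters across {len(context_files)} documents")
```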
Capitalizing on the model’s
Once model context is established,
At this point, the LLM-supported workflow has surfaced a set of potentially interesting excerpts or patterns from a large dataset for further investigation, initial ‘needles in the haystack’ chosen in relation to the broader data, suggesting the potential beginnings of meaningful themes. Such patterns can be explored in multiple ways, setting the stage for
Establishing grounds for confirmability requires thinking carefully about how the personas and tasks brought to bear might reveal or hide aspects of the data, and explaining those decisions and their iterative output fully in the findings. For example, the methods section might include details of the iterative prompting that was used and the different kinds of excerpts that were identified based on those prompts. These rounds of analysis also offer an additional mechanism for probing confirmability beyond what is typically feasible in qualitative research (e.g. triangulation across researchers, member checking with participants). Here, the model can be prompted to review the full dataset to surface both confirming and disconfirming instances for emergent conjectures, thereby allowing the researcher to more robustly explore and stress-test findings.
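As one illustration of such a confirm/disconfirm pass, the sketch below sends a previously assembled dataset together with an audit prompt asking for both supporting and complicating excerpts for a stated conjecture. It assumes the OpenAI Python SDK; the conjecture, file name, and model are illustrative assumptions only.

```python
# A minimal sketch of prompting for confirming and disconfirming instances.
# Assumes OPENAI_API_KEY is set; file name, conjecture, and model are illustrative.
from pathlib import Path
from openai import OpenAI

client = OpenAI()

# Hypothetical file containing the assembled study corpus and contextual documents.
dataset_text = Path("assembled_study_context.txt").read_text(encoding="utf-8")

conjecture = (
    "Students' science talk shifts from describing what plants look like to "
    "reasoning about what plants need as the unit progresses."
)

audit_prompt = (
    "Working only from the dataset provided above, identify excerpts that CONFIRM "
    "the following conjecture and excerpts that DISCONFIRM or complicate it.\n\n"
    f"Conjecture: {conjecture}\n\n"
    "Quote each excerpt verbatim, note where in the dataset it occurs, and briefly "
    "explain why it supports or challenges the conjecture."
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[
        {"role": "user", "content": dataset_text},
        {"role": "user", "content": audit_prompt},
    ],
)
print(response.choices[0].message.content)
```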
Such cycles of prompting further enable researchers to take seriously the importance of
Conclusion
While GenAI is often seen as a tool for automating or expediting analysis, here we have explored its deeper potential to engage meaningfully with the core commitments of qualitative inquiry and help address long-standing challenges in enacting them. However, while AI-in-the-loop analysis offers new possibilities, it also demands standards for rigor. We propose as a starting point that analyses which neither seriously engage with the foundations of qualitative research nor document their processes for doing so are generally not high-quality. Researchers can provide evidence for how their analysis meets criteria for trustworthiness, such as credibility, dependability, confirmability, transferability, and authenticity, through both established and emerging practices. For example, searching for confirming and disconfirming instances is a powerful means of supporting confirmability that can be carried out much more extensively with AI. Novel approaches for meeting existing trustworthiness criteria are also possible, such as an AI-supported temporal audit that examines how the qualities of a theme evolve over time to strengthen dependability. In addition, expanded criteria for trustworthiness may be needed, particularly
We close by drawing attention to persistent evidence that GenAI systems reflect the dominant cultural perspectives embedded in their training data and can reproduce problematic human biases even after explicit training not to (Bai et al., 2025; Hofman et al., 2024). A core strength of qualitative analysis is its intentional engagement with subjectivity. Extending this stance into work with GenAI offers a powerful means to surface and scrutinize assumptions that shape model outputs. Just as qualitative researchers interrogate their own positionality, so too must we interrogate, and work to reorient, the cultural logics embedded in AI models. When engaged critically, GenAI can also be used to support this reflexive work, for example by being prompted to adopt explicitly critical, bias-aware analytic stances and to make patterned forms of bias in analytic outputs visible. Researchers can also prompt GenAI to help them reflect on their own assumptions by posing questions relevant to the research site, data, and their personal histories, enabling deeper interrogation of positionality throughout the analytic process. By taking shared responsibility for how AI is used in analysis, qualitative researchers and AI developers can work together to advance “human + AI” analytic practices that support the development of trustworthy, context-aware insights from large-scale data.
Footnotes
Acknowledgments
Generative AI (ChatGPT, GPT-5) was used to refine wording and formatting at the sentence level or below; all ideas and arguments presented are the authors’ own.
Ethical Considerations
As this article is a conceptual contribution, ethical approval was not required.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
