Abstract
Keywords
Introduction
Artificial intelligence (AI) is changing industries and occupations and the way we think about a range of human activities, including qualitative research. Recent research has highlighted the capabilities of AI tools to enhance the efficiency and, arguably, the robustness of the research process, but also the threats to research integrity and the critical thinking ability of researchers (Anis & French, 2023; Christou, 2023a, 2023b). This paper offers an exploration of AI use across the constructivist-interpretivist paradigm, and its implications for human centred, text-based research.
Focusing specifically on the core stages of qualitative data analysis – from data coding to its end point of inductive/abductive theory development (Saldaña, 2021) – this paper reports on a qualitative experiment in inter-coder reliability (ICR) between human researchers and a human-AI collaboration. Raw data from an existing qualitative project is used to test the efficacy of AI as a short-cut to the laborious process of human-based qualitative data analysis. The topic of the original study is immaterial to the experiment itself, as it is the process of data analysis, rather than the data itself, that is under examination.
The novel contribution of this paper lies in its qualitative experimental assessment of the efficacy of AI programs as data analytical tools. It extends the established norms of inter-coder reliability used by teams of qualitative researchers to non-human platforms operated by researchers, and in doing so provides a pathway for assessing the confidence (or otherwise) that can be placed in generative AI tools within human-AI collaborations. ICR is claimed to be good practice in qualitative research, enabling systematicity, communicability, consistency and transparency within research teams (O’Connor & Joffe, 2020). When ICR is systematically applied to data sets coded separately by human researchers and by human-AI collaborations, as is the case with this study, the usefulness of engaging with AI tools in textual data analysis can be established.
This paper is not a ‘how-to’ paper designed to improve researchers’ skills in using AI to better accomplish specific research tasks, nor does it offer a comprehensive performance test of all available AI applications and capabilities. Rather, it reflects on the experience of qualitative researchers engaging with AI in human centred research and on the related epistemological questions this raises. Qualitative research is directed at comprehending human experience and meaning, and it does so through specific approaches, including the constructivist-interpretive paradigm, which seeks to understand social reality through interactions between the researcher and the researched (Crotty, 1998; Neuman, 2014). As such, this paper poses the following research question:
This paper is structured as follows: A review of recent and relevant publications on AI and qualitative data analysis provides background, exploring key debates and research gaps in this emerging field. This is followed by an overview of the role of human experience within qualitative research epistemologies and in inductive/abductive data analysis, which leads to its natural endpoint of theory development. The methodology outlines the key steps undertaken, highlighting the ways that AI was ‘coaxed’ by human researchers prompting the AI platform to perform deeper and more comprehensive analyses. The subsequent section describes findings from the research process to illustrate the researchers’ experience of using AI tools. The paper ends with a discussion of the broader implications of AI use in qualitative, human centred research and the constructivist-interpretive approach, and concludes by proposing a three-step framework to establish rigour in human-AI qualitative research.
Background: AI and the Human Experience
AI has entered the scholarly world with gusto. Even if we so desired, it would not be possible to put the genie back in the bottle. Students have embraced AI platforms powered by machine learning models such as Grammarly and ChatGPT to assist with assignment writing, and a number of AI platforms are available to search literature and generate literature reviews by analysing scholarly papers and identifying key themes. Despite concerns about ethics, integrity and authorship, AI platforms are proliferating, altering the direction of education and research (Hosseini et al., 2023).
Within the burgeoning literature on AI and how it transforms traditional human activities and occupations, research on the implications of AI for qualitative analysis and inductive/abductive theory development has also begun to surface (Christou, 2023b; Hosseini et al., 2023; Tabone & De Winter, 2023; Van Manen, 2023). In terms of its application in scholarly research, AI is still considered a novelty, and numerous papers have explored the benefits and shortcomings of its use. An important expected benefit is efficiency. Qualitative research is considered highly labour intensive and repetitive, and AI is seen as expediting specific tasks such as transcription, translation, coding, and the organising of data, which can reduce the effort and resources required in research (Christou, 2023a, 2023b; Davidson, 2024; Gohil, 2023; Jiang et al., 2021; Pattyn, 2024; Rietz & Maedche, 2021). AI’s capability to process data faster and more efficiently also enables researchers to analyse larger data sets, including publicly available online and free-text data as well as non-textual data such as images, video, or visualisations (Anis & French, 2023; Ciechanowski et al., 2020; Player et al., 2024).
Arguably, AI’s ability to access and process large sets of data may enhance the generalisability, objectivity and robustness of qualitative research, as some have proposed (Ciechanowski et al., 2020; Davidson, 2024; Tschisgale et al., 2023). Human-based data coding is often criticised as unreliable, ad-hoc and subjective, whereas proponents of AI enabled tools welcome the prospect of a more consistent and reliable coding process, particularly by reducing human errors and so-called researcher bias (Jiang et al., 2021). Furthermore, automated coding could represent an opportunity to increase trustworthiness by applying multiple data analysis methods (Gohil, 2023) and traceability by creating an audit trail that can facilitate verification and reproducibility of qualitative analytical outcomes (Davidson, 2024; Tschisgale et al., 2023). However, as we argue later, these claimed benefits are at odds with the constructivist epistemological and interpretivist theoretical basis of much qualitative human-centred research, which embraces researcher subjectivities and bias through well-established reflexive models of research (for example, see Crotty, 1998). Rather, claims of enhanced robustness and generalisability layer a more positivistic epistemological approach, which presumes ‘verifiable, objective truths’, over a process concerned with socially constructed ‘realities’ that is intrinsically interpretive.
Despite the chat box interface and dialogue style of many AI tools, which promote personification and anthropomorphisation (Anis & French, 2023; Davidson, 2024; Van Manen, 2023), AI is not analogous to humans with dispositional character traits, personal concerns and a unity of experience and purpose that bestow a unique and relatively stable identity (McAdams, 1995). Rather, AI programs could be described as exceptionally large aggregations of data offering a vast array of predefined statistical and algorithmic patterns that can be activated within a wider human-machine interaction (Christou, 2023a, 2023b). Accordingly, AI cannot be considered an author or co-author for the purpose of scientific publications or research ethics (Van Manen, 2023).
Even so, research in this field has emphasised that AI models are not impersonal, objective and unprejudiced, as their underlying training data are aggregated in non-random ways that are likely to reflect ideas and structures prevalent in society. This causes AI models to exhibit structural bias relating to race, colour and gender (Anis & French, 2023; Christou, 2023a). The issue of inherent bias is further aggravated by a general lack of transparency around the sources used to train AI models, their algorithms, and the way they produce outputs, which researchers refer to as “black box effects” (Jiang et al., 2021; Rai, 2020; Tschisgale et al., 2023; Van Eschenbach, 2021). Indeed, it is considered impossible to fully inspect the information construction process and conceive of an unbiased state or raw default assumptions of an AI model (Bommasani et al., 2023; Van Manen, 2023). Due to the dynamic and continuous aggregation of training data underlying the model and the opaque instructions of different algorithms, the output of AI models is unpredictable, and it is not foreseeable when and how a model will produce confabulations or hallucinations not supported by ‘real’ data (Barrasi, 2024; Davidson, 2024; Player et al., 2024).
While AI responses may appear as the result of human-like cognition, their generation is based entirely on the predictive probability of language use rather than on meaningfulness (Gigerenzer, 2023; Van Manen, 2023). Yet meaning reflects the very basis of qualitative research. For example, researchers have raised doubts about the ability of AI models to understand words in their contextual and tacit meaning, which may include cultural nuances, indexical meanings and contextual speech acts (Anis & French, 2023; Christou, 2023b). This raises the epistemological question of how human meaning can be represented through probabilistic and predictive language generation rather than interpretation based on actual embodied and lived experience (Frank et al., 2024; Sandelowski, 2002), and of how machine interpretation impacts the authenticity, trustworthiness and reliability of qualitative theory development.
Recent research on AI and qualitative analysis has concluded that the utility of AI is best realised within human-AI collaborations that complement rather than substitute human interpretive capability (Hamilton et al., 2023; Lieder & Schäffer, 2024; Morgan, 2023; Pattyn, 2024). This collaborative view of AI tools, however, requires further investigation of the epistemological justification for AI use, that is, of how AI use meets and aligns with specific research paradigms and objectives (Christou, 2023a, 2023b; van Dis et al., 2023). Addressing this important research direction, this paper explores qualitative ‘interpretation’ not as a generic human ability but as a set of specific research tasks within a coherent epistemology, completed in parallel by human and non-human researchers.
The next section offers a brief overview of qualitative research paradigms and the role of human experience in qualitative research.
Constructivist-Interpretive Epistemology and the Human Experience in Data Analysis for Theory Development
Drawing on extant literature, this section presents key characteristics of qualitative research in relation to data analysis and the generation of theory from that data (see Bingham, 2023). While quantitative research, and the objectivist epistemology within which it is grounded, presents a homogenous and uniform reality of natural and social phenomena based on quantifiable relations and variation, qualitative research aims to elicit in-depth understanding of socially constructed reality/realities and the human lived experience. Due to the diversity of philosophical stances and epistemologies, discussions of qualitative research must account for specific epistemological paradigms that differ in their degree of emphasis on human lived experience and meaning (Creswell, 2007; Crotty, 1998; Layder, 2014; Mackenzie & Knipe, 2006; Sandelowski, 2002).
Qualitative social research aims to understand human experience within the multiple realities and social contexts of an everyday lived world or “life world” (Brinkmann & Kvale, 2019, p. 2), which are experienced through authentic human interaction (Messner, 2021, p. 47; also see Creswell & Poth, 2018). Rather than controlling for the complex relationships that give rise to a diverse social reality, qualitative inquiry seeks to comprehend the irreducible presence, authenticity and diversity of human experiences and engage with them exhaustively (Crotty, 1998; Layder, 2018). This has been referred to as the ‘constructivist-interpretive’ research paradigm (Crotty, 1998). Epistemologically, constructivist research does not conceive of data as existing out there to be discovered as a singular objective reality (Sandelowski, 2002); rather, it proposes a process of inquiry that is fundamentally dependent on the presence and involvement of researchers, who “construct meanings as they engage with the world they are interpreting” (Crotty, 1998, p. 43). The objective of the constructivist paradigm is to elicit a maximum of diverse perspectives and observations, and to ensure they are analytically significant as empirical and conceptual indicators of the phenomenon under research (O’Connor & Joffe, 2020).
The empirical data generated through interaction fundamentally depends on prior human experience of the researcher and the researched and on the nature of the interaction itself. It is therefore a critical aspect of qualitative research to reflect on the positionality and relation of the observer and the status of the data, especially for research aimed at precise and rich experiential content (Borrego et al., 2009, p. 56). This concept has been termed ‘researcher reflexivity’ (Crotty, 1998).
Methodologically, the constructivist paradigm involves interactive research procedures, including theoretical sampling, data collection adjustment in line with emerging insights, iterative and abductive analysis in line with emerging theory, and researcher reflexivity (Layder, 2018; Messner, 2021). The steps of the research process are iteratively and purposefully continued until empirical data redundancy occurs, which means a maximum of observations has been recorded and theoretical propositions have clearly emerged (Figure 1).

Figure 1. Constructivist-Interpretive Inquiry is Dependent on Researcher Involvement. Source: Authors, based on Crotty (1998); Layder (1998, 2018)
Constructivist procedures of the purposive collection of empirical data, as outlined earlier, are designed to establish clear linkages between data and theory. Human directed interpretive analysis further reinforces these linkages, as it includes not only the integration and conceptualisation of codes that are descriptive and manifest in the data, but also codes that are implicitly or latently present and relate to meaning, claims or presumptions (Adu, 2019, p. 28). Analytical coding is thus not equivalent to simply reducing, tagging or excerpting data but represents an interpretive act (Saldaña, 2021).
Qualitative theorising can be described as sensemaking that includes identifying and explaining phenomena and making judgements and more general assertions about social reality (Byron & Thatcher, 2016; Langley, 1999). There is a strong emphasis in the literature that inductive theory development must maintain clear linkages to empirical data. For instance, Sandelowski (2002, p. 111) calls for “full bodied qualitative work” that generates theory grounded in a body of relevant empirical data, while others see clear empirical indicators in the data as “support for interpretation of explicit and implicit meaning” (Adu, 2019, p. 27) and an “evidentiary warrant” to support and illustrate theoretical propositions (Saldaña, 2021, p. 20). Thus, a key aspect of qualitative theorising is the conceptual engagement with the empirical data. As described earlier, the constructivist-interpretive research paradigm conceives of theory development as the result of a process of conceptual engagement at all stages of the inquiry, which Layder (1998) has referred to as ‘data analysis with theory in mind’ (also see Figure 1). Constructivist approaches purposefully generate empirical data to offer a more general account of social reality based on the interplay of empirical data and emerging theory.
Methodology
This research extends scholarship on the use of AI technology in qualitative data analysis and theory development and the broader implications for epistemological robustness. The guiding
During the data analysis phases, two qualitative researchers collaborated with a computer scientist to explore the efficacy of AI assisted coding and analysis. Following Bingham (2023), the methodological design is systematic, organised and iterative and detailed below to meet the multiple criteria of credibility, dependability, consistency, and confirmability. The key steps of this process are as follows: (1) (2) (3) (4) (5) (6) (7)
This overall approach to human-AI inter-coder reliability aims toward broad alignment regarding how the data is classified and synthesised toward conceptual or theoretical categories whilst allowing for interpretive flexibility in accordance with the epistemological norms of social constructivism (O’Connor & Joffe, 2020). Using ICR to examine the congruence between human and human-AI data analytical output offers a pragmatic approach to assessing what human-AI collaboration really offers to researchers. In practical terms, it involves a step-by-step process of comparing each iteration of human-AI analysis with the previously generated human output, followed by further coding, prompting and ‘coaxing’ to establish the degree of agreement between the two sets of output. This process does not involve quantitative measurements of reliability as is the case with objectivist-based research (Cofie et al., 2022), rather it is an exercise in ‘researcher interpretation’, reflecting on the status of the data and the researchers’ confidence in the process of construction and interpretation of experiential content (Borrego et al., 2009; Crotty, 1998).
Findings
The findings describe three points of comparison (see Methodology point 7) that may impact researcher confidence in the status of data, analysis, and the credibility of conceptualisation given the constructivist epistemology that underpins the original research. The findings must be considered from a viewpoint of researcher reflection on trustworthiness and credibility of the analytical process and emerging theory (Crotty, 1998).
Uncertainty of Empirical Data Boundaries – What Data are Being Analysed?
AI supported analysis revealed uncertainties about which data were used to generate and support analytical responses. For example, after uploading a data file on ‘decision making’, a general query to provide key ideas from the file resulted in a list of bullet points that appeared generic. The query was repeated with an added prompt to provide key ideas and references specifically from the uploaded file only. In this particular case, the AI response even explicitly confirmed that it had integrated insights from the uploaded document, while the outputs were in fact practically identical across the specific and general queries. This suggests that AI responses may be drawn from a wider pre-existing training data pool and only partially from the data set uploaded by the researchers. The further prompt, “Are you analysing the seven documents or also other data”, was answered by “My analysis is based solely on the content of the seven documents you provided”. At the same time, the AI program referred to a code for ‘blockchain’ despite its absence from the research data. After prompting the model to provide a specific reference in the data file for ‘blockchain’, it replied, “It appears that the documents provided do not contain specific quotes about blockchain”. In computer science disciplines, such outputs have been referred to as ‘hallucinations’ or ‘confabulations’ (Barrasi, 2024; Davidson, 2024).
Another uncertainty arose from the fact that AI generated responses did not appear to cover all empirical data files exhaustively. In one case, when prompted to provide specific quotes to illustrate the codes of “business resilience” and “productivity”, the eight quotes given in response were drawn from only two of the 19 uploaded data files. A follow-on prompt to “identify new and different quotes to illustrate a code from all data files” still reproduced the same original responses. Combining all data into a single document rather than analysing multiple documents did not change the results, as the output appeared limited to a constant size irrespective of the number of data files. This revealed potential limitations in the AI model’s ability to process and analyse multiple individual data files comprehensively, even in paid versions of the model.
The human generated data set in NVivo clearly indicated the coverage of codes, i.e. the number of files each code was drawn from and the number of references linked to its occurrence, which makes it easy to assess the exhaustiveness of the analysis. In one instance, exploring data files, the AI program generated 10 codes, including 4 repeats, versus 47 researcher defined codes from manual human coding. In this case, further human coding was needed to bring the AI generated output into agreement with the human coding frame.
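The coverage bookkeeping described above can be expressed in a few lines of code. The following Python sketch is purely illustrative: the codes, file names and counts are hypothetical, and no such script formed part of the study (the coverage information came from NVivo). It simply tallies, for each code, how many distinct files and how many references it is drawn from, which is the information the researchers used to judge exhaustiveness:

```python
from collections import defaultdict

# Hypothetical code-to-file assignments; each tuple records one coded
# reference. These values are invented for illustration only.
codings = [
    ("business resilience", "file_01.txt"),
    ("business resilience", "file_07.txt"),
    ("productivity", "file_01.txt"),
    ("productivity", "file_02.txt"),
    ("productivity", "file_07.txt"),
]

# Tally distinct source files and total references per code, mirroring
# the 'files' and 'references' coverage counts reported by NVivo.
coverage = defaultdict(set)
references = defaultdict(int)
for code, source_file in codings:
    coverage[code].add(source_file)
    references[code] += 1

for code in sorted(coverage):
    print(f"{code}: {len(coverage[code])} file(s), {references[code]} reference(s)")
```

Coverage counts of this kind make non-exhaustive output visible at a glance: a code drawn from only two of 19 files, as in the case reported above, immediately signals that the analysis has not engaged with the full data set.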
To summarise, the researchers using a human-AI approach were not able to guarantee that the findings presented by the model gave an exhaustive account of all empirical research data files, nor that responses were definitely grounded in the relevant empirical data.
Uncertainty of Codes – Relevance and Completeness of Concept Indicators
Qualitative analysis comprehends human lived experience through analytical codes, which represent an empirical data frame illustrative of the phenomena under research. The closeness of connection between researcher and data and its authenticity are diminished when empirical data files are analysed and interpreted without prior researcher involvement in the data collection process, as is the case with AI models that do not participate in the construction of data (Christou, 2023b, p. 2745). However, the quality of connection between data and concepts can still be assessed through the completeness and apparent relevance, or analytical significance, of codes.
The research findings indicate significant uncertainties related to the relevance, completeness and clarity of the concept links provided by the AI platforms used. For example, AI based coding produced either very specific or very general codes. This type of technique has been described in the qualitative methodology literature as “splitting” and “lumping”: a process of analysis conducted by researchers to organise data into codes, themes and categories by experimenting with different configurations and levels of data abstraction (Saldaña, 2021). The experimental AI analysis generated broad themes and some underlying codes from data files, such as ‘productivity’ and ‘business resilience’. Initially, these themes lacked clear and specific empirical indicators. Researchers devised a series of prompts, such as “can you provide more detail” and “can you give a quote as an example”, to break the themes into codes and sub-codes and ultimately to draw out the concrete experiential observational data.
This approach to coding is the reverse of the common inductive coding process, which begins with a concrete observation or research participant’s expression and, step by step, iteratively raises the level of abstraction to support broader themes. Rather than coding, the research team termed this process “coaxing”: a form of creative prompting aimed at drawing out and re-assembling the underlying lived experience. This process of inverted inductive analysis supports similar observations in existing research (e.g. Morgan, 2023, p. 3) and raises questions about the completeness and exhaustiveness of such retro-fitted codes, which would be difficult to determine in the absence of prior knowledge of the data.
Compared to the coding frame completed by human researchers, the codes selected and provided by the AI model were not exhaustive, nor did they represent the most relevant or illustrative codes to exemplify emerging concepts. Comparison with human generated codes within this research showed that AI generated codes were less illustrative, more random and occasionally simply irrelevant or false, requiring further human intervention and prompting. For example, when prompted for more concrete meanings and observations of the In Vivo code “headspace”, the model linked coding categories to the use of certain verbal triggers such as ‘mental’, ‘mind’ or ‘mindset’, leaving aside other conceptually, but not linguistically, linked relevant codes, such as ‘ability to control’ or ‘virtual communities’. While such issues can be remedied through further human input, coaxing, and training, they are important when qualitative researchers come to reflect on the credibility of their findings from human-AI collaboration.
Such doubts about the significance and relevance of conceptual indicator links pose significant challenges for the credibility and trustworthiness of inductive/abductive theory development from empirical data. From the perspective of inter-coder reliability between human and machine, this demonstrates that AI coding is not currently a quick fix for the laborious task of human coding and still requires a substantial amount of human input, including in verifying the completeness of experiential observations.
Theory Development – Dis-Experienced and Ground-Less Concepts?
Given the recognised limitations of theory development in AI assisted data analysis, the research team set out to ‘train’ an AI program by uploading two academic papers that outlined the theoretical concept of ‘relationality’. Following prompts such as “How do you apply theory to empirical data” and “summarise the two documents and provide key ideas of the relational perspective”, the AI program generated responses clearly outlining the meaning and steps of ‘applying a theory in research’. Furthermore, it identified a reasonable range of key elements and terminology drawn from the two training papers that outlined the theoretical perspective of relationality, including “relational dynamics” and “agency and interdependence”, recognising the importance of “dynamic interactions and the collective nature of processes that shape farming practices”.
Despite the outward impression that the AI model achieved an ‘understanding’ of relational theory, applying the theory to specific data sets raised questions regarding empirical grounding and conceptualisation. In terms of the links of output to specific empirical data, familiar issues identified in the previous sections surfaced again. For example, prompting the model to apply relational theory to three different data files representing codes (‘Digital Tools and Data’, ‘Evidence and Assessment’, ‘Headspace’) yielded almost exactly the same output each time, offering three identical conclusions that began with “Applying the relational perspective to document 4 (document 5 / document 6)…”. While the AI model confirmed that it did indeed apply relational theory to three specific data files, its conceptual output did not reflect the differences in the content analysed.
Another issue was conceptual output consisting of generic insights that were not present in the research data. The replies appeared sensible and coherent based on a general knowledge and understanding of “relational theory”, but not based on conceptualisation of the specific data in the uploaded files. The prompt, “apply relational perspective on how digital technology improves or reduces farm resilience”, resulted in useful categories such as “community and peer support”, illustrated by an empirical indicator: “digital divides or lack of access to technology can isolate some farmers”. This general statement would need to be conceptualised based on concrete research data, for example by discussing how specific observational codes support the theme of ‘isolation’ of farmers, or what isolation as an experience meant to the farmers who spoke about it. Achieving a conceptualisation of specific empirical data would require a significant level of added human input, including to verify that the examples used were actually present in the research data. In this case, ‘digital divide’ or similar concepts were never even mentioned in the data.
In terms of conceptualisation, applying relational theory to the data set would require identifying, supporting and illustrating relational themes inherent in the data, rather than just listing generic points. However, the responses resembled a summary of key points rather than the result of drawing on and synthesising the empirical research data and theory to progressively lift concepts to more abstract levels of understanding. For example, three separate themes produced by the AI model were listed as “Headspace and Mental Resilience”, “Impact of Drought on Mental Health”, and “Importance of Mental Preparation”. These themes are related, and human analysis would typically synthesise codes of such similarity into a single higher-level theme, in this case “supporting mind space and mental wellbeing” (Richards et al., 2025, p. 60), which encompasses all aspects of mental health.
The more comprehensive conceptual tasks specified in the literature on theory building, such as envisioning new concepts, explicating existing concepts, relating patterns and critically debating the evidence (McInnis, 2011, p. 40), were not visible in the AI generated responses. For example, in response to the prompt: “apply a relational perspective on how digital technology improves or reduces farm resilience” the AI model provided the theme “Enhanced Monitoring and Response”. This theme was supported by an empirical indicator of the relation between producer and technology, given as “Digital tools such as sensors, drones, and satellite imagery provide real-time data on soil moisture, crop health, and weather conditions. This enables the farmer to respond promptly”. This is a very generic empirical indicator on the use of technology on-farm, and it was also not directly drawn from specific data in the documents provided. Human research in contrast offered a conceptualised relationship between producer and technology as “augmenting practice and subtle incorporation in decision-making”. This theme was supported by a producer’s insight that “Our system weighs our cattle in the paddock all the time and we can see the condition of the country changing through the cattle before you can actually see it by the eye” (Richards et al., 2025, p. 59). This insight that producers use data generated by technology designed to monitor cattle to connect it to and make inferences on the health of very large and expansive pastoral land is an important conceptual indicator of relationality. It is also not a generic indicator but specifically derived from a producer’s lived experience.
The analysis performed by AI would accordingly, from a reviewer’s perspective, be considered as lacking an original conceptual contribution and as requiring further human intervention. Building theory has been identified in the literature as an area where AI lacks the ability to replace human cognitive inputs (e.g. Christou, 2023b, p. 2746), and this research has not found any evidence to the contrary. Nevertheless, leaving aside human-like theorisation, the research did find that AI picked up key themes and quotes and offered summaries that could be useful for preliminary coding or for reviewing theoretical frameworks prior to data analysis. The detailed investigation of specific conceptual tasks and operations of theory development, as presented in McInnis (2011) or Colquitt and Zapata-Phelan (2007), and the ability of AI models to learn those tasks could be a fruitful direction for further research.
Discussion: Humans Working With Dis-Experienced AI in Qualitative Analysis
Earlier, we asked:
At present, AI may operate as a research assistant under close supervision, particularly in assisting with literature reviews, transcribing voice to text (although with many inaccuracies) and supporting spelling and grammar. The technological evolution, visible even within the short duration of our research, indicates that the capabilities of AI based research tools are improving rapidly and will almost certainly become inseparable from everyday research workflows. Indeed, in relation to the qualitative research process (see Figure 1) there are AI tools available to support human researchers at every single stage, in terms of general discovery, literature review, designing interview questions, coding, and writing. Some universities endorse specific tools such as Semantic Scholar, Grammarly or Copilot, and beyond that, there is a vast and fast-growing selection of AI instruments available to researchers (e.g. Ithaka S+R, 2025), which researchers must evaluate in terms of capability, usefulness, economy and ethics. This means that current technological limitations may soon be removed, especially in line with growing computational power, wider adoption, and a greater researcher skill base, all of which will drive higher performance of research focused AI.
The core argument of our paper, however, concerns limitations related to epistemology that are not automatically dispelled by technological development. The findings reveal uncertainty as to whether AI sufficiently explores and references the empirical data supplied to it, rather than defaulting to its training data.
Another critical aspect in relation to the interpretation of empirical data is the explicit linkage between empirical observations and the lived human experience inherent in the data. To highlight the central importance of the human experience in constructivist-interpretivist data analysis, we propose the recognition of “Human Experiential Content” (HEC) when considering the support of AI tools within different disciplines of qualitative research. HEC cannot be decoupled from the whole analytical process and serves as an indicator of in-depth engagement with human experience through researcher involvement in the data collection process and through the human experience inherent in empirical data and their interpretation. To elaborate further, qualitative methodologies of high HEC are suited to comprehending lived experience in its natural context, with a focus on authenticity and data depth rather than quantity and data breadth. As complete experiential authenticity is arguably not achievable through research (Sandelowski, 2002, p. 106), the concept of HEC suggests that qualitative epistemologies and methodologies contain variable levels of human experiential content in terms of data depth, focus on exhaustive and most relevant description of lived experience, and researcher involvement in data collection and interpretation. As such, publicly available, generic data or quantitative data still contain lived human experience at least as a residual property; however, their distance from empirical data and its inherent human experience can undermine the authenticity and experiential depth of the analytical process (Fossey et al., 2002, p. 729). At the same time, a low level of HEC could make the data more suited to AI based dis-experienced analysis.
This paper argues that the uncertainties related to direct engagement with empirical data equate to an uncertainty of engagement with lived human experience. As such, it will become critical for social researchers to consider and reflect on their research methodology and objectives and to provide justifications of the epistemological appropriateness of AI use for qualitative data analysis and theory building. Figure 2 provides a conceptual frame that indicates the diminishing utility of AI based tools for highly interactive and experiential research.

Figure 2. Epistemological Differences Based on Human Experiential Content (HEC). Source: Created by the authors.
AI training data further differs from publicly available generic data not just due to its lower experiential content and the greater distance between researcher and data, but because it represents an entirely separate ‘computational ontology’ (Frank et al., 2024, p. 175) that is statistically derived from authentic and embodied human experience. As such, AI tools support general propositions about social reality based on a statistical abstraction of human experience rather than on direct engagement with it.
Doubts exist within this research and the literature more broadly as to whether AI based tools can reliably comprehend human experience (Anis & French, 2023; Christou, 2023b). Analysis and conceptualisation of experiential data can be partially outsourced to AI within a methodology of human-AI collaboration (Lieder & Schäffer, 2024; Morgan, 2023) but require in-depth, holistic human reflection and justification (Bommasani et al., 2023). Furthermore, interactively constructed data offer a high degree of authenticity, credibility and trustworthiness (Crotty, 1998; Sandelowski, 2002; Shenton, 2004). If AI indeed fails to draw directly on relevant empirical data, as this research experienced, and AI training data continuously dilutes the empirical base, the result would be an undifferentiable blend of human and artificial data in social life and research. As the distinction between AI generated and ‘real human data’ becomes blurred, high levels of HEC may provide the only tangible basis for the verification of AI generated research propositions. Without a coherent method of observational and conceptual verification of evidence and propositions based on lived and embodied human experience, nothing prevents social research from starting to “confuse words and images with things” (Sandelowski, 2002, p. 112). Future research may focus on comparatively evaluating human-AI generated outputs across research paradigms and methodologies that contain various levels of HEC, with a focus on developing commensurate methodologies of verification.
Research on Large Language Models has indicated that the continued injection of fresh, human experience-based data is indispensable. Large Language Models that rely on recursively fed, AI generated data have shown significant declines in coherence and intelligibility, potentially resulting in model collapse or, at least, vastly diminished usefulness for scientific research (Shumailov et al., 2024; Snoswell, 2024). As AI models apparently cannot self-perpetuate open learning without a supply of fresh human experiential data, the human experience, for now, is bound to stay. However, it is the very nature of researchers to explore new grounds and new methodologies, and AI ‘tinkering’ (see Higgins et al., 2023) and experimentation will continue.
Following the insights from this in-depth experiment that AI requires skills in ‘coaxing’ and produces output based on a limited sub-section of data and/or unrecognised external sources, researchers need to exercise caution when determining whether to use AI tools in their work. Our findings illustrate that, whilst AI output may appear ‘analogous with greater efficiency’ compared to human analysis, it is embedded with hidden pitfalls, omissions and errors. These may only become apparent with extensive human intervention, which runs counter to the logic of using AI as an ‘efficiency tool’. To overcome this, researcher training must develop the skills needed to critically evaluate AI tools. The incorporation of AI technology into research will also have practical implications for research institutions, training, and publishing, for example in terms of new skill training curricula and research protocols and policies. Several institutions, including universities, governments and publishers, have established guidelines for the use of AI in research.
Typically, these raise concerns around ethics, copyright and disclosure, but focus mostly on manuscripts, such as the use of Generative AI or Large Language Models for manuscript preparation (see, for example, Singapore Management University, 2024). Institutional guidelines for data analysis are more obscure, fragmented across various methods websites and more recent academic papers (e.g. Morgan, 2023). To address this gap, we propose a three-step framework to guide researchers in applying their own HEC to establish rigour in practice where AI is used for data analysis in qualitative research: (1) (2) (3)
Conclusion
This paper aims to elevate understandings of human-AI collaboration within the continuum of qualitative data analysis, from descriptive coding and categorising, through the development of themes, to inductive/abductive theory development. With a focus on inter-coder reliability in human-AI data analysis, this novel multi-disciplinary qualitative experimental study draws attention to the epistemological implications of using ‘dis-experienced’ AI technology for human experiential research. The AI output revealed fundamental uncertainties related to the ability of AI tools to relate to specific empirical evidence, in terms of data generation, data boundaries, the clarity of empirical-conceptual indicator links, as well as the persistent tendency to default to training data. Despite this, institutional guidelines for the use of AI have largely focused on the development of manuscripts and clear author statements. To address this gap, the authors have shared their observations regarding the need for responsible disclosure and sound justification of AI use within research, as well as outlining a method of verification of AI assisted research output.
A key shortfall of AI assisted analysis was found to be its necessary de-coupling from authentic human experience. Specifically, the quality of qualitative research does not depend on an AI based ability to process larger data sets more quickly, but on a transparent understanding of how AI processes construct knowledge – a significant epistemological problem. In response, this paper has proposed the concept of “human experiential content (HEC)” to highlight and frame epistemological incongruences in the use of AI, especially in relation to constructivist-interpretivist analysis and theorising. It has foregrounded the importance of human researcher reflexivity extending into the space of AI as a non-human actant that constructs ‘knowledge’ from dis-experienced algorithms. ‘Coaxing’ via ‘inverted induction’ emerged as an important strategy for the assertion of HEC in the human-AI collaboration, illustrating that AI use, rather than representing a plug-and-play solution, still fundamentally depends on human interaction and interpretation. However, the disposition of this interaction may change with unfamiliar datasets, where the potential richness to be mined is entirely unknown to the researcher, and where context and authenticity will require new methodologies of verification.
The epistemological findings were drawn from a qualitative experiment of human-AI collaboration, conducted at a specific point in time, which cannot fully account for the dynamic and exponential development of AI capabilities. The findings represent a specific perspective based on the experience of the researchers involved, not an ultimate verdict on the use of AI. Qualitative researchers will use AI according to their own knowledge, goals, skills and priorities and we acknowledge that other justified perspectives on AI use exist.
While the authors note the value of AI as a ‘research assistant’, this paper has highlighted the lack of epistemological confidence regarding the use of AI in qualitative data analysis. Our research institutions will have to reflect on evidence of AI related impacts and develop policies to safeguard foundational epistemological and ethical concepts of research in the age of AI, including reliability, authenticity and originality. In a world of increasingly indistinguishable provenance of data generation, creation and derivation, human experience will retain a critical epistemological role as the ultimate basis for the verification of research claims.
