Abstract
Technical Note
Over the past decade, data science methods have triggered a radical shift across various sectors, including but not limited to medicine and health care, business and finance, psychology and neuroscience, and physics and biology (Rahul et al., 2023; Sarker, 2021). In particular, the integration of machine learning (ML) and artificial intelligence (AI) techniques into both basic and applied research has significantly broadened the transdisciplinary scope of data science. This integration has ushered in novel, multidimensional models of complexity that combine machine and human insights to better understand the underlying research problems (Sarker, 2021). In quantitative research, ML- and AI-powered tools have helped improve the quality of evidence generation and facilitated more robust inference from findings. This is evident in the recent surge of quantitative studies that leverage ML and AI techniques (Morande, 2022; Pena-Guerrero et al., 2021; Vinod & Prabaharan, 2020; Waller & Fawcett, 2013).
Conversely, the application of ML and AI techniques in qualitative research remains underexplored and largely untapped (Longo, 2020). Nonetheless, these techniques can be very helpful for qualitative researchers. They have been shown to code qualitative data for thematic analysis significantly faster than human analysts, whose work is inherently slower and susceptible to bias (Towler et al., 2023). Various approaches for integrating ML and AI techniques into qualitative thematic analysis are possible, including unsupervised machine learning techniques such as natural language processing, text mining, and Apriori association rules, as well as supervised machine learning techniques such as artificial neural networks and transfer learning. The critical consideration is determining the appropriate point of integration within the qualitative data analysis framework. Insights from existing mixed methods study designs can guide this decision. The integration can occur before (exploratory sequential integration), after (explanatory sequential integration), or during (convergent parallel integration) qualitative analysis, yielding hybrid qualitative-machine learning insights. We suggest visualizing such integration as portrayed in Figure 1.
Figure 1. Research Designs Integrating Qualitative and Machine Learning Techniques. ML = Machine Learning; DL = Deep Learning; AI = Artificial Intelligence.
The primary challenge arising from this integration is how to handle ML-generated qualitative codes that are ambiguous or that conflict with human-generated codes. Methodological studies to guide researchers in this regard are limited. Potential solutions may involve one or more external examiners who resolve conflicting codes, although debate persists over whether these examiners should be human or machine. Notwithstanding these challenges, ML and AI techniques have the potential to enhance the rigour and depth of findings by offering triangulation of text data analysis and generating more meaningful insights into the study phenomenon (Chen et al., 2018). It is crucial to understand that ML is not a replacement for human researchers, especially in qualitative research, but a tool to assist them (Christou, 2023).
Example Case Study
Title
Understanding Public Sentiment and Behavioral Drivers During a Pandemic Response (see Figure 2 for a summary workflow)
Figure 2. Workflow Diagram Illustrating the Integration of Machine Learning Techniques at Pre-analysis, During-Analysis, and Post-analysis Stages in a Qualitative Study of Public Sentiment and Behavior During a Pandemic Response.
Context
A team of public health researchers aims to understand how the public perceives and responds to vaccination campaigns during a pandemic. They collect thousands of open-ended survey responses, social media comments, and transcribed interviews from a diverse population.
Research Objective
To identify prevailing themes, emotional drivers, and barriers influencing vaccination uptake, with the goal of informing policy and communication strategies.
Integration of Qualitative and Machine Learning Techniques
Exploratory Sequential Integration
Researchers could apply unsupervised ML techniques, such as natural language processing (NLP) and topic modeling (e.g., latent Dirichlet allocation, LDA), before formal human coding. The research team runs unsupervised topic modeling to cluster the data into latent themes (e.g., trust, fear, misinformation, access).
Outcome
At this stage, the analysis reveals unexpected clusters, such as conspiracy-related discourse and confusion about eligibility criteria, prompting the researchers to refine their qualitative inquiry framework and interview questions.
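The topic modeling step described above can be sketched as follows. This is a minimal illustration, not the workflow of any cited study: it implements LDA with a small collapsed Gibbs sampler using only the Python standard library, whereas a real project would more likely use an established library. The responses, the stopword list, the number of topics, and the hyperparameters `alpha` and `beta` are all assumptions made for the example.

```python
import random
from collections import Counter

# Hypothetical survey responses, invented for illustration only.
responses = [
    "i do not trust the vaccine or the government",
    "the government is hiding vaccine side effects",
    "i cannot trust what officials say about side effects",
    "the clinic is far and i have no transport",
    "no appointment slots at the clinic near me",
    "i could not book an appointment without transport",
]
# Hypothetical, hand-picked stopword list for this toy corpus.
STOPWORDS = {"i", "the", "or", "is", "and", "have", "no", "at", "me", "an", "do",
             "not", "what", "about", "could", "cannot", "say", "without", "near"}
docs = [[w for w in r.split() if w not in STOPWORDS] for r in responses]


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenised documents with a collapsed Gibbs sampler."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]          # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(n_topics)]   # tokens per (topic, word)
    topic_total = [0] * n_topics                        # tokens per topic
    assign = []                                         # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)

    # Resample each token's topic conditional on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Most frequent words per topic, and the smoothed topic mix per document.
    top_words = [
        [w for w, c in topic_word[t].most_common(3) if c > 0]
        for t in range(n_topics)
    ]
    doc_mix = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    return top_words, doc_mix


top_words, doc_mix = lda_gibbs(docs, n_topics=2)
for t, words in enumerate(top_words):
    print(f"topic {t}: {words}")
```

The top words per topic are only candidate themes. The researchers would still need to read the underlying responses, name the themes, and decide whether a cluster is meaningful before it informs the inquiry framework.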
Convergent Parallel Integration
Researchers could apply supervised ML techniques, such as transfer learning with BERT, alongside manual coding. A subset of the data (e.g., 500 manually coded responses) is used to train a supervised classifier. The BERT-based model is then applied to the rest of the dataset to assist coding.
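The train-on-a-coded-subset, apply-to-the-rest pattern can be sketched as below. Note the substitution: fine-tuning BERT requires pretrained model weights and an external library, so this sketch swaps in a simple multinomial naive Bayes classifier written with the standard library. It illustrates the workflow, not the model named in the text. The coded examples, the code labels, and the 0.8 confidence threshold are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical manually coded subset: (response text, human-assigned code).
coded = [
    ("i do not trust the government", "trust"),
    ("cannot trust the vaccine makers", "trust"),
    ("the clinic is too far away", "access"),
    ("no appointments near my home", "access"),
]
# Hypothetical uncoded responses that the model will help to code.
uncoded = ["i do not trust this vaccine", "the site is far"]


def train_nb(coded_pairs):
    """Count words per code and documents per code from the coded subset."""
    word_counts = defaultdict(Counter)
    code_counts = Counter()
    for text, code in coded_pairs:
        code_counts[code] += 1
        word_counts[code].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, code_counts, vocab


def predict_nb(model, text):
    """Return the most probable code and its posterior probability."""
    word_counts, code_counts, vocab = model
    total_docs = sum(code_counts.values())
    log_scores = {}
    for code in code_counts:
        n_words = sum(word_counts[code].values())
        score = math.log(code_counts[code] / total_docs)
        for w in text.lower().split():
            if w in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothing avoids zero probabilities.
                score += math.log((word_counts[code][w] + 1) / (n_words + len(vocab)))
        log_scores[code] = score
    # Normalise log scores into posterior probabilities.
    top = max(log_scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
    best = max(exp_scores, key=exp_scores.get)
    return best, exp_scores[best] / sum(exp_scores.values())


REVIEW_THRESHOLD = 0.8  # hypothetical cut-off below which a human reviews the code

model = train_nb(coded)
suggestions = []
for text in uncoded:
    code, confidence = predict_nb(model, text)
    suggestions.append((text, code, confidence, confidence < REVIEW_THRESHOLD))

for text, code, confidence, needs_review in suggestions:
    print(f"{code:7s} {confidence:.2f} review={needs_review}  {text}")
```

Keeping the model's confidence alongside each suggested code lets the team treat the output as an aid to coding: low-confidence suggestions go back to a human coder instead of being accepted automatically.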
Conflict Resolution Strategy
When model-generated codes conflict with human codes, an external examiner (e.g., a senior qualitative researcher) reviews and adjudicates the discrepancies. Conflicts that reflect nuanced meaning (e.g., sarcasm or irony) are flagged for deeper interpretation.
Outcome
This hybrid process accelerates analysis while maintaining interpretive depth and ethical sensitivity. Codes with high agreement between human and machine are prioritized for rapid synthesis; ambiguous segments receive more nuanced review.
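One way to operationalise "high agreement" and to route conflicts to the external examiner is sketched below. Cohen's kappa is used here as the agreement measure. That choice is an assumption of this example, since the text does not name a specific statistic, and the segments and codes are hypothetical.

```python
from collections import Counter

# Hypothetical segments with one human code and one machine code each.
segments = ["s1", "s2", "s3", "s4", "s5", "s6"]
human = ["trust", "fear", "access", "trust", "fear", "access"]
machine = ["trust", "fear", "access", "fear", "fear", "trust"]


def cohen_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two coders on the same segments."""
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Agreement expected by chance from each coder's marginal code frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:  # both coders used a single identical code throughout
        return 1.0
    return (observed - expected) / (1 - expected)


def split_by_agreement(segment_ids, codes_a, codes_b):
    """Separate agreed segments from conflicts that need adjudication."""
    agreed, conflicts = [], []
    for seg, a, b in zip(segment_ids, codes_a, codes_b):
        (agreed if a == b else conflicts).append((seg, a, b))
    return agreed, conflicts


kappa = cohen_kappa(human, machine)
agreed, conflicts = split_by_agreement(segments, human, machine)

print(f"kappa = {kappa:.2f}")
print("for rapid synthesis:", [seg for seg, _, _ in agreed])
print("for the external examiner:", conflicts)
```

The conflict list is the examiner's worklist. Whether a given kappa value counts as acceptable agreement is a judgment the research team would need to make and report.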
Explanatory Sequential Integration
Researchers could apply techniques such as neural networks and pattern recognition. After the main thematic analysis is complete, a deep learning model is used to explore patterns in the coded data, for example across demographic groups.
Outcome
The model identifies that younger respondents are more likely to express mistrust, while older respondents discuss logistical challenges. These insights lead to segmentation of recommendations by demographic group and inform targeted communication strategies for each group.
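Before fitting a deep learning model, a team might first check a pattern like this with a plain cross-tabulation of themes by demographic group. The sketch below does only that: it is a descriptive stand-in, not the neural network approach named above. The age bands, theme labels, and records are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical coded records: (age group, theme assigned in the thematic analysis).
records = [
    ("18-29", "mistrust"),
    ("18-29", "mistrust"),
    ("18-29", "logistics"),
    ("60+", "logistics"),
    ("60+", "logistics"),
    ("60+", "mistrust"),
    ("60+", "logistics"),
]


def theme_share_by_group(rows):
    """Return, for each group, the share of its responses coded to each theme."""
    counts = defaultdict(Counter)
    for group, theme in rows:
        counts[group][theme] += 1
    return {
        group: {theme: n / sum(themes.values()) for theme, n in themes.items()}
        for group, themes in counts.items()
    }


shares = theme_share_by_group(records)
for group, themes in shares.items():
    dominant = max(themes, key=themes.get)
    print(f"{group}: dominant theme = {dominant} ({themes[dominant]:.0%})")
```

A table like this is descriptive only. With real data, differences between groups would need to be checked against sample sizes and against the original responses before they shape recommendations.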
ML and AI techniques serve as useful tools that help qualitative researchers analyse large text datasets quickly and interpret findings with enhanced trustworthiness, depth, and multidimensional perspectives, particularly where human-generated and machine-generated codes and themes agree (Chen et al., 2018; Towler et al., 2023). These techniques can highlight complex patterns and connections in the data that human researchers may miss, which may lead to groundbreaking insights and a deeper understanding of the research problem (Badrulhisham et al., 2024). This approach holds particular promise for supporting and enhancing existing grounded theories, conceptual models, and hypotheses, especially in time-sensitive scenarios such as public health emergencies like the COVID-19 pandemic (Chen et al., 2018; Towler et al., 2023). Ultimately, the integration of these techniques stands to elevate the level of evidence derived from qualitative studies, thereby enhancing their impact and relevance in informing decision-making and policy-making.
