This study investigates the efficiency and thematic completeness of manual versus artificial intelligence (AI)-assisted qualitative analysis of nurses’ insights into the recovery process for patients who have had emergency abdominal surgery (EAS), specifically understanding barriers and facilitators to recovery. EAS is associated with significant mortality and complications, and nurses play a crucial role in patient recovery. Extracting actionable insights from qualitative data is labor-intensive, with traditional methods requiring extensive person-hours and being susceptible to inter-coder drift. Rapid qualitative analysis (RQA) offers a streamlined approach, and commercial large language models (LLMs) may accelerate and augment qualitative research, promising semi-automated data coding and synthesis while maintaining human oversight. This study systematically compared four analytic workflows: manual RQA, ChatGPT-o3 analysis of human-generated summary notes, Microsoft Copilot GPT-4 analysis of cleaned transcripts, and ChatGPT-o3 analysis of cleaned transcripts. Manual RQA took approximately 30 person-hours, while each of the LLM-assisted methods required four to eight person-hours, demonstrating significant time savings. The LLM-generated codebooks captured nearly all concepts and were highly rated by nurse participants for thoroughness, action-orientation, and relevance to their roles. The findings suggest that a pragmatic hybrid approach, using human note-taking, AI analysis, and human review, offers an effective balance of speed, ethical AI use, and rigorous qualitative analysis for implementation science. While LLMs reduced labor and provided comprehensive thematic coverage, human input and analysis remain vital to support qualitative rigor.
Emergency abdominal surgery (EAS) carries significant morbidity and mortality associated with high major complication rates and worsening quality of life (Aggarwal et al., 2019; Lau et al., 2024; Tengberg et al., 2017). A 2024 systematic review showed that physical and psychosocial recovery is often incomplete for 3–12 months after EAS, underscoring the mismatch between a “successful” operation and lived postoperative outcomes (Khanderia et al., 2024). Best-practice paradigms for Enhanced Recovery After Surgery (ERAS) consistently assign nurses a frontline role in preoperative education, early recognition of complications, needs-based tailored patient education, and documentation of nursing-specific barriers to pathway adoption (Brown & Xhaja, 2018).
Digital health interventions offer a promising avenue to support patients after EAS. A recent Lancet Digital Health review found 126 digital interventions for postoperative monitoring. However, 80% of those interventions were still in development and “none had reached [large-scale] implementation” (McLean et al., 2023). This highlights a significant gap in real-world evaluation and underscores the need for robust implementation science solutions to translate these promising digital health tools into effective clinical practice for EAS patients.
A crucial foundational step in the digital health design-to-intervention pipeline, particularly within implementation science, is extracting actionable insights from qualitative data (Brown et al., 2025; Palinkas & Zatzick, 2019). Traditional methods for qualitative analysis include transcription, data cleaning, codebook creation, and thematic analysis (Brown et al., 2025). Traditional thematic analysis typically requires iterative line-by-line coding by multiple researchers.
In fast-moving healthcare environments, especially in the context of implementation science where adaptive interventions are key, timely qualitative findings are vital to achieving actionable results (Brown et al., 2025; Palinkas & Zatzick, 2019). Epistemologically, this study adopts a pragmatic stance. We are not seeking a single objective truth, nor are we engaging in deep constructivist meaning-making. Instead, our goal is utility: identifying actionable clinical barriers rapidly to improve patient care. To bridge the gap between evidence and practice, researchers have increasingly turned to accelerated approaches that preserve analytic rigor while reducing turnaround time (Vindrola-Padros & Johnson, 2020). While accelerated approaches such as Framework Analysis, Applied Thematic Mapping, or Agile Thematic Synthesis offer rigorous structures for analysis, they often entail resource-intensive cycles that can delay the translation of evidence into practice (Gale et al., 2013; Guest et al., 2012; Riley et al., 2013). Rapid qualitative analysis (RQA) methods streamline or eliminate transcription; condense data management; and use structured templates, summary matrices, and/or direct audio processing to accelerate analysis (Brown et al., 2025). Such approaches are well-suited to real-world/implementation research, where timely feedback is critical to adapting interventions (Brown et al., 2025; Palinkas & Zatzick, 2019).
Newer LLMs can generate codes and synthesize themes at unprecedented speed, an advantage for implementation scientists who need rapid insights. However, the evidence base is still emerging. Benchmarks show wide variability in agreement with human-coded themes (38–85%) and highlight risks such as prompt sensitivity, hallucinated categories, biased outputs, and opaque decision-making (Castellanos et al., 2025; Chubb, 2023; Kondo et al., 2024; Morse et al., 2025; Sakaguchi et al., 2025; Wachinger et al., 2025). Without robust prompt engineering, including iterative prompting and contextual scaffolding, LLMs can produce shallow or imprecise themes (Morse et al., 2025; Sakaguchi et al., 2025). Some studies omit quantitative agreement measures altogether or rely on custom metrics, limiting cross-study comparability (Brondani et al., 2024; Castellanos et al., 2025; Kondo et al., 2024; Sakaguchi et al., 2025; Wachinger et al., 2025). Thus, there is a need for consistent tools to quantify agreement, enabling true benchmark comparisons, and for deeper evaluation of when and how LLMs can support rigorous qualitative analysis in health informatics (Bergman et al., 2024). Together, these concerns point to the need for systematic evaluations of LLM performance, focusing on efficiency, thematic completeness, and ethical integrity in qualitative health informatics research.
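To make the idea of a consistent agreement metric concrete, the sketch below computes Jaccard overlap between two sets of code labels, one lightweight option for comparing human- and LLM-generated codebooks. The code labels here are illustrative placeholders, not data from this study, and Jaccard overlap is only one of several possible agreement measures.

```python
def jaccard(codes_a: set[str], codes_b: set[str]) -> float:
    """Proportion of codes shared between two codebooks (0 to 1)."""
    if not codes_a and not codes_b:
        return 1.0  # two empty codebooks agree trivially
    return len(codes_a & codes_b) / len(codes_a | codes_b)

# Hypothetical code labels for illustration only
human_codes = {"communication", "education", "care coordination"}
llm_codes = {"communication", "education", "pain management"}

print(jaccard(human_codes, llm_codes))  # 2 shared / 4 total -> 0.5
```

A shared metric like this would allow agreement figures from different studies to be compared directly, which custom per-study measures do not.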
To capture nurses’ insights into the recovery process for EAS patients, we conducted qualitative sessions designed to explore barriers and facilitators to optimal recovery. The rich data corpus yielded from these discussions served as a robust test-bed for evaluating various qualitative analysis approaches. This study systematically compared four distinct analytic workflows: a traditional manual rapid qualitative analysis and three innovative AI-assisted variants. We then analyzed the efficiency and thematic completeness of each approach.
2. Methods
2.1. Design, Setting, and Participants
To capture nurses’ insights into the recovery process for EAS patients, we conducted six 60-minute Zoom sessions (16 participants total) with registered nurses and/or nurse practitioners recruited by email from hospitals in California and Minnesota. All participants had experience caring for patients with abdominal surgery, with their scope of practice ranging from direct patient care to care coordination.
Semi-structured interview guides were used to explore barriers and facilitators to optimal EAS recovery both in-hospital and after discharge. Sessions were facilitated by an experienced moderator and recorded in Zoom. Institutional Review Board approval was obtained from both clinical sites and all participants provided verbal consent. Participants received a $100 incentive for participating in these Zoom interview sessions, and an additional $20 for completing the follow-up member-checking questionnaire.
2.2. Data Sources and Analytic Workflows
Each session produced two complementary data sources:
1. Summary notes: Free-text notes drafted by team members as they viewed the sessions.
2. Verbatim transcripts: Automated speech-to-text transcripts generated by Zoom.
The goal of this work is to compare the results of four pathways for analyzing the qualitative data from these sessions (Figure 1 and Table 1). Both the summary notes and verbatim transcripts served as the raw material for the manual RQA as well as the three AI-assisted workflows described in detail below. Additionally, an LLM was used to merge the resulting four baseline codebooks into a single reference codebook, which served as the basis for assessing thematic cross-coverage across the different workflows.
Comparison of manual and AI-assisted qualitative analysis workflows

Characteristics of Analytic Workflows

| Workflow | Input data | Human effort | Large language model (LLM) involvement | Output |
| --- | --- | --- | --- | --- |
| W1 – Manual RQA | Summary notes | Three team members | None | Manual codebook |
| W2 – ChatGPT-o3 + Summary notes | Summary notes | Three team members | ChatGPT-o3 | LLM-generated codebook |
| W3 – Copilot GPT-4 + cleaned transcripts | Transcripts | One team member | Microsoft Copilot GPT-4 | LLM-generated codebook |
| W4 – ChatGPT-o3 + cleaned transcripts | Transcripts | One team member | ChatGPT-o3 | LLM-generated codebook |
2.3. Manual Rapid Qualitative Analysis (W1)
Three team members (including two nurses), all with experience in qualitative research and analysis, reviewed the six audio recordings by listening in full and capturing detailed notes. Each team member built an initial list of barriers and facilitators emerging from the sessions they reviewed. The team members then met to reconcile terminology and clarify concepts within their own reviews. Guided by that discussion, each team member reorganized the material from their sessions into preliminary themes and subthemes. Further discussion resolved remaining discrepancies and produced the final codebook of agreed-upon themes, subthemes, and definitions that covered all sessions.
2.4. ChatGPT-o3 Analysis of Summary Notes (W2)
A researcher uploaded all deidentified summary notes (described above) as attached files into the ChatGPT-o3 interface (March 2025 release) with the prompt below. For simplicity, we used the term ‘focus groups’ throughout the prompts:
The attached documents contain raw notes from focus groups conducted with nurses who care for patients who have undergone emergency abdominal surgery. I want you to take on the role of a qualitative analysis expert, and conduct two tasks. We will be using this analysis to develop interventions to help these patients. The two tasks are as follows: 1. Organize the notes: Please merge and organize the notes. Format the notes for readability and output it as a new, structured file (.docx). 2. Codebook Creation: We intend to conduct a qualitative thematic analysis of the notes. Please analyze the notes to identify key themes and subthemes, ensuring that they capture the main topics discussed by the participants. Create a structured codebook that includes: a. A list of themes and subthemes b. Definitions for each theme and subtheme
We noticed that ChatGPT-o3 generated exactly the same number of subthemes within each theme, so we followed up with this prompt:
I see you put 4 subthemes for each theme. Reanalyze the notes and consider whether this is the right number. It would be fine to have more subthemes in some categories than others.
ChatGPT-o3 then generated a revised structured codebook.
2.5. Cleaning and Pre-processing of Transcripts (W3 and W4)
Institutional policy required that raw participant data be processed only with a university-approved LLM (Microsoft Copilot GPT-4). Because the University’s version of Copilot could not accept file uploads and enforced an 8,000-character limit, we chunked the deidentified text into ∼8,000-character segments and cleaned each transcript by submitting each segment to Copilot with the following prompt:
The text below contains raw transcripts from a focus group conducted with nurses who care for patients who have undergone emergency abdominal surgery. Please clean the transcripts by removing irrelevant content (e.g., timestamps, filler or nonsense words) while preserving the meaning and accuracy of the discussions. Format the cleaned transcript for readability.
<followed by the segment of transcript>
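The chunking step can be sketched as follows. This is a minimal illustration, not the exact procedure used in the study; the choice to break segments at whitespace (so words are not split) is an assumption.

```python
def chunk_text(text: str, max_chars: int = 8000) -> list[str]:
    """Split text into segments of at most max_chars characters,
    preferring to break at whitespace so words stay intact."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # back up to the last space inside the window, if there is one
            cut = text.rfind(" ", start, end)
            if cut > start:
                end = cut
        chunk = text[start:end].strip()
        if chunk:  # skip whitespace-only segments
            chunks.append(chunk)
        start = end
    return chunks
```

Each returned segment can then be pasted into the LLM interface together with the cleaning prompt above.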
2.6. LLM Analysis of Cleaned Transcripts (W3 and W4)
After cleaning the transcripts, we fed each fully cleaned transcript into Microsoft Copilot GPT-4 (for W3) and ChatGPT-o3 (for W4) using the following prompt:
We intend to conduct a qualitative thematic analysis of the transcript. Please review all of the work we did and identify key themes and subthemes, ensuring that they capture the main topics discussed by the participants. Create a structured codebook that includes: A list of themes and subthemes; Definitions for each theme and subtheme
<followed by the fully cleaned transcript>
We then entered the following prompt into both Copilot and ChatGPT-o3:
Following are qualitative codebooks identified from each of 6 sessions with nurses caring for patients who had emergency abdominal surgery. Analyze these codebooks for overlap and create a single codebook of barriers to recovery that encompasses these themes and subthemes.
<followed by the codebooks>
Copilot GPT-4 and ChatGPT-o3 each generated a single codebook.
2.7. Evaluation of the Codebooks
We employed two approaches to evaluate the codebooks: 1) checking their coverage against a synthesized codebook, and 2) gathering feedback from nurse participants.
2.7.1. Creation of Synthesized Codebook
Once we had codebooks for our four workflows, we entered the following prompt into ChatGPT-o3:
I have 4 different codebooks created for these focus groups that I can provide to you. I would like you to analyze the alignment between them.
ChatGPT-o3 responded by asking for clarification on format, content, desired depth of comparison, output format, and any exclusions or special rules. Our responses specified:
I will paste them as text. They do contain definitions. I want a high-level map of which major themes and subthemes overlap in concept... A single consolidated table (themes × codebooks) plus a narrative summary work is great. I do also want recommendations for harmonizing or merging the four codebooks... Include positive-outcome codes for now. I want closely related codes be merged.
ChatGPT-o3 generated the synthesized codebook.
2.7.2. Nurse Participant Feedback on the Codebooks
We invited the 16 nurse participants to provide feedback on the four codebooks, with data collected and managed using the secure, web-based software platform REDCap (Research Electronic Data Capture) hosted at the University of Minnesota (Harris et al., 2009). The participants rated how well each codebook reflected their perspective as a nurse using a Likert scale (Not at all, Slightly, Somewhat, Mostly, and Completely). They also provided open-ended responses on what each codebook got right and wrong, an overall ranking of the codebooks, and what appealed to them about their top choice. Participants received a $20 incentive for completing the follow-up survey.
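For analysis, responses on such a Likert scale are commonly mapped to a 1–5 score and summarized per codebook. The sketch below is a minimal illustration of that mapping with hypothetical ratings, not the study's data or its actual analysis code.

```python
# Map the survey's Likert labels to numeric scores (1-5)
LIKERT = {"Not at all": 1, "Slightly": 2, "Somewhat": 3,
          "Mostly": 4, "Completely": 5}

def mean_rating(responses: list[str]) -> float:
    """Average numeric score for a list of Likert-labelled responses."""
    scores = [LIKERT[r] for r in responses]
    return sum(scores) / len(scores)

# Hypothetical ratings for one codebook
print(mean_rating(["Mostly", "Completely", "Somewhat"]))  # 4.0
```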
3. Results
The manual (human) rapid qualitative analysis (W1) took approximately 30 person-hours to complete and resulted in 3 themes and 10 subthemes, shown in Table 2.
Manual RQA (W1) Codebook

| Theme | Sub-theme | Definition |
| --- | --- | --- |
| 1. Communication | 1.1 Language & Literacy Barriers | Need for interpreters/translation and the challenges patients face understanding health information. |
| | 1.2 Miscommunication | Situations where providers and patients are “not on the same page;” asynchronous or unclear exchanges that negatively affect care. |
| | 1.3 Communication Tools | Technologies or resources (e.g., translation software, patient portals) that facilitate clearer information exchange. |
| | 1.4 Positive Communication | Examples of timely, accurate, and empathic dialogue that improves patient experience and outcomes. |
| 2. Education | 2.1 Education Continuum | Timing, sequencing, and method of delivering teaching across the care journey (pre-op → discharge → home). |
| | 2.2 Volume of Topics | The breadth and amount of content provided; risk of overwhelming patients or caregivers. |
| 3. Care Coordination | 3.1 People Facilitators | Key individuals (case managers, navigators, social workers) who organize and streamline care. |
| | 3.2 Service Facilitators | Home-health, outpatient rehab, transportation, or other formal services that support recovery. |
| | 3.3 Resource Constraints | Geographic distance, financial barriers, supply shortages, or follow-up limitations that hinder optimal care. |
| | 3.4 Care Complications | Medical or logistical problems that emerge during the episode of care and complicate recovery. |
The ChatGPT-o3 analysis of notes (W2) took approximately 8 person-hours to complete and resulted in 10 themes and 47 subthemes, shown in Table 3.
ChatGPT-o3 Analysis of Notes (W2) Codebook

| Theme | Sub-theme | Definition |
| --- | --- | --- |
| 1. Patient Education & Health Literacy | 1.1 Discharge instructions | Clarity, completeness, and readability (print + verbal) of instructions provided at hospital exit. |
| | 1.2 Pre-operative education | Presence/absence and adequacy of teaching before emergent surgery; how lack of prep affects anxiety and recovery. |
| | 1.3 Language & translation services | Access to professional interpreters and fully translated materials; gaps during nights/weekends or for rare dialects. |
| | 1.4 Health-literacy adaptation | Use of plain language, pictograms, staging information over time, and tailoring content to patients’ education level. |
| | 1.5 Teach-back & reinforcement | Techniques (phone call, bedside return-demo) used to confirm patient/caregiver understanding after initial teaching. |
| 2. Care Coordination & Continuity | 2.1 Early case-management involvement | Starting discharge and resource planning on or near admission (“Day 0”). |
| | 2.2 Multidisciplinary alignment | Consistency of information across surgical, nursing, ostomy, social-work, nutrition, PT/OT teams. |
| | 2.3 Team/role clarity for patients | How well patients know who their surgeon, resident, NP, and case manager are and who to contact. |
| | 2.4 Follow-up scheduling | Ensuring timely in-person or telehealth appointments and bundling multi-specialty visits for remote patients. |
| | 2.5 Cross-setting hand-offs | Quality of information transfer between hospital, home-health, TCU/SNF, and PCPs. |
| 3. Support Systems & Social Resources | 3.1 Family/caregiver engagement | Involving support persons in teaching, decision-making, and ongoing care activities. |
| | 3.2 Transportation access | Distance, availability of rides, payer-provided transport, and travel vouchers. |
| | 3.3 Home-health & TCU availability | Capacity shortages, rural gaps, and post-COVID workforce attrition affecting visiting services or transitional-care beds. |
| | 3.4 Insurance & financial barriers | Coverage denials or high co-pays for supplies, equipment, or placement; strategies patients use to cope. |
| | 3.5 Social-work & community referrals | Activation (or absence) of social work, chaplain, peer-support groups to address socioeconomic needs. |
| 4. Pain & Symptom Management | 4.1 Analgesia logistics | Overnight pain-team access, timing of medication administration, and prescription fulfilment hurdles. |
| | | Out-of-pocket purchases, rationing, or reuse when insurance limits coverage. |
| 6. Mobility, Nutrition & Physical Recovery | 6.1 Early ambulation routines | Frequency and documentation of walk sessions; staffing/time constraints. |
| | 6.2 PT/OT integration & home-barrier assessment | Timing of consults and evaluation of stairs, elevators, or assistive-device needs at the patient’s residence. |
| | 6.3 Diet progression & ileus prevention | Education on advancing diet, monitoring for bloating/obstruction, and physician alignment on diet orders. |
| | 6.4 Nutrition optimization | Early dietitian involvement, protein/calorie goals, supplements, and tube-feed guidance. |
| | 6.5 Sleep & fatigue management | Night-shift clustering of care, promoting rest, and impact of sleep deprivation on healing. |
| 7. Emotional & Psychological Support | 7.1 Adjusting to a new ostomy (“new normal”) | Emotional reactions (grief, disgust) and coping strategies for unexpected body changes. |
| | 7.2 Fear of readmission or perceived failure | Confidence-building, reframing complications as part of recovery, and self-efficacy encouragement. |
| | 7.3 Spiritual & cultural care | Chaplain visits, culturally concordant support, or rituals impacting acceptance and healing. |
| | 7.4 Mental-health referral & counseling | Screening for anxiety/depression/PTSD and connecting to behavioral-health resources. |
| 8. System & Staffing Constraints | 8.1 Nurse-patient ratios & workload | Impact of staffing on education time, ambulation, and response to patient needs. |
| | 8.2 Interpreter availability on off-shifts | Difficulty securing live interpreters for rare languages/dialects during nights/weekends. |
| | 8.3 Knowledge drain & resource cuts post-COVID | Loss of experienced educators, merged units, or eliminated patient-education centers. |
| | 8.4 Length-of-stay (LOS) reduction pressure | Institutional push for early discharge that outpaces patient readiness and resource coordination. |
| 9. Technology-Enabled Communication | 9.1 MyChart messaging & digital literacy | Patient ability to send photos/questions, expectations for response time, and staff workload. |
| | 9.2 After-hours triage lines & direct numbers | Clarity of whom to call when; differences between sites in responsiveness. |
| | 9.3 Telehealth follow-up & video visits | Usefulness for rural/immobile patients, barriers to adoption, and hardware/internet issues. |
| | 9.4 EHR vs. paper instruction consistency | Alignment (or mismatch) between electronic discharge summaries and paper packets. |
| | 9.5 Remote monitoring tools (photos, logs) | Patients submitting incision photos or output diaries and how the team reviews/acts on them. |
| 10. Post-discharge Monitoring & Triggers | 10.1 Red-flag symptom lists | Fever, bleeding, N/V, output extremes, or pain escalation that require immediate contact. |
| | 10.2 Follow-up phone calls | Next-day or 48-h nurse outreach to identify early complications and reinforce instructions. |
| | 10.3 Teach-back verification | Structured checks that patients can articulate warning signs and self-care steps. |
| | 10.4 Readmission patterns & preventable causes | Common reasons (dehydration, wound infection, uncontrolled pain) and lessons learned. |
| | 10.5 Symptom-tracking diaries or apps | Use of logs for output, pain scores, or fluid intake to guide early intervention. |
The Copilot analysis of 6 sets of themes/subthemes (W3) took approximately 4 person-hours to complete and resulted in 7 themes and 19 subthemes, shown in Table 4.
Copilot Analysis of 6 Sets of Themes/Subthemes (W3) Codebook

| Theme | Subtheme | Definition |
| --- | --- | --- |
| 1. Communication Barriers | 1.1 Language Barriers | Challenges in communication arising from language differences that limit understanding between patients and providers. |
| | 1.2 Health Literacy | The patient’s ability to comprehend and act on health information delivered by the care team. |
| | 1.3 Follow-Up Communication | Ongoing information exchange between providers and patients after discharge. |
| 2. Care Coordination Challenges | 2.1 Discharge Planning | Problems organizing the patient’s transition from hospital to home, including scheduling and instructions. |
| | 2.2 Role of Case Managers | How case managers participate in planning and coordinating patient care from admission through discharge. |
| | 2.3 Multidisciplinary Team Involvement | Collaboration among surgeons, nurses, therapists, and other professionals to deliver comprehensive care. |
| | 2.4 Handoff and Care Coordination | Transfer of critical patient information between shifts and services to maintain continuity of care plans. |
| 3. Patient Support System Barriers | 3.1 Family & Caregiver Involvement | Contributions and limitations of family members or informal caregivers in the recovery process. |
| | 3.2 Professional Support | Participation of home-care nurses, therapists, or other healthcare professionals in patient recovery. |
| | 3.3 Transportation & Accessibility | The patient’s ability to reach clinics, pharmacies, or other resources after discharge. |
| 4. Post-Discharge Challenges | 4.1 Managing Equipment & Medications | Difficulties handling medical devices (e.g., drains) and adhering to medication regimens at home. |
| | 4.2 Recognizing & Responding to Complications | Patient capacity to identify warning signs and seek timely help for post-surgical problems. |
| | 4.3 Logistical Challenges | Practical issues such as arranging home-care visits, securing supplies, or coordinating services. |
| 5. Emotional & Psychological Barriers | 5.1 Emotional Support | Provision of counseling, chaplaincy, or mental-health resources to address anxiety, depression, or distress. |
| | 5.2 Family Involvement | Engagement of family members in emotional support and shared decision-making about care. |
| 6. Financial & Insurance Barriers | 6.1 Insurance & Financial Issues | Coverage gaps, co-pays, or other monetary constraints that hinder access to care and supplies. |
| | 6.2 Rural & Remote Challenges | Additional obstacles for patients in rural or remote areas, such as limited services or long travel distances. |
| 7. Patient Adherence & Compliance | 7.1 Adherence to Medical Recommendations | The patient’s consistency in following prescribed medications, activities, and self-care instructions. |
| | 7.2 Social & Environmental Factors | External influences such as housing, social support, or work demands that affect a patient’s ability to adhere and recover. |
The ChatGPT-o3 analysis of 6 sets of themes/subthemes (W4) took approximately 4 person-hours to complete and resulted in 6 themes and 23 subthemes, shown in Table 5.
ChatGPT-o3 Analysis of 6 Sets of Themes/Subthemes (W4) Codebook

| Theme | Subtheme | Definition |
| --- | --- | --- |
| 1. Health Literacy & Patient Education | 1.1 Pre-surgery Education & Expectations | Adequacy and timing of information given before surgery about procedures, risks, ostomies, pain, and expected recovery trajectory. |
| | 1.2 Discharge Instructions & Readiness | Clarity, completeness, and patient confidence in performing post-discharge care tasks at the moment of hospital release. |
| | 1.3 Post-surgery Self-management Education | Teaching on medication schedules, wound/ostomy care, equipment use (e.g., drains, compression devices) and safe activity progression after discharge. |
| | 1.4 Language & Cultural Barriers | Any mismatch between the patient’s preferred language/culture and clinical communication that limits understanding. |
| | 1.5 Recognizing Warning-Signs | Patient ability to identify complications (infection, leakage, escalating pain) and know when/how to seek help. |
| 2. Communication & Care Coordination | 2.1 Interdisciplinary Team Communication | Effectiveness of information exchange among surgeons, nurses, case-managers, pharmacists, physical therapists, etc. |
| | 2.2 Nurse–Patient Communication | Day-to-day dialogue quality, trust, and clarity between bedside nurses and patients. |
| | 2.3 Discharge Planning & Handoffs | Quality and consistency of plans, documentation and verbal handoffs from hospital to home/community services. |
| | 2.4 Follow-up Channels | Availability and usability of tools such as MyChart, phone triage, telehealth, and clear guidance for urgent versus non-urgent contact. |
| | 2.5 Care Navigation/Case Management | Access to and effectiveness of personnel who arrange services, supplies, appointments and insurance authorizations. |
| 3. Pain & Symptom Control | 3.1 Immediate Post-Operative Pain Control | Adequacy of analgesia and non-pharmacologic strategies on the surgical unit. |
| | 3.2 Ongoing Pain Management at Home | Patient access to prescriptions, clear taper plans, and alternatives once home. |
| | 3.3 Symptom Management & Comfort | Support for nausea, sleep disruption, bowel function, and wound discomfort that influence recovery speed. |
| 4. Support Systems & Psychosocial Resources | 4.1 Family & Informal Caregiver Involvement | Availability, preparedness, and burden of relatives/friends who assist with care. |
| | 4.2 Professional Home-Health & Community Services | Timeliness and adequacy of visiting-nurse, PT/OT, or social-service support. |
| | 4.3 Emotional & Psychological Support | Resources addressing anxiety, depression, body-image (e.g., new ostomy), and coping (e.g., chaplaincy, mental-health referral). |
| 5. Access, Logistics & Social Determinants | 5.1 Insurance & Financial Constraints | Coverage gaps, co-pays, authorization delays, and out-of-pocket costs for medications, supplies, or services. |
| | 5.2 Transportation & Geographic Barriers | Distance, lack of transport, or rural/remote location limiting in-person care or follow-up. |
| | 5.3 Equipment & Supply Availability | Delays or shortages in obtaining ostomy supplies, wound vacs, mobility aids, etc. |
| | 5.4 Home Healthcare Placement | Difficulty securing appropriate skilled-nursing or rehab placement when needed. |
| 6. Patient Engagement & Adherence | 6.1 Medication & Treatment Adherence | Consistency with prescribed meds, dressings, activity restrictions, and diet. |
| | 6.2 Follow-up Appointment Adherence | Ability and willingness to attend scheduled clinic visits, imaging, or lab draws. |
| | 6.3 Lifestyle & Environmental Factors | Housing stability, social support, and competing obligations that influence adherence and healing. |
The ChatGPT-o3 synthesis of the 4 codebooks took approximately 0.25 person-hours to complete and resulted in 8 themes and 42 subthemes, shown in Table 6.
Synthesized Codebook
Theme
Subtheme
Definition
1. Communication & Health Literacy
1.1 Language & Literacy Barriers
Limited English proficiency, low reading level, or cultural nuance that blocks comprehension of health information.
1.2 Miscommunication
Asynchronous, unclear, or contradictory messages that create confusion or errors.
1.3 Effective Communication
Timely, accurate, empathic exchanges (verbal, written, or digital) that build understanding and trust.
1.4 Interpreter & Tech Tools
Professional interpreters, translation software, portals, or telehealth platforms used to bridge gaps.
1.5 Follow-up Channels
Phone triage, MyChart, secure text, and other pathways for post-discharge questions, with clear response expectations.
2. Patient Education & Self-Management
2.1 Pre-operative Education & Expectations
Information before surgery about risks, procedures, ostomies, pain control, recovery timeline, to reduce anxiety and set goals.
2.2 Discharge Instructions & Readiness
Clarity, completeness, and patient confidence in performing post-discharge tasks at the moment of hospital release.
2.3 Post-surgery Self-management Education
Teaching on meds, wound/ostomy care, equipment use, diet/activity progression, and daily routines at home.
2.4 Teach-back & Reinforcement
Return demonstration, phone calls, or other methods confirming understanding and refreshing key instructions.
2.5 Education Volume & Timing
Staging material to avoid overwhelming patients or caregivers; balancing breadth of topics with cognitive load.
2.6 Red-flag Symptom Recognition
Explicit teaching of warning signs (fever, bleeding, output changes) and when/how to seek help.
3. Care Coordination & Continuity
3.1 Case Management & Navigation
Personnel who arrange services, supplies, insurance approvals, and appointments across settings.
3.2 Multidisciplinary Team Alignment
Consistency of plans/messages across surgeons, nurses, PT/OT, nutrition, and social work.
3.3 Discharge Planning & Handoffs
Formal processes and documentation that transfer care safely from hospital to home/rehab.
3.4 Cross-setting Continuity
Quality of information flow between hospital, PCP, home-health, or transitional care units.
3.5 Role Clarity for Patients
Patient understanding of “who does what,” contact points, and escalation pathways.
3.6 Follow-up Scheduling & Bundling
Timely, coordinated in-person or telehealth appointments, especially bundling multi-specialty visits for remote patients.
4. Support Systems & Psychosocial Resources
4.1 Family & Informal Caregiver Involvement
Presence, preparedness, and burden of relatives/friends who assist with decision-making and daily care.
4.2 Professional Home-health & Community Services
Availability, timeliness, and adequacy of visiting nurses, therapists, or community programs.
4.3 Emotional & Psychological Support
Access to chaplaincy, counseling, peer groups, or mental-health professionals for anxiety, depression, or body-image concerns.
4.4 Social Work & Community Referrals
Activation or absence of social work and community resources for housing, food security, legal, or financial needs.
4.5 Spiritual & Cultural Care
Services or rituals that respect cultural beliefs and promote acceptance, coping, and healing.
5. Access, Financial & Logistical Constraints
5.1 Insurance & Financial Barriers
Coverage gaps, co-pays, prior-authorizations, and out-of-pocket costs that limit meds, supplies, or services.
5.2 Geographic & Transportation Barriers
Distance, travel costs, rural/remote location, or lack of transport impeding service access.
5.3 Resource/Supply Shortages
Institutional or community deficits in staff, beds, or equipment that delay or restrict care.
5.4 Post-acute Placement Availability
Difficulty securing skilled-nursing, rehab, or home-health placements due to capacity, eligibility, or funding.
6. Physical Recovery
6.1 Pain Management
Adequacy of analgesia plans, prescription fulfillment, non-pharmacologic strategies, and taper guidance.
6.2 Symptom Management & Escalation
Patient ability and pathways to address nausea, wound changes, GI symptoms, or fever promptly.
6.3 Early Mobility & Ambulation
Frequency and support for walking/physical therapy in hospital and at home; staffing or pain barriers.
6.4 Nutrition & Diet Progression
Guidance on advancing diet, meeting protein/calorie goals, and preventing ileus or malnutrition.
6.5 Sleep & Fatigue Management
Strategies to cluster care, promote rest, and mitigate the impact of sleep loss on healing.
6.6 Wound Healing & Scar Management
Education and monitoring of incision/wound sites, scar care, and expectations for healing timeline.
7. Patient Engagement & Adherence
7.1 Medication & Treatment Adherence
Consistency in taking meds, changing dressings, performing ostomy care, and following restrictions.
7.2 Appointment & Follow-up Adherence
Ability and willingness to attend scheduled visits, imaging, labs, or telehealth sessions.
7.3 Lifestyle & Environmental Factors
Housing stability, work demands, family obligations, or living conditions that affect recovery behaviors.
7.4 Positive Engagement & Self-efficacy
Patient confidence, motivation, and proactive behaviors that facilitate smoother recovery.
7.5 Monitoring & Symptom Tracking
Diaries, mobile apps, or remote monitoring used to capture outputs, pain scores, or wound photos for early intervention.
8. Equipment & Supplies
8.1 Ostomy Supplies & Ordering
Starter kits, enrollment with suppliers, reorder instructions, and troubleshooting common issues.
8.2 Wound Care Materials & VACs
Availability, insurance approval, and competence in negative-pressure therapy or dressing changes.
8.3 Specialized Devices (TPN, Tube Feeds)
Training, safety checks, and logistical arrangements for infusion pumps, feeding tubes, or parenteral nutrition.
8.4 Supply Delivery & Logistics
Timing, accuracy, and reliability of deliveries; handling rural addresses, holiday/weekend gaps, or damaged shipments.
8.5 Financial Coping for Supplies
Strategies patients use (coupon programs, reuse, rationing) when coverage limits quantity or type of supplies.
3.1. Thematic Coverage of Codebooks
Tables 7 and 8 illustrate the thematic coverage of each of the four analytic workflows against the synthesized codebook, which comprises 8 themes and 42 subthemes. The number in parentheses next to each workflow indicates the total count of subthemes captured by that workflow. The ChatGPT-o3-Notes workflow (W2) demonstrated the highest thematic coverage (37/42 subthemes), followed by ChatGPT-o3-Transcripts (W4; 34/42 subthemes), Copilot GPT-4-Transcripts (W3; 20/42 subthemes), and Manual RQA (W1; 11/42 subthemes). Notably, “Positive Engagement & Self-efficacy” (7.4) was an emergent subtheme introduced by ChatGPT-o3 during the synthesis process and was not explicitly present in any of the initial four codebooks. ChatGPT-o3 explained that it generated this subtheme to capture implicit facilitator ideas related to patient confidence and proactive actions, thereby ensuring the harmonized codebook could represent both barriers and successes.
Table 7. Percentage of Subthemes Identified by Each Workflow, Stratified by Main Theme

| Theme | W1: Manual RQA | W2: ChatGPT-o3-Notes | W3: Copilot GPT-4-Transcripts | W4: ChatGPT-o3-Transcripts |
|---|---|---|---|---|
| Communication & Health Literacy | 80.0% | 80.0% | 60.0% | 80.0% |
| Patient Education & Self-Management | 16.7% | 100.0% | 33.3% | 66.7% |
| Care Coordination & Continuity | 16.7% | 100.0% | 66.7% | 66.7% |
| Support Systems & Psychosocial Resources | 20.0% | 100.0% | 60.0% | 60.0% |
| Access, Financial & Logistical Constraints | 75.0% | 100.0% | 75.0% | 100.0% |
| Physical Recovery | 0.0% | 100.0% | 16.7% | 100.0% |
| Patient Engagement & Adherence | 0.0% | 20.0% | 40.0% | 80.0% |
| Equipment & Supplies | 20.0% | 100.0% | 40.0% | 100.0% |
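As a quick arithmetic check, each percentage in Table 7 is simply the share of a theme's subthemes that a given workflow captured, and the per-workflow totals reproduce the headline counts. A minimal Python sketch (capture counts transcribed from the detailed coverage table; the dictionary layout is illustrative):

```python
# Theme -> (number of subthemes in theme, subthemes captured by W1..W4).
coverage = {
    "Communication & Health Literacy":           (5, [4, 4, 3, 4]),
    "Patient Education & Self-Management":       (6, [1, 6, 2, 4]),
    "Care Coordination & Continuity":            (6, [1, 6, 4, 4]),
    "Support Systems & Psychosocial Resources":  (5, [1, 5, 3, 3]),
    "Access, Financial & Logistical Constraints":(4, [3, 4, 3, 4]),
    "Physical Recovery":                         (6, [0, 6, 1, 6]),
    "Patient Engagement & Adherence":            (5, [0, 1, 2, 4]),
    "Equipment & Supplies":                      (5, [1, 5, 2, 5]),
}

def percent(captured: int, total: int) -> float:
    """One cell of Table 7: percent of a theme's subthemes captured, to 1 decimal."""
    return round(100 * captured / total, 1)

# Per-workflow totals across all themes (should give 11, 37, 20, and 34 of 42).
totals = [sum(c[i] for _, c in coverage.values()) for i in range(4)]
```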
Table 8. Detailed Thematic Coverage of Codebooks

| Harmonized codebook | Manual RQA (11/42) | ChatGPT-o3-Notes (37/42) | Copilot GPT-4-Transcripts (20/42) | ChatGPT-o3-Transcripts (34/42) |
|---|---|---|---|---|
| 1 Communication & Health Literacy | | | | |
| 1.1 Language & Literacy Barriers | ✓ | ✓ | ✓ | ✓ |
| 1.2 Miscommunication | ✓ | ✓ | ✓ | ✓ |
| 1.3 Effective Communication | ✓ | — | — | — |
| 1.4 Interpreter & Tech Tools | ✓ | ✓ | — | ✓ |
| 1.5 Follow-up Channels | — | ✓ | ✓ | ✓ |
| 2 Patient Education & Self-Management | | | | |
| 2.1 Pre-op Education & Expectations | — | ✓ | — | ✓ |
| 2.2 Discharge Instructions & Readiness | — | ✓ | ✓ | ✓ |
| 2.3 Post-surgery Self-management Education | — | ✓ | — | ✓ |
| 2.4 Teach-back & Reinforcement | — | ✓ | — | — |
| 2.5 Education Volume & Timing | ✓ | ✓ | — | — |
| 2.6 Red-flag Symptom Recognition | — | ✓ | ✓ | ✓ |
| 3 Care Coordination & Continuity | | | | |
| 3.1 Case Management & Navigation | ✓ | ✓ | ✓ | ✓ |
| 3.2 Multidisciplinary Alignment | — | ✓ | ✓ | ✓ |
| 3.3 Discharge Planning & Handoffs | — | ✓ | ✓ | ✓ |
| 3.4 Cross-setting Continuity | — | ✓ | ✓ | ✓ |
| 3.5 Role Clarity for Patients | — | ✓ | — | — |
| 3.6 Follow-up Scheduling & Bundling | — | ✓ | — | — |
| 4 Support Systems & Psychosocial Resources | | | | |
| 4.1 Family & Informal Caregiver Involvement | — | ✓ | ✓ | ✓ |
| 4.2 Professional Home-health/Community | ✓ | ✓ | ✓ | ✓ |
| 4.3 Emotional & Psychological Support | — | ✓ | ✓ | ✓ |
| 4.4 Social Work & Community Referrals | — | ✓ | — | — |
| 4.5 Spiritual & Cultural Care | — | ✓ | — | — |
| 5 Access, Financial & Logistical Constraints | | | | |
| 5.1 Insurance & Financial Barriers | ✓ | ✓ | ✓ | ✓ |
| 5.2 Geographic & Transportation Barriers | ✓ | ✓ | ✓ | ✓ |
| 5.3 Resource/Supply Shortages | ✓ | ✓ | ✓ | ✓ |
| 5.4 Post-acute Placement Availability | — | ✓ | — | ✓ |
| 6 Physical Recovery | | | | |
| 6.1 Pain Management | — | ✓ | — | ✓ |
| 6.2 Symptom Management & Escalation | — | ✓ | ✓ | ✓ |
| 6.3 Early Mobility & Ambulation | — | ✓ | — | ✓ |
| 6.4 Nutrition & Diet Progression | — | ✓ | — | ✓ |
| 6.5 Sleep & Fatigue Management | — | ✓ | — | ✓ |
| 6.6 Wound Healing & Scar Management | — | ✓ | — | ✓ |
| 7 Patient Engagement & Adherence | | | | |
| 7.1 Medication & Treatment Adherence | — | — | ✓ | ✓ |
| 7.2 Appointment & Follow-up Adherence | — | — | — | ✓ |
| 7.3 Lifestyle & Environmental Factors | — | — | ✓ | ✓ |
| 7.4 Positive Engagement & Self-efficacy | — | — | — | — |
| 7.5 Monitoring & Symptom Tracking | — | ✓ | — | ✓ |
| 8 Equipment & Supplies | | | | |
| 8.1 Ostomy Supplies & Ordering | — | ✓ | ✓ | ✓ |
| 8.2 Wound Care Materials & VACs | — | ✓ | — | ✓ |
| 8.3 Specialized Devices (TPN, Feeds) | — | ✓ | — | ✓ |
| 8.4 Supply Delivery & Logistics | ✓ | ✓ | ✓ | ✓ |
| 8.5 Financial Coping for Supplies | — | ✓ | — | ✓ |
3.2. Nurse Participant Feedback on the Codebooks
Six of the 16 participants (37.5%) responded to the REDCap survey (Harris et al., 2009). Figure 2 presents participants’ rankings of the codebooks, with ChatGPT-o3-Notes (W2) being rated as the best match, followed by ChatGPT-o3 transcripts (W4), Copilot transcripts (W3), and Manual RQA (W1). Participants’ qualitative feedback on their top-ranked choices consistently highlighted the comprehensive and actionable nature of the AI-generated codebooks. Comments included: “Includes the steps that would improve our weaknesses,” “...addressing all potential barriers and what can be done to alleviate those barriers,” “The teaching and getting the patient prepared is what we strive for… ensuring that caregivers and family [are] available to learn in case the patient is unable,” “Better highlights communication barriers [and] offers more concrete details on barriers,” “Very thorough, accurately depicts [the] full picture of patient interventions for surgery and recovery,” and “It feels like it fits my role the most.”
Figure 2. Nurse rankings of codebooks.
4. Discussion
This study rigorously compared the efficiency and thematic completeness of four distinct qualitative analysis workflows applied to a common corpus of 16 nurse interviews and mini-focus groups: a manual rapid qualitative analysis workflow and three LLM-assisted workflows. Our findings demonstrate substantial time savings with LLM integration while achieving high thematic coverage, corroborated by participant feedback.
Specifically, manual RQA (W1) required approximately 30 person-hours. In contrast, the LLM-assisted workflows significantly reduced labor: ChatGPT-o3 on researcher notes (W2) required ∼8 person-hours, while Copilot on full transcripts (W3) and ChatGPT-o3 on full transcripts (W4) each took ∼4 person-hours. Even these modest totals for the transcript-based workflows reflected institutional technical limitations (file-upload and prompt-length limits) rather than the method itself; in an unrestricted environment, these workflows would likely take less than 30 minutes, highlighting an infrastructural rather than a methodological bottleneck.
Beyond efficiency, the workflows produced codebooks of varied granularity and completeness. While W1 resulted in 3 themes and 10 subthemes, W2 yielded 10 themes and 47 subthemes. Workflows W3 and W4 provided an intermediate level of detail. Crucially, the two ChatGPT-o3 workflows (W2 and W4) captured nearly all concepts present in our 8-theme, 42-subtheme harmonized framework (37/42 and 34/42 subthemes, respectively), yet required roughly 73%–87% less time than manual RQA (W1), even with the current institutional inefficiencies. This demonstrates that LLMs can not only accelerate analysis but also achieve comprehensive thematic coverage when prompted to generate codes. Nurse participant preferences (W2, W4, W3, W1) mirrored the ordering from the thematic completeness analysis, with every LLM workflow preferred by nurse participants to manual RQA.
The divergence in thematic completeness between the models may be attributable to specific architectural differences. ChatGPT-o3 utilizes advanced ‘chain-of-thought’ reasoning and operates with a significantly larger context window compared to the GPT-4 architecture powering the Copilot workflow. These capabilities allow for more robust data synthesis, enabling the model to retain and connect disparate narrative threads across lengthy transcripts, a critical advantage when analyzing the dense, unstructured accounts typical of qualitative interviews.
These results echo and extend a growing empirical literature on LLM-assisted qualitative analysis. One study demonstrated that ChatGPT generated complete inductive codebooks for health-services interviews in under 15 minutes, achieving high macro-level concordance with human gold standards (Bijker et al., 2024). Another reported a 70% reduction in analyst time and strong inter-rater agreement (Cohen’s κ between 0.72 and 0.95) when GPT-4 drafted codes (Liu et al., 2025). Our work corroborates these findings within a nursing context, showing that when LLMs are tasked with generating, not just assigning, codes, near-complete thematic coverage is achievable.
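For readers unfamiliar with the agreement statistic cited above, Cohen's κ corrects raw percent agreement between two coders for the agreement expected by chance. A minimal pure-Python sketch (the excerpt labels below are invented for illustration):

```python
from collections import Counter

def cohens_kappa(coder_a: list, coder_b: list) -> float:
    """Chance-corrected agreement between two coders' category assignments."""
    n = len(coder_a)
    # Observed proportion of excerpts on which the coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical example: two coders assign themes to five interview excerpts.
a = ["pain", "pain", "education", "education", "pain"]
b = ["pain", "education", "education", "education", "pain"]
```

Values in the 0.72–0.95 range reported by Liu et al. (2025) thus indicate substantial to near-perfect agreement beyond chance.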
However, none of the workflows were without limitations. LLMs occasionally exhibited “hallucinations,” such as generating duplicate categories or imposing a fixed number of subthemes (e.g., four under every parent theme) even when the content did not warrant it. For instance, Copilot GPT-4 notably missed several details related to “Equipment and Supplies” that ChatGPT-o3 captured. These observations reinforce prior cautions about LLM “creativity” and underscore the critical need for robust human oversight and quality assurance at every stage of LLM-assisted qualitative analysis (Jiang et al., 2024).
Participant feedback on the codebooks provided further insights into what constitutes a “good” codebook from an end-user perspective. Of the 16 nurse participants invited, 6 (37.5%) completed the evaluation survey. While this sample size limits statistical generalizability and introduces potential response bias (where respondents may be those most engaged with the topic), the qualitative feedback provided offers critical preliminary validity checks from a content-expert perspective. Key takeaways from nurse participant preferences for the top-ranked codebooks (primarily LLM-generated) included:
Thoroughness: Participants consistently favored codebooks that provided a comprehensive overview, capturing the full spectrum of barriers patients face and the interventions that could follow.
Action-orientation: Codebooks that presented clear, actionable steps or solutions to identified weaknesses were highly valued.
Emphasis on Communication Barriers: Explicit highlighting of communication gaps among clinicians, patients, and caregivers was viewed favorably, reflecting a critical area in nursing practice.
Focus on Preparation & Teaching: Content emphasizing patient and family education and preparation for surgery and recovery resonated strongly, aligning with nurses’ daily priorities.
Role Fit: Several respondents noted that their preferred option “fits my role the most,” suggesting that the perceived applicability of themes in clinical practice significantly influenced their judgment of quality.
These participant preferences provide a framework for systematically characterizing the limitations of the less-performing AI workflows. By inverting the attributes most valued by the nurses, specifically thoroughness, action-orientation, and role fit, we can categorize the primary limitations of the LLM outputs as follows:
1. Lack of Granularity: Nurses consistently favored the comprehensive overview provided by ChatGPT-o3 (W4). This highlights a critical limitation in the GPT-4 Copilot workflow (W3), which, despite being efficient, produced thinner thematic descriptions that failed to capture the full spectrum of barriers identified in the manual analysis. Methodologically, this suggests that without advanced reasoning architectures, standard LLM summarization prioritizes brevity over the rich description necessary for rigorous qualitative inquiry.
2. Alignment with Purpose: The strong preference for actionable steps suggests that, without specific prompting to identify solutions, models may generate technically accurate but clinically passive themes that fail to resonate with nurses’ daily priorities. In implementation science contexts, this represents a significant risk, as the utility of qualitative evidence often lies in the operational details that models may smooth over.
3. Contextual Genericness: The feedback that the top choice “fits my role the most” implies that the lower-ranked outputs suffered from a lack of domain-specific resonance. In the context of EAS recovery, an AI output that misses the specific vocabulary of ‘patient education’ or ‘caregiver communication’ is not merely less detailed; it is perceived as having lower validity by the content experts. This underscores a broader transferability challenge: some AI models may default to normative language, struggling to capture the distinct professional dialects and tacit knowledge that define expert communities.
Regarding interpretive depth, we acknowledge that current LLMs prioritize breadth and speed (semantic analysis) over the nuanced, latent interpretation that sustained human immersion in the data yields. However, when the goal is rapid implementation rather than theory generation, this trade-off is acceptable. The AI successfully identifies the what (barriers) even if it occasionally misses the deeper why (latent meaning), satisfying the pragmatic requirements of the study.
Based on our findings, particularly the balance of efficiency and thematic completeness demonstrated by the LLM-assisted workflows and the positive participant feedback, we recommend a human-in-the-loop protocol for projects requiring rapid yet thorough qualitative insights, such as those in early design cycles or iterative implementation science:
1. Compile researcher notes: Team members review audio recordings by listening in full and capturing detailed notes.
2. Generate two parallel codebooks: Utilize a reflective LLM (e.g., ChatGPT-o3) to create one codebook from researcher notes and another from cleaned and deidentified raw transcripts. Our results indicate these two approaches offered the most comprehensive thematic coverage and were highly regarded by nurse participants.
3. LLM-assisted reconciliation: Prompt an LLM to reconcile and merge these parallel codebooks into a single, synthesized version.
4. Human consensus meeting: Conduct a dedicated team meeting to critically review the merged codebook, ensuring conceptual clarity, accuracy, and addressing any LLM-generated errors or omissions.
5. Member-checking: When feasible, member-check the final codebook with study participants to validate its resonance and applicability from their lived experience.
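In code form, steps 2–4 of this protocol amount to a small pipeline. The sketch below is illustrative only: `call_llm` stands in for whichever LLM client and prompt wording a team actually uses, and the human consensus review is modeled as a simple edit map; none of these names come from the study itself.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A codebook is modeled as a list of theme labels; call_llm is any function
# taking a prompt and returning such a list (hypothetical stand-in).
LLM = Callable[[str], List[str]]

def generate_parallel_codebooks(call_llm: LLM, notes: str,
                                transcripts: str) -> Tuple[List[str], List[str]]:
    """Step 2: one codebook from researcher notes, one from deidentified transcripts."""
    return (call_llm(f"Generate an inductive codebook from these notes:\n{notes}"),
            call_llm(f"Generate an inductive codebook from these transcripts:\n{transcripts}"))

def reconcile(call_llm: LLM, book_a: List[str], book_b: List[str]) -> List[str]:
    """Step 3: prompt the LLM to merge the parallel codebooks into one."""
    prompt = "Merge these codebooks, removing duplicates:\n" + "\n".join(book_a + book_b)
    return call_llm(prompt)

def human_consensus_review(merged: List[str],
                           team_edits: Dict[str, Optional[str]]) -> List[str]:
    """Step 4: the team renames or drops themes to fix LLM errors and omissions
    (an edit maps an old label to a new one, or to None to delete it)."""
    reviewed = [team_edits.get(theme, theme) for theme in merged]
    return [theme for theme in reviewed if theme is not None]
```

Steps 1 and 5 (note-taking and member-checking) stay outside the pipeline by design: they are the human-immersion and validation work the protocol deliberately does not delegate.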
This proposed protocol preserves the crucial cognitive benefits of team members’ immersion in the data (taking notes) while strategically outsourcing the most time-consuming mechanical steps to LLMs. These findings are particularly beneficial for implementation scientists who often require faster data translation within resource-constrained environments.
However, ethical considerations in using LLMs for qualitative analysis remain paramount. Researchers must align with institutional policies on protecting participant data and with applicable privacy laws. While these workflows are recommended for rapid data translation with large datasets, especially within implementation science, LLM outputs must always be cross-referenced against the original data to maintain analytic rigor and mitigate the risk of hallucinations. The integration of LLMs into qualitative workflows also necessitates a re-evaluation of data governance and interpretive responsibility. While LLMs can efficiently synthesize patterns, they lack the agency to assume accountability for clinical insights. We posit that the ‘human-in-the-loop’ safeguards utilized in this study are not merely quality assurance mechanisms but fundamental ethical requirements. When delegating thematic synthesis to AI, final accountability for the validity of findings must remain solely with the human research team. Researchers must actively audit AI outputs to ensure that efficiency gains do not come at the cost of erasing minority perspectives or propagating hallucinated clinical associations that could impact patient safety.
This workflow necessitates a re-conceptualization of reflexivity. In AI-assisted qualitative research, the researcher’s role shifts from primary analyst to expert auditor. Bias does not disappear; rather, it shifts upstream to the prompt engineering phase and downstream to the validation phase. Reflexivity therefore requires the researcher to critically examine not just their interpretation of the data, but how their prompts may have constrained or directed the model’s outputs.
Additionally, we acknowledge that the specific models evaluated (ChatGPT-o3 and GPT-4) are already becoming obsolete relative to emerging architectures. However, the methodological implication holds: as models evolve to generate richer outputs, the risk of plausible-sounding hallucinations may increase. Future qualitative methodology must therefore shift focus from mere coding efficiency to rigorous interpretive auditing, ensuring that as AI tools become more capable, human oversight evolves to detect increasingly sophisticated errors.
Finally, it is important to acknowledge that numerous forms of qualitative analysis and theoretical approaches are better suited for human-only analysis, particularly when the researcher’s embodied role and deep contextual understanding are central to the inquiry.
5. Conclusion
This study provides compelling evidence that integrating LLMs into qualitative data analysis significantly reduces labor while achieving comprehensive thematic coverage of nurse-identified recovery barriers. Specifically, LLM assistance captured virtually all concepts present in a synthesized codebook, far surpassing the thematic breadth of manual RQA in a fraction of the time. Yet, our findings also underscore that human reconciliation remains essential to validate LLM outputs, correct duplications and omissions, and ultimately uphold qualitative rigor. We propose a pragmatic hybrid approach, involving dual LLM analyses of human notes and transcripts, a human review session, and, ideally, member-checking, as an optimal strategy. Beyond the specific context of EAS, this study highlights the transferability of AI-assisted RQA to broader qualitative inquiry. The structural components of our hybrid workflow, including the defined prompt architecture, the use of summary notes for speed, and the ‘human-in-the-loop’ validation protocols, are domain-independent and readily adaptable to other applied fields such as public health, education, or organizational science. However, interpretive accuracy remains domain-dependent. As evidenced by the nuanced differences in model performance, the effectiveness of the human check relies on content expertise. Researchers applying this method elsewhere must therefore ensure that the human auditor possesses deep subject-matter knowledge to detect the specific, plausible-sounding hallucinations that generic models may generate within specialized domains. This protocol offers a powerful harmonization of analytical speed, ethical LLM utilization, and the generation of translatable insights crucial for advancing implementation science.
Footnotes
ORCID iDs
Jenna L. Marquard
Christie L. Martin
Amanda C. Trofholz
Elizabeth C. Wick
Genevieve B. Melton
Ethical Considerations
This study was approved by the University of Minnesota and University of California, San Francisco Institutional Review Boards.
Author Contributions
Conception and Design (JM, LW, GM), Data Collection (JM, AT), Data Analysis (JM, CM, ST, AF), Drafting Manuscript (JM, CM, ST, AF), Critical Revision of the Manuscript (JM, CM, ST, AF, AT, EW, GM)
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by an award from the Agency for Healthcare Quality and Research (AHRQ) R18 HS029616 and by the University of Minnesota Center for Learning Health System Sciences (CLHSS), a collaboration between the Medical School and School of Public Health. The content is solely the responsibility of the authors and does not represent the official views of the AHRQ or CLHSS.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
Deidentified study data will be made available upon reasonable request.
References
1. Aggarwal, G., Peden, C. J., Mohammed, M. A., Pullyblank, A., Williams, B., Stephens, T., Kellett, S., Kirkby-Bott, J., Quiney, N., for the Emergency Laparotomy Collaborative. (2019). Evaluation of the Collaborative Use of an Evidence-Based Care Bundle in Emergency Laparotomy. JAMA Surgery, 154(5), Article e190145. https://doi.org/10.1001/jamasurg.2019.0145
2. Bano, M., Zowghi, D., & Whittle, J. (2024). AI and Human Reasoning: Qualitative Research in the Age of Large Language Models. AI Ethics Journal, 4(2), 1. https://doi.org/10.47289/AIEJ20240122
3. Bergman, A. J., McNabb, K. C., Relf, M. V., & Dredze, M. H. (2024). “Where No One Has Gone Before”: Questions to Ensure the Ethical, Rigorous, and Thoughtful Application of Artificial Intelligence in the Analysis of HIV Research. Journal of the Association of Nurses in AIDS Care, 35(5), 450–455. https://doi.org/10.1097/JNC.0000000000000483
4. Bijker, R., Merkouris, S. S., Dowling, N. A., & Rodda, S. N. (2024). ChatGPT for Automated Qualitative Research: Content Analysis. Journal of Medical Internet Research, 26, e59050. https://doi.org/10.2196/59050
5. Brondani, M., Alves, C., Ribeiro, C., Braga, M. M., Garcia, R. C. M., Ardenghi, T., & Pattanaporn, K. (2024). Artificial intelligence, ChatGPT, and dental education: Implications for reflective assignments and qualitative research. Journal of Dental Education, 88(12), 1671–1680. https://doi.org/10.1002/jdd.13663
6. Brown, D., & Xhaja, A. (2018). Nursing Perspectives on Enhanced Recovery After Surgery. Surgical Clinics of North America, 98(6), 1211–1221. https://doi.org/10.1016/j.suc.2018.07.008
7. Brown, R., Cigarroa Kennedy, S., Carranco Chávez, E., Dumeng-Rodriguez, J., & Cullen, D. (2025). Evaluation of a notes-based rapid qualitative analysis method to facilitate implementation. Implementation Science Communications, 6(1), 23. https://doi.org/10.1186/s43058-025-00709-w
8. Castellanos, A., Jiang, H., Gomes, P., Vander Meer, D., & Castillo, A. (2025). Large Language Models for Thematic Summarization in Qualitative Health Care Research: Comparative Analysis of Model and Human Performance. JMIR AI, 4, Article e64447. https://doi.org/10.2196/64447
9. Chubb, L. A. (2023). Me and the Machines: Possibilities and Pitfalls of Using Artificial Intelligence for Qualitative Data Analysis. International Journal of Qualitative Methods, 22, Article 16094069231193593. https://doi.org/10.1177/16094069231193593
10. De Paoli, S. (2024). Performing an Inductive Thematic Analysis of Semi-Structured Interviews With a Large Language Model: An Exploration and Provocation on the Limits of the Approach. Social Science Computer Review, 42(4), 997–1019. https://doi.org/10.1177/08944393231220483
11. Feuston, J. L., & Brubaker, J. R. (2021). Putting Tools in Their Place: The Role of Time and Perspective in Human-AI Collaboration for Qualitative Analysis. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW2), 1–25. https://doi.org/10.1145/3479856
12. Gale, N. K., Heath, G., Cameron, E., Rashid, S., & Redwood, S. (2013). Using the framework method for the analysis of qualitative data in multi-disciplinary health research. BMC Medical Research Methodology, 13, 117. https://doi.org/10.1186/1471-2288-13-117
13. Harris, P. A., Taylor, R., Thielke, R., Payne, J., Gonzalez, N., & Conde, J. G. (2009). Research electronic data capture (REDCap)—A metadata-driven methodology and workflow process for providing translational research informatics support. Journal of Biomedical Informatics, 42(2), 377–381. https://doi.org/10.1016/j.jbi.2008.08.010
14. Hitch, D. (2024). Artificial Intelligence Augmented Qualitative Analysis: The Way of the Future? Qualitative Health Research, 34(7), 595–606. https://doi.org/10.1177/10497323231217392
15. Jiang, X., Tian, Y., Hua, F., Xu, C., Wang, Y., & Guo, J. (2024). A Survey on Large Language Model Hallucination via a Creativity Perspective (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2402.06647
16. Khanderia, E., Aggarwal, R., Bouras, G., & Patel, V. (2024). Quality of life after emergency laparotomy: A systematic review. BMC Surgery, 24(1), 73. https://doi.org/10.1186/s12893-024-02337-y
17. Kondo, T., Miyachi, J., Jönsson, A., & Nishigori, H. (2024). A mixed-methods study comparing human-led and ChatGPT-driven qualitative analysis in medical education research (No. 4). Nagoya University Graduate School of Medicine, School of Medicine. https://doi.org/10.18999/nagjms.86.4.620
18. Lau, J. W. L., Baliga, J., Khan, F., Teo, Y. X., Yeo, J. M. J., Yeow, V. Z., Wu, C. X., Teo, S., Goh, T. J. H., Iau, P., & Emergency Laparotomy Group. (2024). Perioperative emergency laparotomy pathway for patients undergoing emergency laparotomy: A propensity score matched study. Annals of the Academy of Medicine, Singapore, 53(12), 713–723. https://doi.org/10.47102/annals-acadmedsg.2024311
19. Liu, X., Zambrano, A. F., Baker, R. S., Barany, A., Ocumpaugh, J., Zhang, J., Pankiewicz, M., Nasiar, N., & Wei, Z. (2025). Qualitative Coding with GPT-4: Where it Works Better. Journal of Learning Analytics, 12(1), 169–185. https://doi.org/10.18608/jla.2025.8575
20. Marshall, D. T., & Naff, D. B. (2024). The Ethics of Using Artificial Intelligence in Qualitative Research. Journal of Empirical Research on Human Research Ethics, 19(3), 92–102. https://doi.org/10.1177/15562646241262659
21. McLean, K. A., Knight, S. R., Diehl, T. M., Varghese, C., Ng, N., Potter, M. A., Zafar, S. N., Bouamrane, M.-M., & Harrison, E. M. (2023). Readiness for implementation of novel digital health interventions for postoperative monitoring: A systematic review and clinical innovation network analysis. The Lancet Digital Health, 5(5), e295–e315. https://doi.org/10.1016/S2589-7500(23)00026-2
22. Morgan, D. L. (2023). Exploring the Use of Artificial Intelligence for Qualitative Data Analysis: The Case of ChatGPT. International Journal of Qualitative Methods, 22, Article 16094069231211248. https://doi.org/10.1177/16094069231211248
23. Morse, E., Li, A., Albert, S., Harpel, L., & Rameau, A. (2025). Large Language Model Versus Human-Generated Thematic Analysis in Otolaryngology Qualitative Research. The Laryngoscope, 135(1), 1–2. https://doi.org/10.1002/lary.31894
24. Palinkas, L. A., & Zatzick, D. (2019). Rapid Assessment Procedure Informed Clinical Ethnography (RAPICE) in Pragmatic Clinical Trials of Mental Health Services Implementation: Methods and Applied Case Study. Administration and Policy in Mental Health and Mental Health Services Research, 46(2), 255–270. https://doi.org/10.1007/s10488-018-0909-3
25. Perkins, M., & Roe, J. (2024). The use of Generative AI in qualitative analysis: Inductive thematic analysis with ChatGPT. Journal of Applied Learning & Teaching, 7(1), 390. https://doi.org/10.37074/jalt.2024.7.1.22
26. Riley, W. T., Glasgow, R. E., Etheredge, L., & Abernethy, A. P. (2013). Rapid, responsive, relevant (R3) research: A call for a rapid learning health research enterprise. Clinical and Translational Medicine, 2(1), 10. https://doi.org/10.1186/2001-1326-2-10
27. Rodrigues Dos Anjos, J., De Souza, M. G., Serrano De Andrade Neto, A., & Campello De Souza, B. (2024). An analysis of the generative AI use as analyst in qualitative research in science education. Revista Pesquisa Qualitativa, 12(30), 01–29. https://doi.org/10.33361/rpq.2024.v.12.n.30.724
28. Sakaguchi, K., Sakama, R., & Watari, T. (2025). Evaluating ChatGPT in Qualitative Thematic Analysis With Human Researchers in the Japanese Clinical Context and Its Cultural Interpretation Challenges: Comparative Qualitative Study. Journal of Medical Internet Research, 27, Article e71521. https://doi.org/10.2196/71521
29. Tengberg, L. T., Bay-Nielsen, M., Bisgaard, T., Cihoric, M., Lauritsen, M. L., Foss, N. B., Orbæk, J., Veyhe, L., Jørgen Nielsen, H., Lindgaard, L., & AHA study group. (2017). Multidisciplinary perioperative protocol in patients undergoing acute high-risk abdominal surgery. British Journal of Surgery, 104(4), 463–471. https://doi.org/10.1002/bjs.10427
30. Vindrola-Padros, C., & Johnson, G. A. (2020). Rapid Techniques in Qualitative Research: A Critical Review of the Literature. Qualitative Health Research, 30(10), 1596–1604. https://doi.org/10.1177/1049732320921835
31. Wachinger, J., Bärnighausen, K., Schäfer, L. N., Scott, K., & McMahon, S. A. (2025). Prompts, Pearls, Imperfections: Comparing ChatGPT and a Human Researcher in Qualitative Data Analysis. Qualitative Health Research, 35(9), 951–966. https://doi.org/10.1177/10497323241244669
32. Xiao, Z., Yuan, X., Liao, Q. V., Abdelghani, R., & Oudeyer, P.-Y. (2023). Supporting Qualitative Analysis with Large Language Models: Combining Codebook with GPT-3 for Deductive Coding. In 28th International Conference on Intelligent User Interfaces (pp. 75–78). https://doi.org/10.1145/3581754.3584136