Abstract
Introduction
The electronic health record (EHR) is a significant driver of physician burnout.1,2 Primary care physicians spend a substantial amount of time on EHR tasks.3 Medical scribes, healthcare providers who are either present in the examination room or listen remotely to document the encounter, can reduce EHR documentation burden,4 but they are expensive. Ambient listening technology powered by generative artificial intelligence (AI) can perform scribing functions.5 AI scribes use natural language processing algorithms to create medical notes from audio or textual inputs during medical encounters.6 Depending on the technology, some scribes are designed to capture and transcribe voice directly, while others require a digital recording device, such as a smartphone with a downloaded application. The technology transcribes and summarizes conversations into a draft note and then integrates the note directly into the EHR with limited human intervention.7–10 AI scribe systems can interpret medical terminology and can contribute to delivering efficient and accurate care.11,12
As AI scribe technology continues to develop, it is increasingly being utilized in healthcare systems. One study using simulated encounters found that the technology improved documentation quality and operational efficiency.13 In another study, surveyed physicians believed the technology increased productivity and positively impacted well-being.14 Two pilot studies of large language model-powered AI scribes, each with fewer than 50 physicians, found reductions in time per note, daily documentation time, and total EHR time, along with improvements in burnout and physician task load.15,16 The most prominent analyses of AI scribe utilization to date found decreases in time in notes per appointment, mental demand, burnout, and effort to accomplish note writing, as well as improvements in perceived well-being.17,18
Several systematic reviews have evaluated AI scribe technology for factors such as effectiveness, clinician experience, clinician burnout, efficiency, and engagement.6,19,20 Generally, AI scribes demonstrated positive results, such as faster documentation, reduced administrative burden, ease of use, and enhanced patient–provider interaction.6,19,20 However, questions remain about the effectiveness of scribe training and the quality of documentation.6,19,21 A systematic review specifically evaluating errors found error rates of over 50%, particularly in conversational or multispeaker scenarios.21
As AI scribe technology becomes widespread, it is important to understand the tool's workflow effects in real-world settings to guide future implementation. Clinicians' preferences for completing clinical documentation differ greatly: some complete their notes during the consultation, while others wait until the end of the clinic day. We conducted a mixed-methods analysis to understand the habits of clinicians using AI scribe technology and its impact on perceptions of burnout and mental demand, as well as on characters typed, chart closures, and time spent on notes.
Methods
This project took place at Cleveland Clinic, a nonprofit academic health system. Participants were primary care physicians in Ohio and Florida who worked at least 0.6 full-time equivalent and were in the top 20% of EHR time outside scheduled hours (TOSH). TOSH consists of any administrative work occurring in the EHR, with clinical documentation typically the main driver. Primary care physicians spend more time on the EHR than their specialty peers.22 We focused on the top 20% because we wanted to learn whether the tool could impact the most burdened group of physicians. Physicians were identified based on TOSH percentages calculated from the EHR across all physicians in family medicine and internal medicine. Potential participants were contacted via email and received an honorarium. Physicians informed patients that they would be using their smartphones to record the discussion as a method of improving their notes. If a patient objected, the session was not recorded.
In February 2024, enrolled participants completed a baseline survey, followed by training and installation of the HIPAA-compliant platform on their institutional smartphones. The tool used in this project was a commercial AI scribe. Participants activated the tool at the beginning of outpatient appointments, using a computer's or phone's microphone to listen, and the tool generated a note within seconds after the appointment.
Statistical analysis
We collected physicians' demographics and the following measures at baseline and 4 weeks: the Mini Z 2.0 for burnout; the NASA Task Load Index (NASA TLX) to assess the perceived burden of the clinic session; and a question about patient interaction ("In the last two weeks, EHR documentation makes it hard for me to pay undivided attention to my patients during face-to-face visits."). A likelihood-to-recommend question (Net Promoter Score)23 was also administered at 4 weeks. In addition, we collected TOSH, same-day chart closures, note length, and characters typed from Epic Signal data. We compared pre- and postmeasures for all participants using paired tests.
Postintervention, we conducted semistructured interviews with each physician. An interview guide was developed (Table 1) focusing on the physician's overall experience with the tool, as well as perceptions of its impact on workload and interactions with patients. All interviews were audio-recorded and transcribed. We used an inductive-deductive approach24 to analyze data based on Rogers' Diffusion of Innovation theory.25 Considering Rogers' five main factors that influence the adoption of an innovation (relative advantage, compatibility, complexity, trial-ability, and observability), we inductively examined transcripts independently to generate initial coding categories. Using the identified codes, the research team discussed and shared codes that aligned with the five factors of adoption to create a comprehensive codebook. Transcripts were reviewed to identify emerging insights.26 Next, the research team reread all transcripts, paying close attention to relationships among the five categories, and developed potential themes. Preliminary themes were discussed and scrutinized by the research team, and interviews continued until thematic saturation was reached. The reliability of the analysis was supported by the research team's reflexive process of continuous self-examination,27 as well as by negative case analysis, seeking evidence that contradicted our interpretations.28
Table 1. Sample questions from the semistructured interview guide.
Quantitative and qualitative analyses were completed concurrently29 and then mixed at the interpretation stage. This design provided triangulation by seeking convergence of findings from separate methodologies.30 The project was categorized as quality improvement and did not undergo formal IRB review; therefore, written informed consent was not obtained. Participants' data were confidential and not shared with anyone outside the research team.
Results
Forty physicians responded to the invitation, and 10 ultimately enrolled. The mean age was 52 years; 70% were female, 70% White, and 10% Hispanic. Half were in internal medicine and half in family medicine. One interview was not recorded due to technical issues, but the interviewer wrote a memorandum summarizing the discussion.
Physicians used the tool for a median of 61 encounters (IQR 9.5–75). Those with at least 61 encounters were grouped in the high utilization category. Half of the participants were categorized as high utilizers. The mean age was 54 years for high utilizers and 49 years for low utilizers. All (100%) high utilizers were female, and 60% were White, while 60% of low utilizers were female, and 80% were White.
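The high/low utilization split described above can be sketched as a simple median cutoff. The encounter counts below are hypothetical (chosen so the median matches the reported 61), not the study's actual data.

```python
# Hypothetical sketch of the utilization split: physicians at or above the
# median encounter count are classified as "high utilizers".
from statistics import median

# Hypothetical per-physician encounter counts (median = 61, as reported)
encounters = [5, 9, 12, 40, 58, 64, 70, 75, 80, 102]

cutoff = median(encounters)
high = [e for e in encounters if e >= cutoff]   # high utilizers
low  = [e for e in encounters if e < cutoff]    # low utilizers
print(f"cutoff = {cutoff}, high = {len(high)}, low = {len(low)}")
```

With an even number of participants and no counts equal to the median, this split yields two equal groups, matching the half/half grouping reported.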
The mean likelihood to recommend the tool was 6.5/10. The only statistically significant pre-post change in any measure was characters typed, which decreased by 15,398.
Changes in electronic health record (EHR) activity among high and low utilizers of the artificial intelligence (AI) scribe.
Pre/post EHR measures among high and low utilizers of the AI scribe tool.
No measures assessing burnout and task load changed significantly among all participants. However, burnout, mental demand, physical demand, and feeling hurried all slightly decreased among all participants. The perception that the EHR makes it hard to pay undivided attention to patients slightly increased, but this change was also not statistically significant. Compared to low utilizers (<61 encounters), high utilizers (≥61 encounters) reported reduced physical and mental demand.
Pre/post survey measures of physicians’ perceptions using the AI scribe tool among high and low utilizers.
^1 = poor, 5 = optimal.
^^1 = excessive, 5 = minimum.
^^^1 = strongly agree, 5 = strongly disagree.
*0 = very low demand, 100 = very high demand.
**1 = strongly agree that the EMR makes it hard to pay undivided attention to the patient, 5 = strongly disagree.
The qualitative analysis identified facilitators and barriers related to each factor that influences adoption.
Relative advantage: perceptions about whether the innovation is better than the idea it supersedes
Facilitators
Physicians were optimistic about the tool and hopeful that it could alleviate burnout. They realized its potential, which motivated them to participate in the QI project. After using the tool, a physician who was contemplating retirement due to exhaustion stated, “
Barriers
Enthusiasm for the tool waned when physicians were confronted with obstacles. Physicians had difficulties preparing their phones in front of patients, and delays occurred because the tool was not yet integrated with the EHR (Epic). For instance, a physician said, “
Compatibility: existing values, past experiences, and needs of adopters
Facilitators
Physicians found the tool very accurate, giving them the impression that charts were closed faster. A physician said, “
Barriers
A source of dissatisfaction was how the tool organized the note, which differed from the physician's structure. In addition, the tool was not able to differentiate and highlight important issues from less urgent ones. A physician observed, “
Complexity: perceptions of the innovation’s ease of use and understanding
Facilitators
All physicians considered themselves tech-savvy and did not encounter major technical issues with the tool. After watching a short training video, physicians felt comfortable using it.
Barriers
The scribing tool worked well for patients with acute issues, but complex patients with numerous issues were problematic. A physician said, “
Trial-ability: experimentation with the innovation on a limited basis
Facilitators
Continued usage of the tool made learning easier. After only a few days, physicians felt very comfortable using it. They also acknowledged that the tool adapts to their style and can “figure out” what is needed the more it is used.
Barriers
Physicians detected several logistical issues through use of the tool. The AI tool had difficulty identifying the speaker when patients were accompanied by caregivers. In addition, although the tool was very accurate, it captured everything from the consultation, including extraneous content. Physicians found it necessary to edit notes and remove material that did not belong, which they considered "extra work."
Observability: results visible to the adopters
Facilitators
Physicians noted how patients were extremely impressed by the technology and appreciated efforts to enhance patient–provider engagement. Although physicians had to explain why their smartphones were present during the encounter, hardly any patients protested and instead were enthusiastic about the experience.
Barriers
Overall, the tool did not feel like a significant time-saver. Due to the time demands of learning how to use a new system and the lack of integration into the EHR, any efficiencies gained were neutralized. In addition, physicians acknowledged that work occurring after hours was usually due to patient messages rather than closing charts.
Discussion
In this small quality improvement pilot of a new AI-based ambient listening technology, analysis of the entire data set revealed that the number of typed characters declined significantly, but there were no changes in any other outcome measure following implementation. However, after dividing physicians into high and low utilizers, we found that high utilizers experienced decreases in mental and physical demand with the tool. Nevertheless, there was no impact on burnout, same-day chart closures, or TOSH. Combined with the qualitative results, which found barriers related to struggling with learning a new system and orienting to new processes, it is reasonable to conclude that the tool did not greatly impact physicians after only one month of usage.
Mental strain is an important factor in understanding the effects of using a new technology.31 Our analysis focused on physicians in the top 20% of EHR TOSH, highlighting how a potential solution to lessen workload can lead to temporary increases in workload. Successful implementations of new technologies in healthcare services often rely on opinion leaders, change agents, or trusted individuals, otherwise referred to as champions.32,33 Generally, healthcare organizations struggle to integrate new technology successfully; adoption is typically slow34 and initially resisted by stakeholders.35 Identifying barriers and appointing champions within each team or department to lead implementation efforts can empower healthcare providers to develop best practices.32 Trust, comfort, acceptability, and usefulness are other important factors that contribute to successful implementation.36 We found that those who used the tool more often benefited, which may be associated with trust and comfort. Our finding that note length increased aligns with a small pilot among dermatologists, in which note length increased by 50 words.37
Participating physicians in our study found the tool very accurate, but accuracy was a barrier in a 2025 qualitative study that assessed physicians' perspectives on AI ambient scribes.38 Interviews with 22 physicians at a healthcare organization in California identified barriers such as limited functionality for non-English speaking patients and a lack of access for physicians without a specific device. Another qualitative study that interviewed physicians about AI scribes found a positive impact on work–life balance and patient engagement, but identified barriers including use with non-English speaking patients and negative perceptions of accuracy and style.38 Physicians in our analysis were optimistic about the potential for long-term use of ambient AI scribes.38 By integrating qualitative and quantitative methods, we found that physicians' perceptions were associated with flat or decreased EHR use. Physicians were optimistic about the potential for the AI scribe to improve patient–physician communication and interaction. For instance, patients tend to be less participatory when physicians engage in high levels of keyboard activity, and there are more instances of silence during the encounter.39 Our findings underscore the importance of conducting usability tests and assessing various vendors before widespread implementation. Another study comparing a suite of scribe technologies demonstrated a framework for evaluating scribes' usability, technical performance, and accuracy.40
The proliferation of AI scribes has led governmental agencies to scrutinize these tools and to categorize them as medical devices.41–43 As AI scribes evolve, standardization and additional oversight are likely. Regulation of AI scribe tools poses challenges and potential opportunities to standardize workflows and monitor burden. Therefore, understanding best practices and effective methods of incorporating the technology is still needed. Our study identified barriers and facilitators to using the technology. AI scribes hold the potential for physicians to focus on patients, verbally and nonverbally, which can facilitate patient-centeredness and positively impact patient understanding and adherence.44
Limitations of our project include a small sample size limited to primary care with varied levels of utilization, a defined period of 4 weeks, data collection limitations, and the use of one AI platform that was not, at the time, integrated with our EHR. We also only assessed physicians in the top 20% of TOSH. These physicians could have different workloads and stylistic differences in their approach to note writing, which may not apply to physicians with lower TOSH percentages. Additionally, since our study took place, AI scribe technology has advanced, and new vendors are continuously introducing improved products. Further, it is important to compare physicians’ use of the technology among various specialties, as uses and effectiveness may vary. Future research should examine AI scribe usage among clinicians of all EHR TOSH levels, and within multiple specialties. It is also important to understand the perceptions and habits of other stakeholders using the technology, such as nurses and patients.
Conclusion
Scribes and virtual scribes (present via phone or teleconferencing) have been associated with decreases in total EHR time, time spent on notes, and TOSH. AI scribe technology has the potential to decrease costs, reduce time in the EHR, and change how clinicians interact with patients. The goal of our study was to evaluate an AI scribe technology and understand clinicians' approaches to using it. We did not find reductions in EHR time, but we did uncover clinicians' perceptions and the barriers and facilitators they observed. Further study is required to test the effectiveness of implementation strategies to achieve a more immediate positive effect. Like any other technology introduced into a health system that affects numerous parties, such as clinicians, nurses, and patients, careful attention must be paid to implementation practices to ensure success.
