Abstract
Studies show that clinicians are increasingly burning out in large part from the clerical burden associated with using Electronic Medical Record (EMR) systems. At the same time, recently developed health data analytic algorithms struggle with poor quality free-text entered data in these systems. We developed AutoScribe using artificial intelligence–based natural language processing tools to automate these clerical tasks and to output high-quality EMR data. In this article, we describe the benefits and drawbacks of our technology. Furthermore, we describe how we are positioning our company’s culture within the existing healthcare system and suggest steps leaders of the system should consider in order to ensure that potentially transformative artificial intelligence–based technologies like ours are optimally adopted.
Introduction
When thinking about doing one’s part to ensure an improving and sustainable learning health system, one of the first steps is to grasp the issues that are currently getting worse in healthcare, as these are the issues that need urgent interventions to stem the tide and ultimately reverse the trend. 1 Unfortunately, a whole host of issues fall into this category, but one of them hit particularly close to home for me in the earlier part of this decade. When I entered McGill medical school, I had a naive perception of what my day-to-day work would look like: I envisioned that I would primarily be spending my time developing deeply human relationships with my patients as I treated, counselled, and helped them navigate the challenging world of illness, suffering, and the health system built to address these. Three years into medical school, my innocent optimism about my expected work experience was already heading toward cynicism, and this was right at the start of my career. My work reality instead followed a daily pattern in which only a small percentage of my time was available to engage with patients, because I was required to spend most of my time completing primarily clerical work on the computer systems underpinning the health organizations where I worked: documenting notes, ordering tests, ordering therapies, completing referrals, electronically communicating with colleagues, and eventually billing. These tasks all took an inordinate amount of software navigation compared to the consumer programs we were all used to in our personal lives, so my colleagues and I used the term digital “scutwork” to describe this time.
It soon became clear that my time at work was primarily to be spent completing my scutwork, rather than using the clinical skills I received in my training, such as patient-centred interviewing, conducting physical examinations, clinical reasoning, shared decision-making, and building and sustaining a trusting therapeutic relationship with my patients.
While my colleagues and I all felt anecdotally perplexed by this unexpected division of our work time, it is only in recent years that it has become clear that the entire healthcare workforce in Western societies is feeling this way. Recent research has shown that clinicians spend up to 50% of their time manually entering information from patient interviews into clinical documentation in the Electronic Medical Record (EMR) user interface. 2 Well-respected investigative reporting has brought forth powerful narratives that exemplify the problem. 3 With studies showing burnout among physicians approaching 50% and one in three residents having symptoms of depression, the mental health situation has been deemed so dire that global leaders of health system design thought it critical to modify the so-called Triple Aim of Health Care Systems to the Quadruple Aim, in which the new aim is to ensure the well-being of the system’s human resources. 4–6 It has become widely acknowledged that one of the primary reasons for frontline clinical staff burning out is the introduction of EMR/electronic health record systems, whose designs did not adequately take into account the service and end-user workflow and characteristics. This problem is compounded by other stresses on the healthcare system, including an ageing population with more medical needs in the context of constrained medical funding streams, whether from private or public (government) insurers.
Some clinicians have decided to pay for intelligent dictation services or even to hire human scribes to follow them through their clinical workday and complete all the digital clerical work for them. Despite the elevated costs of these services, many clinicians purchased them anyway, often at a financial loss, because reducing their time on clerical work and rebalancing the division of their labour toward the rewarding work that actually leverages their clinical skills were so important to them. Some have even resorted to hiring companies that provide remote scribes, in which microphones audio-record clinical interview dialogues and transmit that audio to far-flung places, with unclear privacy compliance, for the clerical work to be done by cheap labour in third world countries.
Meanwhile, the laboriously entered data in EMRs, whether entered by the clinician or a scribe, are often incomplete and inconsistent, which has meant that, from a computational perspective, there is wide variability in the quality of EMR data. 7 Free-text typing in EMRs is the main cause of these incomplete and inconsistent data, which means that substantial, time-consuming, manual data preprocessing is required before the data analytics tools recently developed to manage clinical care and cost can perform adequately.
The arrival of ambient virtual scribe technology
Thankfully, recent advancements in computer science have come at an opportune time to address this problem before it reaches irreparable crisis levels. Buzzwords such as “artificial intelligence” (also known as “AI”), “machine learning,” “speech recognition,” and “natural language processing” (also known as “NLP”) circulate in the popular vernacular, claiming to be able to solve an endless list of problems in all industries. Many are familiar with consumer tools that use these novel technologies, such as Google Home and Amazon’s Alexa, which reliably perform accurate voice analysis to carry out valuable activities for people in their homes or on their smartphones. Our team therefore considered what exactly these tools’ abilities are and how they can be developed and organized such that collectively they provide a useful virtual scribe service for clinicians. Specifically, how can we automate many clerical tasks in EMRs and begin to solve the aforementioned increasing digital “scutwork” issue plaguing healthcare?
At this point, it is worth briefly reviewing the core of the various intersecting disciplines that make this technology possible. The first is AI, a term used to describe machines that mimic “cognitive” functions that humans associate only with human minds, such as “abstracting” and “problem solving.” Hard-coded, rules-based computational approaches were found to be insufficient to achieve these functions reliably. After much research, numerous critical tools, such as heuristics, optimization, probabilistic models, and statistical classification, were found to be needed in combination to succeed at specific AI tasks. 8
The introduction of Artificial Neural Network (ANN)–based machine learning in recent years has enabled AI to become highly accurate in areas where it had previously fared relatively poorly. Artificial neural networks are computational models loosely inspired by the interconnection of neurons in the brain: data are fed through layers of computational nodes joined by weighted connections, which together begin to encode concepts. Machine learning algorithms and statistical models allow computer systems to perform a specific task effectively without explicit instructions, relying on patterns and inference instead. Around 2010, the technique of deep (machine) learning came to prominence, in which massive amounts of heterogeneous training data are applied to ANNs with many layers of nodes, yielding a major jump in AI performance. 9 So when you hear AI mentioned today, it is most likely referring to the use of these recent advanced techniques of ANN-based deep learning.
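To make the idea of weighted data flowing through layers of nodes more concrete, the following is a minimal sketch of a forward pass through a tiny feedforward network. It is illustrative only: the weights, the layer sizes, the sigmoid activation, and the names `dense_layer`, `forward`, and `toy_network` are assumptions chosen for this example and are not taken from AutoScribe.

```python
import math

def sigmoid(x: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dense_layer(inputs, weights, biases):
    """Compute one layer: each node takes a weighted sum of the inputs,
    adds its bias, and applies the activation function.
    weights[j] holds the incoming weights of node j."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Feed the inputs through every layer in order."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations

# Hypothetical network: 2 inputs -> 2 hidden nodes -> 1 output node.
toy_network = [
    ([[0.5, -0.5], [0.25, 0.75]], [0.0, 0.0]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                     # output layer
]

output = forward([1.0, 1.0], toy_network)
print(output)
```

In a real system the weights are not written by hand as they are here; they are adjusted automatically during training, and "deep" networks simply stack many more such layers.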
The second discipline to review is linguistics, which is the scientific study of natural language structure (morphology, syntax, and phonetics) and meaning (semantics and pragmatics). Computational linguistics therefore is a modern discipline that aims to generate statistical or rule-based computational models for natural language and the various kinds of linguistic phenomena listed above. Specifically, NLP refers to computational approaches used to infer rules and patterns within free-form, transcribed language such as text so as to convert the free-form text into a computationally structured format.
Taking AI and NLP together, we now have NLP techniques, such as recurrent neural networks and word embeddings, that can infer patterns from textual data and are readily adaptable to different tasks. 10 These now perform consistently well, offering novel value propositions for real-world problems. 11 How has AI-based NLP been deployed in healthcare around the world so far? NLP has been used to translate relevant medical knowledge, locked in unstructured medical notes, into structured data that can be used to build better models and computer processes to drive improvements in outcomes. Common general applications in healthcare include information extraction from unstructured notes, clinical text categorization, and clinical text summarization.
Using these novel tools, I, as a visiting researcher at St. Michael’s Hospital’s Center for Health Analytics Research and Training, together with the Center’s director Muhammad Mamdani, partnered with computer scientist Frank Rudzicz and his students Faiza Khan Khattak and Serena Jeblee at the University of Toronto’s Department of Computer Science to build a dependable virtual scribe for physicians. Our clinical team members diligently annotated transcribed patient-physician dialogue data, which were used to train, in a supervised manner, the models we developed to replicate the cognitive abstraction processes that we as clinicians perform when extracting and converting data from layman patient-physician conversations into clinical notes with all the appropriate clinical nomenclature. We optimized a machine learning model to accurately classify dialogue phrases in the patient interview as contextually pertinent to clinical documentation, which is the foundational step in generating EMR data from the analysis of patient-clinician dialogues. Combining this with an unsupervised approach that matches dialogues to the actual clinical note output, we have succeeded in developing a prototype ambient virtual scribe called AutoScribe that we believe, with enough data, will eventually be an extraordinarily helpful assistant to clinicians. AutoScribe works by leveraging a conferencing microphone attached to a physician’s desktop computer in the exam room and then using automatic speech recognition to extract and parse pertinent medical information and its context from patient-clinician dialogues, for the purpose of generating EMR notes in near real time. AutoScribe can then suggest corresponding EMR actions based on the content of that medical note, such as updates to a patient’s profile, ordering investigations and prescriptions, and billing.
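As a rough illustration of what classifying dialogue phrases as pertinent to clinical documentation can look like, the sketch below trains a small supervised classifier on hand-labelled utterances. It is not the AutoScribe model: it swaps in a simple multinomial Naive Bayes classifier with add-one smoothing in place of a neural network, and the example utterances, the labels `pertinent` and `not_pertinent`, and all function names are invented for this example.

```python
import math
from collections import Counter

def tokenize(text):
    """Lower-case the utterance and split it on whitespace."""
    return text.lower().split()

def train_naive_bayes(examples):
    """Fit per-label word counts from (utterance, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for token in tokenize(text):
            counts[token] += 1
            vocabulary.add(token)
    return word_counts, label_counts, vocabulary

def predict(model, text):
    """Return the label with the highest log-probability for the utterance."""
    word_counts, label_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # prior for this label
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokenize(text):
            if token in vocabulary:  # words never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical annotated utterances (invented, not real clinical data).
annotated = [
    ("i have had chest pain for two days", "pertinent"),
    ("the pain gets worse when i climb stairs", "pertinent"),
    ("i take aspirin every morning", "pertinent"),
    ("the weather has been lovely this week", "not_pertinent"),
    ("parking was hard to find today", "not_pertinent"),
    ("my daughter drove me here this morning", "not_pertinent"),
]

model = train_naive_bayes(annotated)
print(predict(model, "the chest pain started two days ago"))
print(predict(model, "parking was hard today"))
```

The design point carried over from the text is the supervised setup itself: clinicians supply the labels, and the model learns which phrases matter for the note. A production system would need far more annotated data and a model that takes conversational context into account.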
The idea is that once clinicians initiate AutoScribe by clicking on its user interface element, they will treat it as their personal clinical documentation assistant in the exam room as they chat with their patient. Our encrypted, privacy-by-design prototype learns from the edits that clinicians make to AutoScribe’s outputs, thereby becoming more accurate over time and personalizing initial outputs to each clinician’s documentation style for future clinical scenarios. We have also built AutoScribe to be adaptable to any medical specialty and clinical workflow.
How does AutoScribe address the problems of digital “scutwork”-related burnout and poor EMR data quality?
We have previously published performance metrics from our tool as it was at the time of manuscript submission in November 2018. 12 Since then, it has become incrementally more accurate and reliable with each tranche of audio and transcript data added to train the AI. By fall of 2020, we expect to reach a market-ready version of AutoScribe performing with approximately 85% recall and 90% precision after a clinician has personalized outputs through 6 months of use and edits. Compared with the work productivity benefits of human scribes, we anticipate AutoScribe will lead to a similar reduction in EMR user effort and time of nearly 50%, at a fraction of the cost. 13 Moreover, this will lead to more engaged and efficient doctor-patient visits, since the clinician does not have to focus on the computer screen as much, and many EMR clerical tasks will be readily prepared for them as they converse with their patients. Our validated data capture tool will be able to populate the EMR with highly consistent, structured clinical data, which theoretically would enable health data analytics tools to perform at a higher level. Finally, we also have a model in development that converts the key clinical points within the patient-physician dialogue into a list of the doctor’s recommendations in layman’s language that can then be electronically forwarded to patients, which we expect will improve adherence to those recommendations and thus improve health outcomes.
Taken together, the AutoScribe data capture tool will enable healthcare to be more proactive, predictive, and preventive and will facilitate delivering continuous, individualized value-based care focused on quality improvement targets. If the technology is trusted and used by clinicians, the resulting reduction in administrative work in electronic systems will ultimately support all dimensions of the Quadruple Aim, and automation tools of this kind will become critical supports for the entities within the patient-centred medical home model. 14
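For readers less familiar with the recall and precision figures quoted in this section, the sketch below shows how the two metrics are computed for a single note. The counts are invented to illustrate the arithmetic and are not AutoScribe results; the function name `precision_recall` and the item identifiers are likewise assumptions.

```python
def precision_recall(predicted, actual):
    """Compare the set of items a tool produced against the reference set.

    precision: of everything the tool wrote, how much was correct
    recall:    of everything that should have been written, how much was captured
    """
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical note: the clinician's reference note contains 20 clinical facts.
actual = set(range(20))
# The tool captures 17 of them and adds 2 facts that are not in the reference.
predicted = set(range(17)) | {100, 101}

precision, recall = precision_recall(predicted, actual)
print(f"precision = {precision:.3f}, recall = {recall:.3f}")
```

Here recall is 17/20 and precision is 17/19. The distinction matters clinically: low recall means the clinician must add missing facts, while low precision means the clinician must find and delete incorrect ones.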
Challenges with implementation
As mentioned earlier, the advanced machine learning models that power the NLP within AutoScribe have the potential to accomplish things we previously thought were impossible. However, there are several impediments to overcome before one can reliably achieve these impressively high performance levels. First, models require enormous amounts of high-quality annotated data to train on, which are often very challenging to acquire from healthcare settings. This is because many data in healthcare are siloed from one another, and even if clinicians or hospital organizations would like to liberate their data for the purpose of AI training, in certain instances EMR vendors essentially lock the data away from third parties. Moreover, annotating the data can require a significant amount of time and manual effort from individuals with clinical knowledge, and the data themselves can have inherent biases that the models then accentuate when applied to new data, which may inadvertently worsen social inequities. Another common issue is that models can become overfitted to the datasets they are trained on, which can lead to a loss of generalizability and of performance when they are applied to new data. Finally, many algorithms are termed “black boxes” because the so-called hidden layers within the neural networks of deep machine learning compute such complex math that no one can truly understand the process that yields their outputs. Clinicians rightly demand an ability to scrutinize the models so that if an output leads to a clinical error, the cause can be made transparent and corrected for the next time. It is true that explainability models that enable an acceptable analysis of the process that led to a model’s output have recently been developed. So far, though, optimizing models in a human-friendly manner remains a major challenge, one that health software regulators such as the Food and Drug Administration struggle to manage adequately, especially since the models constantly update with each new batch of data they are given to train on. 15,16
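The overfitting problem described above is usually detected by comparing performance on the training data with performance on data the model has never seen. The sketch below exaggerates the effect with a deliberately overfitted "model" that simply memorizes its training examples. Everything in it (the phrases, the labels, and the names `fit_memorizer` and `accuracy`) is hypothetical and chosen only to illustrate the check.

```python
from collections import Counter

def accuracy(predict, examples):
    """Fraction of (input, label) pairs that the model labels correctly."""
    return sum(predict(x) == y for x, y in examples) / len(examples)

def fit_memorizer(train):
    """An intentionally overfitted model: it memorizes every training pair
    and falls back to the most common training label for unseen inputs."""
    lookup = dict(train)
    fallback = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: lookup.get(x, fallback)

# Invented phrases; label 1 = clinically pertinent, 0 = not pertinent.
train = [
    ("chest pain", 1), ("nice weather", 0), ("took aspirin", 1),
    ("parking lot", 0), ("short of breath", 1),
]
held_out = [
    ("pain in chest", 1), ("cold outside", 0),
    ("takes insulin", 1), ("traffic was bad", 0),
]

model = fit_memorizer(train)
train_accuracy = accuracy(model, train)
held_out_accuracy = accuracy(model, held_out)
gap = train_accuracy - held_out_accuracy
print(train_accuracy, held_out_accuracy, gap)
```

A large gap between the two scores is the warning sign: the model looks perfect on the data it was trained on and performs no better than guessing the majority label on new data. For this reason, held-out data should ideally come from different clinics, specialties, and patient populations than the training data.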
In terms of implementation, the design of the healthcare system and its underlying incentives distort what would normally be an intuitive process of exploration of potentially highly innovative technologies by individual healthcare organizations. Specifically, health organizations’ technology buyers are often hesitant to consider piloting an innovative product like AutoScribe. This is because the risk-benefit ratio for products like ours is not clear at the early stage, and procurement regulations in healthcare do not have the flexibility built in to account for this lack of clarity for innovations at this stage of the development cycle. Even when pilots are deployed, scaling the technology on the basis of positive experience also faces headwinds due to the fragmentation and misalignment of various actors within the system. Furthermore, to date, health professional bodies and other regulatory groups have issued no clear guidance or certification requirements on how to approach procurement decisions for advanced AI-powered technologies like AutoScribe.
As a new vendor in a burgeoning field that may dramatically affect the practice of medicine, we would welcome health system groups providing a set of clear, reasonable certification rules for our technology, based on a critical appraisal framework of evidence-based best practices for how the technology should be ethically developed, deployed, updated, and embedded within healthcare. Most importantly, we want to make sure that the audio data we collect, which contain recordings of potentially very sensitive conversations, are treated with the utmost care. We agree that, as a for-profit entity, we should support existing consent processes, physician data stewardship, and patient data ownership rules, and that the personal health information within the data we use to train our models must be used in an appropriate way. Our technology should also help to further clinical quality improvement plans, in ways that keep these affordable and free of commercial bias. Since managing NLP outputs such as AutoScribe’s is a new, uncompensated clinical responsibility with a significant impact on workflow, we support co-designing the solutions around physicians’ workflow and patient preferences and co-facilitating change management to ensure minimal disruption during the initial adoption phase.
We have described our product AutoScribe, a novel approach to clinician-patient dialogue parsing whose outputs are oriented toward pragmatic linguistic features and the needs of clinicians. Specifically, we have developed machine learning models based on recurrent neural networks that extract medical linguistic entities and their contextual partners into high-quality patient documentation. These structured documentation data can be directly integrated into standard EMR data fields, which are then readily amenable to the application of data analytics tools. Our team is composed of a mix of clinicians and computer scientists, and the resulting interdisciplinary discussions have been critical to getting the execution of an AI-based product like this right. For now, our work is focused on reducing the administrative burden on physicians. Although there is a lot of hyperbole related to AI in healthcare these days, the old adage of “garbage in, garbage out” regarding data and algorithm outputs still applies to AI-powered technologies like AutoScribe. Given the critical nature of our outputs, it is essential that we obtain high-quality data to train our models, and we need the collaboration of healthcare stakeholders to do this in a way that fosters healthcare as a public good. To build trust with our customers, the end-user frontline clinicians, we need to cultivate a transparent and collaborative relationship that will enable the challenges and concerns above to be addressed together sooner rather than later. For instance, we want to make sure our customers realize they are also the data generators for our AI models. That is a fundamentally novel dynamic between a digital health vendor and physician end user. So as a company, we are making it a priority to build in this culture of openness and fairness from the get-go.
As our future technology begins to involve other fascinating features, such as voice acoustics analysis, computer vision of nonverbal body language within the clinical encounter, and overlaid real-time clinical decision support, we need to have already established a deep trust with end users to ensure mutual benefit for all involved. That is the only way we are ever going to gain initial acceptance for a technology that may feel creepy to some healthcare stakeholders, so that everyone, but mainly patients, benefits from the amazing advancements of AI. In short, the work our start-up is doing is just the tip of the iceberg, so as a community of health system leaders, let us get this right now before the AI tsunami hits our overwhelmed sector.
