Abstract
BACKGROUND:
Physicians and nurses are responsible for reporting medical adverse events. Each views these events through a different lens subject to their role-based perceptions and barriers. Physicians typically engage with diagnosis and treatment while nurses primarily care for patients’ daily lives and mental well-being. This results in reporting and describing medical adverse events differently.
OBJECTIVE:
We aimed to compare adverse medical event reports generated by physicians and nurses to better understand the differences and similarities in perspective as well as the nature of adverse medical events using social network analysis (SNA) and latent Dirichlet allocation (LDA).
METHODS:
The current study examined data from the Maccabi Healthcare Community. A total of 17,868 records of medical adverse events were collected from 2000 to 2017. Data analysis used SNA and LDA to perform descriptive text analytics and understand the underlying phenomena.
RESULTS:
A significant difference in harm levels reported by physicians and nurses was discovered. Shared topic keyword lists broken down by physicians and nurses were derived. Overall, communication, lack of attention, and information transfer issues were reported in medical adverse events data. Specialized keywords more likely to be used by physicians were repeated prescriptions, diabetes complications, and x-ray examinations. For nurses, the most common specialized adverse event behavior keywords were vaccine problem, certificates of fitness, death and incapacity, and abnormal dosage.
CONCLUSIONS:
Communication and inattentiveness appeared most frequently in medical adverse events reports regardless of whether doctors or nurses did the reporting. Findings suggest feedback and information sharing processes could be implemented as a step toward alleviating many issues. Institutional management, healthcare managers and government officials should take actions to decrease medical adverse events, many of which may be preventable.
Keywords
Background
Front line healthcare providers, physicians and nurses play important roles in reporting medical adverse events [1]. Each group has a different focus and culture [2] which is often based on their perceptions and various barriers [3–5]. For example, physicians engage with diagnosis and treatment for patients, and nurses primarily care for patients’ daily lives and mental well-being [6,7]. These unique roles result in differences both in how medical adverse events are reported and in the ways they are described [8–10]. Such information can be used to create adverse event detection systems [11]. For example, nurses often reported routine errors and risky situations, while physicians tended to report adverse events resulting from errors that yielded severe or fatal consequences [7,12]. In addition, physicians and nurses typically reported different types of medical adverse events related to their attitudes toward reporting [13,14]. Rowin et al. [15] report that physicians more often reported events that “caused permanent harm, near death, or death of a patient” (p. 537). Nurses, on the other hand, generally reported events that “caused no or temporary harm” (p. 537). Nurses often felt a larger obligation to inform patients of the adverse event than did physicians but rarely acted on these feelings [16]. In other literature, serious events that occurred to the patient were more likely to be reported by nurses than physicians [17] but this may vary in different settings [18]. For example, Källberg [5] interviewed 10 physicians and 10 nurses about this problem, and suggested that high workload, lack of control, communication failures, and organizational failures were the main reasons for medical adverse events. El-Shafy et al. [19] and Sassoli and Day [20] showed that communication had a strong correlation with medical adverse events, and that closed-loop communication can effectively prevent medical adverse events. 
In studies of Greek nurses and physicians, Moumtzoglou [21,22] found that doctors felt disclosing department identity was important in medical adverse event reporting, while nurses believed this should be kept anonymous to encourage more accurate reporting.
However, differences regarding physician and nurse behavior were not a focus of these studies. Therefore, further research of adverse event contents, from the perspectives of physicians and nurses, may provide additional insights that can reduce medical adverse events. Moreover, research into nurse and physician reporting differences can help healthcare managers discover best practices and develop targeted regulations or measures to decrease and prevent medical adverse events.
Objectives
The primary objectives addressed by this study include: (1) understanding whether differences in harm levels in medical adverse events reports exist between physician and nurse generated reports; (2) determining main keywords used by doctors and nurses to describe medical adverse events; (3) finding the main topics at each harm level broken down by nurse and doctor reports; and (4) discerning underlying behavior patterns that may influence medical adverse event reporting by physicians and nurses at each harm level.
Methods
Data collection
All data for the study was collected by the Maccabi Healthcare Community. Further descriptions of the collection method and data can be found in related research [23]. The current study used 17,868 medical adverse event records collected from 2000 to 2017. According to Maccabi the definition of an adverse event is: “Any event during the process of providing medical care, before and after the process that caused or had the potential to cause physical and/or psychological harm to a patient”. In the current study, the ‘harm level’ field distinguished between five categories, according to the severity of damage to the patient, caregivers and organization. The following levels were used:
Type A (1-no harm and 2-minor harm): an adverse event with potential for damage, which ended with minor or no damages at all. These events are termed ‘near misses’.
Type B (3-moderate harm): an adverse event that resulted in moderate damage to a patient, reversible or with estimated damage claims up to $250K. The event is classified according to the more severe of the damages: to the patient or to the Maccabi institution.
Type C (4-severe harm and 5-death): an adverse event that resulted in severe or irreversible damage to a patient, or an estimated malpractice claim of over $250K. Death of a patient is included automatically in this category. The event is classified according to the more severe of damages: to the patient or to the Maccabi institution.
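The grouping of the five harm levels into the three event types can be written as a simple lookup. The following Python sketch is illustrative only: the function name `harm_type` and the numeric coding 1 to 5 are assumptions based on the labels above, not code or field values taken from the Maccabi system.

```python
# Illustrative only: labels follow the harm level names used in this section.
HARM_LEVEL_LABELS = {
    1: "no harm",
    2: "minor harm",
    3: "moderate harm",
    4: "severe harm",
    5: "death",
}


def harm_type(level: int) -> str:
    """Map a harm level (1-5) to its event type (A, B or C)."""
    if level not in HARM_LEVEL_LABELS:
        raise ValueError(f"unknown harm level: {level!r}")
    if level <= 2:
        return "A"  # near misses: no harm or minor harm
    if level == 3:
        return "B"  # moderate harm
    return "C"      # severe harm or death


if __name__ == "__main__":
    for level, label in HARM_LEVEL_LABELS.items():
        print(level, label, harm_type(level))
```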
The current study used several techniques for analysis. Among these is text mining [24] done with social network analysis (SNA) and latent Dirichlet allocation (LDA) [25].
Social network analysis
Social network analysis (SNA) examines social structures by using network and graph theory. This approach is popular in many disciplines [26,27]. In health care studies, SNA research mainly focuses on four aspects: institutional exchange, physician collaboration, clinical co-occurrence, and workplace interaction networks [28]. For example, Jonson et al. [29] analyzed emergency medical organizations’ communication using SNA and revealed several information-sharing structures that existed. Han and Wang [30] used SNA to trace the dissemination and sharing process of health-related information in a virtual community. Raghavan et al. [31] revealed that type 2 diabetes and obesity in social contacts were associated with an individual’s diabetes risk.
Very few prior research projects have combined social network analysis with adverse event analysis in health care fields [23]. The combination of SNA and medical adverse event reporting offers much promise.
Latent Dirichlet allocation (LDA)
LDA is an unsupervised learning method used to cluster similar parts of documents and discover different topics according to distribution characteristics of natural language [32,33]. The basic assumption of this method is that a document consists of several topics, and each topic comprises many words with shared characteristics and meaning [34]. LDA methods are widely used in computer science with a specific focus on text mining and information retrieval [35,36]. Several LDA applications exist in healthcare. S.-H. Wang et al. [37] captured keywords of abstracts in PubMed, and using LDA discovered relevant, underlying issues, such as adolescent depression. Magua et al. [38] used LDA to identify gender bias in peer review and showed male investigators had higher positive text descriptions than did females. Wu et al. [39] applied LDA to online health communities to understand user engagement. Regarding the combination of LDA and medical adverse event reporting, to our knowledge, no research has been conducted.
SNA and LDA uses
SNA and LDA are widely used and have their own advantages in different disciplines. SNA is useful to find keywords in co-occurrence patterns. LDA is useful to find the topics in the keywords. Recent research projects have combined SNA with LDA in text mining areas to discover more meaningful results through the overlap of roles (keywords) and topics from a set of raw documents. L’Huillier et al. [40] performed a topic-based social network analysis in the Dark Web to locate key extremist group members. Cha et al. [41] used the SNA popularity concept to determine influencers and important topics. Ríos et al. [42] used LDA to filter users not contributing to a social network’s purpose, and SNA to find influencers. Moessner et al. [43] used LDA to find topics and SNA for betweenness to identify crucial users. To our knowledge, there are no related studies that use the two methods together to discover an adverse event pattern in medical disciplines other than an earlier study by the current authors [23].
Research design
Differences in medical adverse event harm levels between physicians and nurses were assessed using t-tests. Keywords from each description were analyzed for physicians and nurses using SNA methods. Next, topic areas for each harm level were determined with LDA methods. Finally, common topic keywords for physicians and nurses, respectively, were developed from the LDA and SNA analysis results. R version 3.3.0 [44] with base, dplyr, tidytext, tidyverse, igraph, topicmodels, ldatuning, and ggplot2 was used. Network visualization was completed with Pajek version 5.01.64 [26,34,45,46].
Data analysis
Data analysis followed a procedure similar to Zhu et al. [23]. First, text mining was conducted according to conventional practice [45,47]. Next, an SNA analysis generated common word pairs co-appearing within the same descriptions. The original numbers of word pairs for the physician and nurse groups were 105,905 and 47,034 respectively, too many to include in a network. Therefore, we extracted the most meaningful portion of the word pairs list using methods found in prior research [23,48,49]. Higher-frequency pairs were retained using proportional threshold values of 100 for the physician group and 50 for the nurse group [50]. Igraph generated the undirected network based on co-occurring word pairs [51]. This was exported into Pajek and transformed into a weighted, undirected network using standard techniques [23,26]. We followed this with a weak component (cluster) analysis and removed clusters with only two nodes [26]. We then computed degree centrality [26] and set node size to this value to show the relation structure. Finally, a visualization of the networks for physicians and nurses was developed.
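The network construction steps described above can be sketched in a few lines. This is a minimal, standard-library Python illustration and not the R/igraph/Pajek workflow used in the study. The function names, the toy descriptions, and the choice to keep pairs at or above the threshold are all assumptions made for the example, and degree is reported as a raw neighbour count.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_pairs(descriptions):
    """Count unordered word pairs that co-appear within the same description."""
    counts = Counter()
    for words in descriptions:
        counts.update(combinations(sorted(set(words)), 2))
    return counts


def build_network(pair_counts, threshold):
    """Keep pairs with frequency >= threshold as weighted, undirected edges."""
    adjacency = {}
    for (a, b), n in pair_counts.items():
        if n >= threshold:
            adjacency.setdefault(a, {})[b] = n
            adjacency.setdefault(b, {})[a] = n
    return adjacency


def components(adjacency):
    """Return the connected components (clusters) of the undirected network."""
    seen, result = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(set(adjacency[node]) - comp)
        seen |= comp
        result.append(comp)
    return result


def degree_by_node(adjacency, min_cluster_size=3):
    """Degree of every node that sits in a cluster of at least min_cluster_size."""
    kept = [c for c in components(adjacency) if len(c) >= min_cluster_size]
    return {node: len(adjacency[node]) for comp in kept for node in comp}


if __name__ == "__main__":
    # Hypothetical tokenised descriptions, not taken from the study data.
    docs = [
        ["patient", "dose", "wrong"],
        ["patient", "dose", "wrong"],
        ["patient", "dose"],
        ["form", "missing"],
        ["form", "missing"],
    ]
    network = build_network(cooccurrence_pairs(docs), threshold=2)
    print(degree_by_node(network))
```

In the toy data, the two-node cluster ("form", "missing") is dropped, mirroring the removal of clusters with only two nodes.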
The LDA analysis followed. Similar to Zhu et al. [23] the approach first focused on data source preparation using previously processed data frames split into 5 sub-data frames according to harm level. We determined the optimal topic numbers for each sub-data frame according to standard practice [52–55].
The optimal topic numbers were: 3 for no harm; 6 for minor harm; 11 for moderate harm; 6 for severe harm; and 5 for death harm [23]. The software package (topicmodels) extracted topics for each sub-data frame. The 10 keywords with the highest β (the per-topic word probability) were used to develop an understanding of each topic.
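Selecting the highest-β keywords per topic amounts to a sort and a cut-off. The Python sketch below assumes the fitted topic-word probabilities are already available as a nested dictionary; the function name `top_terms`, the topic label, and the β values are hypothetical and are not results from the study.

```python
def top_terms(beta, k=10):
    """Return, for each topic, the k terms with the highest beta."""
    return {
        topic: [
            term
            for term, _ in sorted(
                weights.items(), key=lambda item: item[1], reverse=True
            )[:k]
        ]
        for topic, weights in beta.items()
    }


if __name__ == "__main__":
    # Hypothetical per-topic word probabilities for illustration only.
    beta = {
        "topic_1": {"vaccine": 0.40, "dose": 0.35, "child": 0.15, "form": 0.10},
    }
    print(top_terms(beta, k=2))
```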
Finally, the SNA and LDA results were combined. Keywords in the network and topic list were compared and extracted. These showed relationships between keywords and topics for physicians and nurses to provide a result common to both methods.
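One plausible reading of this combination step is a set intersection between the network keywords and each topic's keyword list. The Python sketch below illustrates that reading under stated assumptions: the function name `common_keywords` and the example words are hypothetical, and the study's actual matching rules may differ.

```python
def common_keywords(network_keywords, topic_keywords):
    """For each topic, keep the keywords that also appear in the SNA network."""
    network = set(network_keywords)
    return {
        topic: [word for word in words if word in network]
        for topic, words in topic_keywords.items()
    }


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    sna_nodes = ["patient", "dose", "vaccine", "referral"]
    lda_topics = {
        "minor_harm_topic_1": ["vaccine", "child", "dose"],
        "minor_harm_topic_2": ["form", "signature"],
    }
    print(common_keywords(sna_nodes, lda_topics))
```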
Results
Harm level difference analysis between physicians and nurses
Table 1 illustrates the distribution of medical adverse events in each harm level by physician and nurse. In general, physicians reported about twice as many medical adverse events as nurses at the ‘no harm’, ‘minor harm’ and ‘moderate harm’ levels, and about four times as many at the ‘severe harm’ and ‘death harm’ levels. Minor harm events accounted for most of the medical adverse events in both groups: 73.17% of nurse reports and about 67.86% of physician reports.
Adverse event harm level reported by the nurse and physician
Note: The number format is frequency (probability).
To determine whether a significant difference existed in harm level by physician and nurse, an independent-samples t-test was conducted. Since large sample sizes regularly show a high statistical significance, the effect size received consideration as suggested in prior research [56,57]. The sample sizes for medical adverse events reported by physicians and nurses were different, so Hedges’ g was used to compute the effect size [58]. Analysis results indicated a significant difference existed in the harm levels between the physician (M = 2.187, SD = 0.670) and nurse groups (M = 2.080, SD = 0.576), t(13196) = 11.095, p < .001, g = 0.168, CI [−0.127, −0.0887]. Hedges’ g was 0.168, below the conventional 0.2 threshold for a small effect, which indicated that the harm levels reported by physicians and nurses differed by less than 0.2 standard deviations [59–61].
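For readers who wish to reproduce an effect size of this kind from summary statistics, the following Python sketch computes Hedges' g using the pooled standard deviation and the usual small-sample correction. The means and standard deviations are those reported above. The group sizes (8,800 and 4,398) are hypothetical placeholders chosen only to be consistent with the reported degrees of freedom, because the per-group counts are not stated in this section.

```python
from math import sqrt


def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Hedges' g for two independent groups, from summary statistics."""
    pooled_sd = sqrt(
        ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    )
    d = (mean1 - mean2) / pooled_sd          # standardized mean difference
    correction = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample bias correction
    return d * correction


if __name__ == "__main__":
    # Group sizes are hypothetical; means and SDs are the reported values.
    g = hedges_g(2.187, 0.670, 8800, 2.080, 0.576, 4398)
    print(round(g, 3))
```

With groups this large the correction factor is very close to 1, so the result depends almost entirely on the means and the pooled standard deviation, and it falls close to the reported value under these assumed group sizes.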
SNA techniques revealed primary keywords used to describe the medical adverse events broken down by physicians and nurses. The physician network comprised 92 nodes and 142 edges. The nurse network comprised 84 nodes and 188 edges. The densities of the physician and nurse networks were 0.034 and 0.054 respectively. This indicated that, relative to the number of possible connections, keywords in the nurse network were more densely connected than those in the physician network.
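The reported densities can be checked from the node and edge counts. The short Python sketch below assumes a simple undirected network without self-loops, for which density is the number of edges divided by the number of possible node pairs; the function name `density` is illustrative.

```python
def density(nodes: int, edges: int) -> float:
    """Density of a simple undirected network: edges / possible node pairs."""
    return 2 * edges / (nodes * (nodes - 1))


if __name__ == "__main__":
    print(round(density(92, 142), 3))  # physician network -> 0.034
    print(round(density(84, 188), 3))  # nurse network -> 0.054
```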
Keyword cluster analysis for physician and nurse networks
Physician network
The Pajek cluster analysis tool showed that the physician network comprised 5 clusters (Fig. 1). Table 2 provides representative keywords listed by degree for each cluster to indicate how keywords connect in terms of centrality. Cluster 1, the most complex, had 53 keywords, so only the 10 with the highest degree are listed.

The visualization of clusters in the physician group.
Frequency distribution of physician network cluster values
Abbreviations: Freq, Frequency; Cum Freq, Cumulative Frequency; %, Percent.
The nurse network is shown in Fig. 2. This figure comprises three clusters. The first cluster had 63 keywords, the second cluster had four keywords, and the third cluster had three keywords. In Table 3, the ‘representative’ keywords in each cluster are listed by degree. Because cluster 1 had 63 keywords and was larger than the other clusters, we list only the 10 keywords with the highest degree.

The visualization of clusters in the nurse network.
Frequency distribution of cluster values for nurse network
Abbreviations: Freq, Frequency; Cum Freq, Cumulative Frequency; %, Percent.
Topics were extracted per harm level using LDA as described in earlier research [23]. The topic numbers ranged from 3 to 11.
Integration analysis of SNA and LDA
The SNA cluster keywords and the LDA topic lists were integrated to provide additional insight. The words found in common across both SNA and LDA results were extracted for nurses and physicians. This informed an exploration of their relationships with each harm level. Shared keywords, called ‘common keywords’, were derived from the results. These are displayed in Tables 4–10 (Appendix).
Discussion
A large number of medical adverse events are preventable and this warrants further and more detailed research [62]. Previous work developed common causes of medical adverse events and looked at various harm levels. However, it left a research gap by not stratifying the data by reporter. The current study investigates that gap and provides insight into the roles that reporter groups (i.e. physicians and nurses) and their behavior play when considering harm level and other factors. This study investigated the intersection of these groups by text mining an extensive, long-term medical adverse event database. We also explored relationships and differences in keywords about medical adverse events between the two groups of reporters and across harm levels using a combination of SNA and LDA methods.
The research utilized steps that have been tested in prior research [23]: (1) SNA analysis to determine primary keywords used in medical adverse event reports for both physicians and nurses; (2) LDA analysis to determine primary topics in medical adverse event reports for different harm levels for physicians and nurses; and, (3) combination of the SNA and LDA results for reporters (physicians and nurses), and harm level topics. This study extends a preceding analysis in both novel and important ways [23]. Not only does it increase knowledge regarding physician and nurse differences in medical adverse event reporting system use, it also enlarges the corpus of medical adverse event research by considering harm level differences across two reporting groups. A comprehensive, systematic research framework results that permits a better understanding of the relationships among types of reporters, harm level and adverse event content.
In today’s data-driven healthcare environment, complex work is delivered by multidisciplinary teams, each with unique perspectives regarding priorities and concerns. Because of this, many studies of adverse events focus on physicians and nurses for the important roles they play in the medical service process. Our literature review revealed the majority of these studies (78%) pointed out that medical adverse events were reported by physicians and nurses [7,63–65]. Schuerer et al. [66] showed that nurses were more likely to report errors that may lead to harmful events, and that physicians were more likely to report the harmful events in hospital settings.
Still other studies suggest the preventable nature of many medical adverse events, many of which relate to communication problems [67], flawed information transfer, incomplete records, and lack of follow-up [68–71]. The current study supports this notion broadly as well as more specifically indicating that these issues exist at a more granular level impacting both physician and nurse groups at various harm levels. Many adverse events certainly are preventable when adjustments are made in the healthcare protocol and reporting systems, but further validation measures would be required in a future study.
Our study highlights the important role of communication in dispensing healthcare. Physicians and nurses play different roles and present unique risk factors regarding potential communication failures [23,72]. Enhancing communication between individuals in different healthcare roles will help increase accuracy when recording disease symptoms. As a result, correct treatment becomes more likely. According to Nundy et al. [73] enhancing communication can reduce adverse medical events by 19%. In the current study, communication-related problems were present in both physician and nurse reports, and across different harm levels.
Related to communication, the current analysis revealed that failure to understand the risks of a medical procedure contributed to adverse events. For example, when physicians or nurses recorded information related to patient symptoms or diagnosis, they may have neglected to include all pertinent information. In other instances, records appeared to be misplaced or lost. Likewise, in other instances either a nurse or physician appeared to lack required knowledge either about the procedure or about the patient’s condition. Again, this could be prevented with additional communication. Prior studies reported similar findings. For example, Ayoubian et al. [74] describe preventable nursing errors in a healthcare facility that can lead to harmful events.
Other important outcomes for this study indicate that significant differences between physicians and nurses exist in medical adverse event reports. A portion of these differences are explained by their different roles and the division of labor. Generally, nurses focus more on providing treatment to individual patients, whereas physicians focus more on diagnosis and decision making regarding the needed treatment [75]. When reporting a medical adverse event, their working characteristics are brought into the medical adverse event report system [76].
The threat of patient violence also appears when examining the data in a more granular fashion [23]. This important issue appears within both physician and nurse reports and highlights the need for institutional attention for this matter. Patient violence was common in areas such as emergency rooms and psychiatric wards. Prior research indicates similar concerns [23,77,78].
From a methodological perspective, the current study supports prior work by Zhu et al. [23] with an example of using SNA and LDA in a complementary fashion that results in useful outcomes.
Implications
This study extends earlier work by Zhu et al. [23] through a more granular analysis. As in prior studies, findings indicate important implications requiring institutional intervention and efforts at the individual level. Supportive societal or governmental intervention is also required to ensure resources are available to improve healthcare delivery quality and offer support systems that bridge between internal provider roles and external institutional roles.
The results of the analysis indicate a reporter within the healthcare institution was not solely responsible for the adverse medical events. In fact, the problems appear to be endemic, related to communication and other broad causes. Therefore, this study has the potential to inform physicians and nurses about potential pitfalls. Awareness alone can help prevent medical adverse events. Reporting events is important, and doing so must take into account that those reading or using the reports may play different roles and have varying concerns. The use of common language may help overcome current issues. Likewise, adhering to proven medical practices and focusing on cross-role communication would serve the institution well. Further research into healthcare communication is indicated by this study. For example, situations prone to medical adverse events (e.g. consultation, test refusal, treatment refusal, response refusal) require that the physician be more prudent in the consultation phase. Nurses must be more aware of the potential for medical adverse events when they deal with vaccines and other treatments. It is important to take time and be more cautious because these are areas prone to problems. Prior research suggests methods for improvement in this area [79].
Other implications from this research apply at the organizational level. For example, healthcare facility and IT managers must ensure up-to-date technology systems exist to support communication and healthcare worker interaction. Having adequate system training, providing sufficient staffing and ensuring employees are not overworked could contribute to reductions in adverse medical events. Another solution could relate to incorporating quality assurance training into medical workforces. Researchers have highlighted the positive impacts of these forms of organizational support [80,81]. Quality improvement processes often focus on communication and reducing organizational barriers to information exchange. In the end, physicians and nurses could benefit from improved interaction with each other and with patients [12,82]. Another quality enhancement relates to developing checklists and routine checks in areas prone to medical error. Many of these types of actions can be automated with technology [83,84].
At an even broader level, this research indicates the need for government programs that can help bridge differences between institutions. Information exchange and knowledge sharing are critical. The technology exists and must be applied to ensure communication and the removal of information transfer barriers [19].
Prevention of medical errors and adverse events, as in other disciplines, is based on understanding the phenomenon of errors and risks both in general and in specific contexts. This study, by using advanced analytical methodology (LDA and SNA), analyzes the keywords that characterize 17,868 records collected from 2000 to 2017, thus enabling us to expose the common hidden characteristics of the adverse events in the database. In this study, communication and inattentiveness appeared most frequently in medical adverse events reports regardless of whether doctors or nurses did the reporting. Knowing these specific characteristics of risks and errors enables us to define systemic and focused interventions to mitigate the risks of medical errors and their consequences. Such interventions include measures to reduce medical errors in providing care, for example changes in the IT system, improved procedures and protocols, staff training, and patient empowerment.
From a methodological standpoint, the approach for combining LDA and SNA methods is innovative and can be adapted to other medical and non-medical studies. The research supports prior work and indicates a more granular analysis comparing the similarities and differences of the underlying behavior patterns for physicians and nurses, at several harm levels [23].
The common keyword list broken down at physician and nurse levels is useful to managers of medical institutions. The similarities and differences highlight areas where improved focus and communication synergies may be required. It also provides insight into areas where a common language already exists and suggests other forms of medical problems exist.
Limitations
This study is not without limitations. For example, many low-frequency medical adverse events are reported in the data. These may be crucial and require closer attention according to the long tail effect [85]. The potential to mine these data for more specific information and from different perspectives exists. A second major limitation relates to the data itself. All records came from a single medical database system from a major healthcare provider in Israel. While this study does not offer an empirically tested action plan for improvement and reduction of medical errors, it does provide a broad stroke look at what has caused errors in the past. This valuable insight is now possible with modern data analytics techniques.
Future research
This rich data set offers potential for more analyses and for finding deeper insights. Collecting more medical adverse event data from a wider assortment of healthcare providers could result in added knowledge.
Conclusion
The current research fills a gap in an earlier, broader study [23] by investigating differences in the roles played by reporters (i.e. physicians and nurses) using a large, historic medical adverse event database from a major healthcare provider. The data was broken down by reporter and examined at various harm levels using SNA and LDA methods. Findings revealed that harm level reporting of medical adverse events was significantly different between physicians and nurses. SNA methods helped identify common keywords across harm levels and reporter roles. We also applied LDA to determine medical adverse event topics for each harm level and across reporter roles. The LDA and SNA methods were combined to compare common topic keywords by reporters and harm levels. Through this text mining analysis, we observed that communication problems and inattentiveness were common in medical adverse events for both physicians and nurses. Related high probability keywords reported by both physicians and nurses were examined and used to form a set of recommendations. These recommendations have not been empirically tested but can provide guidance for future quality improvement efforts. Moreover, we found the main keywords used by physicians or nurses correlated with specific harm levels. Our findings suggest more can be done at both institutional and societal (e.g. governmental) levels to support healthcare delivery and reduce adverse medical events.
