Abstract
BACKGROUND:
As Twitter has gained significant popularity, tweets can serve as a large pool of readily available data for estimating the adverse events (AEs) of medications.
OBJECTIVE:
This study evaluated whether tweets were an early indicator for potential safety warnings. Additionally, the trend of AEs posted on Twitter was compared with AEs from the Yellow Card system in the United Kingdom.
METHODS:
English tweets for 35 drug-event pairs were collected for the period 2017–2019, covering the two years prior to the date of the relevant EMA Pharmacovigilance Risk Assessment Committee (PRAC) meeting. Both signal and non-signal AEs were manually identified and encoded using the MedDRA dictionary. AEs from Yellow Card were also gathered for the same period. Descriptive statistical analysis was conducted, and Fisher’s exact test was used to assess the distribution and proportion of AEs from the two data sources.
RESULTS:
Of the total 61,661 English tweets, 1,411 had negative or neutral sentiment and mentioned at least one AE. Tweets for 15 out of the 35 drugs (42.9%) contained AEs associated with the signals. On pooling data from Twitter and Yellow Card, 24 out of 35 drug-event pairs (68.6%) were identified prior to the respective PRAC meetings. Both data sources showed a similar distribution of AEs based on seriousness; however, the distribution based on labelling was divergent.
CONCLUSION:
Twitter cannot be used in isolation for signal detection in current pharmacovigilance (PV) systems. However, it can be used in combination with traditional PV systems for early signal detection, as it can provide a holistic drug safety profile.
Introduction
Classical pharmacovigilance (PV) systems are mainly based on spontaneous reporting by healthcare professionals. Adverse drug reaction (ADR) reports are collected either by pharmaceutical companies or by national competent authorities and then transmitted to EudraVigilance, which acts as a central repository for reports. These data are constantly monitored for any signals indicating unexpected reactions to medicinal products [1].
The European Medicines Agency (EMA) established the Pharmacovigilance Risk Assessment Committee (PRAC) in July 2012 to strengthen safety monitoring of medicines across Europe (EMA, 2020b). The Committee meets once every month to prioritise and evaluate new safety signals in a timely manner [2]. Some national regulatory agencies have also established systems for direct consumer reporting, such as the Yellow Card system in the United Kingdom [3]. Patients, carers and healthcare professionals submit reports of suspected side effects to the Yellow Card website, and these reports are assessed by medical specialists to identify any unknown safety issues [4].
These classical PV systems have proved to be effective in identifying potential signals. However, they are subject to under-reporting due to a lack of public awareness and the resulting limited patient involvement in the process [5,6]. A systematic review [7] revealed significant and widespread under-reporting of adverse events (AEs) to spontaneous reporting systems, including serious or severe AEs. A population survey conducted in Great Britain [8] suggested that public awareness of the Yellow Card system was low and could be improved. Furthermore, reporting rates differ between developed and developing countries, as developing countries can lack predesigned reporting systems and their drug regulatory authorities have limited material and human resources. Developing countries also prioritise more visible issues, such as drug registration, over routine PV activities [9].
Randomised clinical trials, observational studies, biomedical literature and product labels are additional vital mechanisms for evaluating the safety and efficacy of new medicinal products. However, each source has limitations for identifying rare adverse events.
As a result, new sources of PV are now being considered for gathering AE-related data, including leveraging secondary electronic health records or search engine logs and social media posts by performing text mining and frequency analysis [13]. Among the various social networking sites, Twitter is a popular microblogging platform where users publicly share information, including personal thoughts and emotions [14].
The number of social media users has increased significantly in recent years, and the Statista website estimates that the total will reach 3.09 billion by the end of 2021 [15]. As patients do not need to be cognizant of the link between drug intake and symptoms when reporting AEs, mining of social media may detect signals more quickly than traditional PV systems. A recent survey showed that about 3%–4% of internet users had publicly shared their concerns about AEs of medications [16].
Regulatory agencies are also becoming aware of the potential and utility of social media as a source of information for benefit-risk evaluation of medicines. In 2014, the EU Innovative Medicines Initiative (IMI) WEB-RADR (Web-Recognising Adverse Drug Reactions) assessed the use of social media in safety monitoring [17]. Similarly, the United States has been collaborating with a patient networking website to generate data for risk management activities since 2015 [18].
A retrospective analysis [19] of Facebook and Twitter data examined whether specific product-event pairs were identified in social media prior to their reporting to the US FDA Adverse Event Reporting System (FAERS), and found 10 safety signals in social media posts before they were reported to FAERS. Conversely, the results of a pilot study [20] contradicted these findings. That study did not identify new safety signals in social media or provide any information on device or product quality issues. Nonetheless, the data provided insights into medicine tolerability, adherence, the quality of life of the patient on the medication and the patient’s perspectives on therapy. However, these studies used only 10 and 6 drug-event pairs, respectively, for analysis. On reviewing the drug-event pairs of interest from both publications [19,20], the drug classes covered were not as diverse as those in our study. Thus, to broaden the sample size and include a wider range of therapeutic agents, the current study evaluated tweets related to 35 drug-event pairs to explore whether Twitter posts could be considered an early indicator of a potential safety warning. Moreover, this study analysed both old and new drugs to avoid the bias of more tweets being posted about older drugs.
Methodology
The 35 drug-event pairs of interest were selected from the list of safety signals discussed by EMA PRAC in the year 2019 [21]. These drugs were classified into therapeutic drug classes using the Anatomical Therapeutic Chemical (ATC) classification index (refer to Table 1).
List of drugs with ADR keywords and comparative analysis of drugs based on safety signals identified in Twitter data and Yellow Card
Search terms for each drug contained the brand names used in the United States (from Drugs.com) and the generic names used in Europe. The European names were included in the search criteria because the safety signals were discussed in EMA PRAC meetings. The United States brand names were included because many drugs are marketed in the United States prior to their marketing in Europe [22], and to cover the international market.
The symptomatology of the safety signals was compiled using the medical websites Mayo Clinic (https://www.mayoclinic.org/), WebMD (https://symptomchecker.webmd.com/symptoms-a-z) and DermNet NZ (https://dermnetnz.org/), and was translated into ADR keywords. These keywords were used as a reference to identify signals in tweets associated with the drugs of interest. The ADR keywords were formulated using patient-specific (vernacular) language. For instance, for the adverse event of arthralgia/arthritis, the relevant ADR keywords were joint pain; stiffness; decrease in motion; skin redness; arthralgia; arthritis and joint swelling. For anaphylactic reaction, the selected ADR keywords were anaphylactic reaction; anaphylaxis; allergic reaction; hypersensitivity; hives; flushed skin; paleness; lump in throat; difficulty swallowing; sneezing; wheezing; tingling hands; swollen tongue; swollen lips. The complete list of these drugs and the relevant keywords is available in Table 1. Twitter posts were downloaded for the period of interest using the Twitter advanced search URL (Uniform Resource Locator), https://twitter.com/explore, and Python code derived from the GitHub library [23]. The review period for each drug-signal pair was calculated as the two years prior to the date of the PRAC meeting at which the safety signal was discussed. For instance, if the safety signal of dysphagia for the product gabapentin was discussed in the PRAC meeting on 14 January 2019, the Twitter posts were analysed from 14 January 2017 to 13 January 2019.
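The review-period rule described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the study: the function name `review_window` and the handling of a meeting held on 29 February are assumptions.

```python
from datetime import date, timedelta


def review_window(prac_meeting: date, years: int = 2) -> tuple[date, date]:
    """Return the (start, end) dates of the review period, both inclusive.

    The period starts `years` years before the PRAC meeting and ends on the
    day before the meeting.
    """
    try:
        start = prac_meeting.replace(year=prac_meeting.year - years)
    except ValueError:
        # Assumption: a meeting on 29 February maps to 28 February when the
        # start year is not a leap year.
        start = prac_meeting.replace(year=prac_meeting.year - years, day=28)
    end = prac_meeting - timedelta(days=1)
    return start, end


# Gabapentin/dysphagia example from the text: meeting on 14 January 2019.
start, end = review_window(date(2019, 1, 14))
print(start, end)  # 2017-01-14 2019-01-13
```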
Twitter posts generally contain many abbreviations, emoticons, typos, hashtags, mentions, URLs and stop words. Hence, pre-processing of extracted tweets is an important step to achieving a clean and accurate dataset for analysis. Python code derived from GitHub libraries [24–26] was used to carry out pre-processing of tweets. Sentiment analysis can improve AE classification accuracy while assessing the tweets. It also helps in differentiating the AEs from the indication of the medicines. Hence, the Valence Aware Dictionary for Sentiment Reasoning (VADER) was used for analysis [27].
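As a rough illustration of this kind of pre-processing, the sketch below uses only the Python standard library. The regular expressions, the short stop-word list and the function names are assumptions made for the example and are not taken from the cited GitHub libraries. The sentiment score itself is assumed to be computed separately; `keep_for_review` only applies the commonly used VADER convention that a compound score of 0.05 or above indicates positive sentiment.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_TEXT_RE = re.compile(r"[^a-z0-9\s']")
# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "and", "or", "to", "of", "me", "my", "i"}


def clean_tweet(text: str) -> str:
    """Lower-case a tweet and strip URLs, mentions, symbols and stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")      # keep the hashtag word, drop the symbol
    text = NON_TEXT_RE.sub(" ", text)  # removes emoticons and punctuation
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)


def keep_for_review(compound: float, positive_cutoff: float = 0.05) -> bool:
    """Keep tweets whose sentiment is negative or neutral (assumed cutoff)."""
    return compound < positive_cutoff


print(clean_tweet("@bob Gabapentin made me SO dizzy!! https://t.co/abc #SideEffects :("))
# gabapentin made so dizzy sideeffects
```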
The pre-processed, cleaned tweets were manually reviewed and classified into AE and non-AE tweets. Identified AE tweets were coded using MedDRA dictionary version 22.1 into the relevant System Organ Class (SOC), High Level Term (HLT) and Preferred Term (PT). Identified AEs were classified as signal-associated AEs if they matched the ADR keywords pertaining to the safety signals discussed in the PRAC meeting, while the remaining AEs, which did not match the safety signals, were classified as non-signal-associated AEs. All identified AEs were further classified into serious and non-serious AEs based on the EMA Important Medical Event (IME) list [28]. The list is developed and maintained by the EudraVigilance Expert Working Group to assist in aggregate data analysis and routine PV activities. AEs present in the IME list were considered serious, while AEs not listed were marked as non-serious.
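The two classification rules above amount to set-membership checks, which the following minimal sketch makes explicit. In the study the review and coding were manual; the data structures, the function name `classify` and the term lists shown here are hypothetical placeholders, not the actual ADR keyword table or the contents of the IME list.

```python
from dataclasses import dataclass

# Hypothetical excerpt of the ADR keyword table, keyed by (drug, signal).
SIGNAL_KEYWORDS = {
    ("gabapentin", "dysphagia"): {"dysphagia", "difficulty swallowing", "lump in throat"},
}
# Placeholder terms standing in for the IME list (not its real contents).
IME_TERMS = {"anaphylactic reaction", "pancreatitis"}


@dataclass
class AdverseEvent:
    drug: str
    signal: str          # safety signal discussed at the PRAC meeting
    reported_term: str   # wording found in the tweet
    preferred_term: str  # MedDRA PT assigned by the reviewer


def classify(ae: AdverseEvent) -> dict:
    """Flag an AE as signal-associated and/or serious by simple lookups."""
    keywords = SIGNAL_KEYWORDS.get((ae.drug.lower(), ae.signal.lower()), set())
    return {
        "signal_associated": ae.reported_term.lower() in keywords,
        "serious": ae.preferred_term.lower() in IME_TERMS,
    }


example = AdverseEvent("Gabapentin", "dysphagia", "difficulty swallowing", "Dysphagia")
print(classify(example))  # {'signal_associated': True, 'serious': False}
```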
All AEs were assessed against the product SmPC (EU Summary of Product Characteristics) to determine whether they were labelled or unlabelled events. The reference safety information (i.e. SmPC) of the products was accessed from the electronic medicines compendium [29].
Spontaneous reports collected by Yellow Card were downloaded from the MHRA website in CSV format [30]. The time period was concurrent with the collection of Twitter data for all drugs of interest. Each drug report includes the following information: AE number; year collected in the Yellow Card database; sender type; reporter type; seriousness; MedDRA PT, HLT and SOC of the AE; whether the AE was associated with a fatal outcome; and name of the drug. AEs from the Yellow Card database were classified into signal-associated and non-signal-associated events based on the ADR keywords. All AEs were also assessed against the EU SmPC and classified into labelled and unlabelled events. Finally, to align the seriousness criteria of the AEs from the two data sources, all AEs received from the Yellow Card database were re-classified into serious and non-serious AEs based on their inclusion in the EMA IME list (Important Medical Event List) (refer to Fig. 1). The determination of serious versus non-serious events therefore rests on the EMA IME list, a standard list used in the PV industry for determining seriousness when it is not obvious from the reported information.

Methodology flowchart. *AEs which do not match with ADR keywords pertaining to signal were eliminated.
An additional comparison was performed based on HLTs between the data received from Twitter and the Yellow Card system. Of note, the top 25 HLTs were chosen based on a threshold (total adverse events ≥ 5). Hence, all the HLTs with five or more AEs were compared for Twitter and Yellow Card data.
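The HLT selection can be illustrated with a frequency count. The sketch below assumes that the threshold of five AEs is applied within each data source separately, which the text does not state explicitly; the function names and the toy data are hypothetical.

```python
from collections import Counter


def frequent_hlts(hlts, min_count=5, top_n=25):
    """Return up to `top_n` (HLT, count) pairs with at least `min_count` AEs."""
    counts = Counter(hlts)
    ranked = [(hlt, n) for hlt, n in counts.most_common() if n >= min_count]
    return ranked[:top_n]


def shared_hlts(twitter_hlts, yellow_card_hlts, **kwargs):
    """HLTs that pass the threshold in both data sources."""
    a = {hlt for hlt, _ in frequent_hlts(twitter_hlts, **kwargs)}
    b = {hlt for hlt, _ in frequent_hlts(yellow_card_hlts, **kwargs)}
    return sorted(a & b)


# Toy data: one HLT label per coded AE.
twitter = ["Asthenic conditions"] * 6 + ["Rashes"] * 5 + ["Nausea"] * 2
yellow_card = ["Asthenic conditions"] * 9 + ["Nausea"] * 7

print(frequent_hlts(twitter))             # [('Asthenic conditions', 6), ('Rashes', 5)]
print(shared_hlts(twitter, yellow_card))  # ['Asthenic conditions']
```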
Descriptive statistical analysis was conducted to assess the distribution of AEs received from Twitter based on therapeutic drug class, seriousness and labelling. Similarly, the trend of AEs received from Twitter was compared with AEs collected from the Yellow Card system. Inferential statistical analysis was performed in R (version 3.5.3) using Fisher’s exact test to assess the distribution and proportion of AEs received from Twitter and the Yellow Card database based on seriousness and labelling. Results on categorical measurements were presented as number (%) and assessed at the 5% level of significance. For inter-coder reliability, the second author independently coded a random selection of 10% of the Twitter posts (n = 6166). The Cohen kappa inter-rater agreement coefficient [31], which adjusts for the proportion of agreement expected by chance, was evaluated using the guidelines outlined by Landis and Koch [32], where the strength of the kappa coefficient is as follows: 0.01–0.20 slight; 0.21–0.40 fair; 0.41–0.60 moderate; 0.61–0.80 substantial; 0.81–1.00 almost perfect. The analysis showed 95% agreement between coders and an overall kappa coefficient of 0.76. Therefore, the inter-coder reliability was substantial. All discrepancies between coders were resolved through discussion. Any remaining discrepancy was resolved with the support of a third researcher when necessary, eventually reaching 100% agreement.
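For readers unfamiliar with the kappa coefficient, a minimal pure-Python sketch of its calculation and of the Landis and Koch bands is given below. The function names and the toy labels are illustrative assumptions; the study's own calculation is not reproduced here. Values of zero or below fall outside the bands quoted above and are labelled "poor" in the sketch.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty item set")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from each coder's marginal label frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Map a kappa value to the Landis and Koch strength-of-agreement band."""
    bands = [(0.80, "almost perfect"), (0.60, "substantial"),
             (0.40, "moderate"), (0.20, "fair"), (0.0, "slight")]
    for lower, label in bands:
        if kappa > lower:
            return label
    return "poor"


coder_1 = ["AE", "AE", "non-AE", "non-AE"]
coder_2 = ["AE", "non-AE", "non-AE", "non-AE"]
kappa = cohen_kappa(coder_1, coder_2)
print(kappa, landis_koch(kappa))  # 0.5 moderate
print(landis_koch(0.76))          # substantial
```

The same formula explains how 95% raw agreement can correspond to a kappa of 0.76: solving (0.95 − p_e) / (1 − p_e) = 0.76 gives a chance agreement p_e of roughly 0.79, which is plausible when most posts fall into a single category.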
Results
A total of 123,410 tweets were downloaded for the 35 drugs of interest using the Twitter search URL. Of these, only 61,661 were detected as English tweets using a language detection library and selected for further analysis.
Of the 61,661 tweets, a total of 1,411 tweets were found to have one or more AEs. The remaining 60,262 tweets were excluded from analysis as they either had no AEs or conveyed a positive sentiment (refer to Fig. 1). Overall, the largest share of tweets with AEs concerned Olanzapine (17.4%), followed by Finasteride-Dutasteride (11%) and Febuxostat (10.7%), while very few tweets with AEs were found for Nilotinib, Indapamide and Temozolomide (<1%).
On analysing the 1,411 tweets with AEs, a total of 577 AEs were identified for the drugs of interest. Of these 577 AEs, 421 (73.2%) were considered non-serious and 154 (26.8%) were serious, while two AEs could not be classified. The top two drug classes for serious AEs were anti-neoplastic and immuno-modulating agents (34.4%) and nervous system (27.9%).
Of the 577 AEs, 346 (60.2%) were unlabelled and 229 (39.8%) were labelled, while two AEs could not be classified. Out of the 346 unlabelled AEs, 97 were classified as serious, while 249 were non-serious. Unlabelled AEs were mainly reported for the drug classes nervous system (33.5%), anti-neoplastic and immunomodulating agents (26.6%) and immuno-suppressants (11.3%).
Upon mapping these 577 identified AEs to the list of ADR keywords, only 35 AEs were found to be associated with the safety signals, covering 15 out of the total 35 drug-event pairs in this study (see Table 1). Of these 15 drug-event pairs, nine drugs belonged to the drug class anti-neoplastic and immuno-modulating agents, three drugs belonged to the nervous system class, and one drug each belonged to the drug classes acting on the respiratory, alimentary and genito-urinary systems.
Thus, 15 out of 35 drug-event pairs (42.9%) were found in Twitter, of which four (11.4%) were found exclusively in tweets and not in the Yellow Card database. Out of the 20 drug-event pairs (57.1%) not found in tweets, 11 (31.4%) were also not found in the Yellow Card database. Consequently, if both data sources were used in conjunction for signal detection, AEs pertaining to 24 drug-event pairs (68.6%) would have been identified (see Table 2) prior to the PRAC meeting.
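The pooled figures follow from simple set arithmetic, as the sketch below shows. The integer identifiers are placeholders for the 35 drug-event pairs, and the Yellow Card count of 20 pairs is derived from the numbers above (24 pooled minus 4 found only in tweets) rather than stated in the text; the function name is hypothetical.

```python
def pooled_detection(twitter, yellow_card, all_pairs):
    """Count drug-event pairs detected per source and pooled, with % of total."""
    def share(subset):
        return round(100 * len(subset) / len(all_pairs), 1)

    groups = {
        "twitter": twitter,
        "twitter_only": twitter - yellow_card,
        "either_source": twitter | yellow_card,
        "neither_source": all_pairs - (twitter | yellow_card),
    }
    return {name: (len(pairs), share(pairs)) for name, pairs in groups.items()}


# Placeholder identifiers 1..35 standing in for the drug-event pairs.
all_pairs = set(range(1, 36))
twitter = set(range(1, 16))      # 15 pairs seen in tweets
yellow_card = set(range(5, 25))  # 20 pairs seen in Yellow Card (derived)

for name, (count, pct) in pooled_detection(twitter, yellow_card, all_pairs).items():
    print(f"{name}: {count} ({pct}%)")
# twitter: 15 (42.9%)
# twitter_only: 4 (11.4%)
# either_source: 24 (68.6%)
# neither_source: 11 (31.4%)
```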
Comparison of identified safety signals across both data sources
A comparison of the top 25 most frequently reported AEs from each data source revealed that several were common to both, including fatigue, nausea, haemorrhage and death. Similarly, on comparing AEs based on high level terms (HLTs), several top HLTs were common to both data sources, such as asthenic conditions; neurological signs and symptoms NEC (Not Elsewhere Classified); diarrhoea (excl. infective); rashes, eruptions and exanthems NEC; and therapeutic and nontherapeutic effects (excl. toxicity) (see Table 3).
Comparison of AEs based on high level term (HLT) from both data sources
Altogether, 4,577 AEs were received from the two data sources. Of these, 577 AEs (12.6%) were received from Twitter and 4,000 AEs (87.4%) were received from the Yellow Card database. Of the total AEs identified in tweets, 26.8% were serious and 73.2% were non-serious. Similarly, of the total AEs received from the Yellow Card database, 28.2% were serious and 71.8% were non-serious. Hence, the distribution of AEs based on seriousness was similar for the two data sources (p = 0.518) (see Table 4). Furthermore, of the total AEs identified in tweets, 39.8% were labelled and 60.2% were unlabelled, whereas of the total AEs received from the Yellow Card database, 26.2% were labelled and 73.8% were unlabelled. Hence, the distribution of AEs based on label assessment was statistically different for the two data sources (p < 0.001**) (refer to Table 5).
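The comparisons above are tests on 2×2 tables. As an illustration of the Fisher's exact test named in the Methodology, the sketch below computes a two-sided p-value in pure Python using exact integer arithmetic. It is not the R analysis used in the study. The Twitter counts are those reported above, whereas the Yellow Card counts are back-calculated from the rounded percentages (an assumption), so the resulting p-values need not match the published ones exactly.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    lo, hi = max(0, col1 - row2), min(row1, col1)

    def weight(k):
        # Unnormalised hypergeometric probability, kept as an exact integer.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    # Sum every table that is no more likely than the observed one.
    extreme = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return extreme / comb(n, col1)


# Seriousness: Twitter 154 serious / 421 non-serious (reported);
# Yellow Card 1,128 / 2,872 (assumed from 28.2% and 71.8% of 4,000).
p_serious = fisher_exact_two_sided(154, 421, 1128, 2872)

# Labelling: Twitter 229 labelled / 346 unlabelled (reported);
# Yellow Card 1,048 / 2,952 (assumed from 26.2% and 73.8% of 4,000).
p_label = fisher_exact_two_sided(229, 346, 1048, 2952)

print(f"seriousness p = {p_serious:.3f}; labelling p = {p_label:.2e}")
```

Working with integer weights avoids floating-point underflow for tables of this size; only the final ratio is converted to a float.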
Distribution of AEs based on seriousness from both data sources
P = 0.518, Chi-square test.
Distribution of AEs based on labelling assessment for both data sources
P < 0.001**, significant, Chi-square test.
Discussion
Drug-related adverse events pose a substantial risk to patients. Early detection of adverse events will benefit not only drug regulators but also manufacturers in their pharmacovigilance activities. Several reports have shown that Twitter data are a viable source of pharmacovigilance signals, as the information is collected directly from patients. Social media data used in conjunction with traditional safety reporting systems have the potential to uncover post-marketing signals more rapidly [19]. This finding was consistent with studies conducted to assess the performance of combining data from spontaneous reporting systems (SRS) and Twitter to assist detection of safety signals [33]. These studies also concluded that the accuracy of signal detection using Twitter could be improved by combining both data sources. Lardon and colleagues [13] opined that Twitter data can be used retrospectively to support an identified signal. On the other hand, the data can be used prospectively to better qualify adverse events for new drugs with insufficient bibliographical knowledge [5].
The current retrospective study was designed to evaluate whether tweets can serve as an early indicator for potential safety signals. As a secondary objective of this study, the trend of AEs posted in Twitter was compared with AEs received from the Yellow Card system in the United Kingdom.
The most obvious finding to emerge from this study was the small proportion of data relevant for pharmacovigilance. Out of the 61,661 tweets referring to the drugs of interest, only 1,411 (2.3%) were found to have one or more AEs. These findings are consistent with another study [34] where only 1,642 tweets out of the 40 million downloaded tweets were identified as relevant for further analysis.
Upon analysis of the AEs identified in the tweets for the 35 drug-event pairs, 15 drug-event pairs (42.9%) were found in tweets prior to the date of the EMA PRAC meeting. Furthermore, upon combining data from Twitter and Yellow Card, 24 drug-event pairs (68.6%) could be identified prior to the respective PRAC meetings. This finding broadly supports the work of other studies in this area, in which Twitter is recommended as a complementary source for signal detection.
The possible explanations for the 20 safety signals not being seen in social media are as follows.
Rarity/severity of the safety signal: for instance, haemophagocytic lymphohistiocytosis is a rare immune system disease which is seen in 1 in 100,000 persons owing to the recessive nature of the disease [35]. This safety signal was identified neither in Twitter nor in the Yellow Card data.
Diagnosis after hospitalisation/difficulty in diagnosis of the safety signal: for example, pancreatitis was a safety signal for two drugs of interest, namely Apixaban and Vismodegib. Pancreatitis is usually diagnosed in hospitals, and patients are hospitalised for monitoring of complications. Hence, pancreatitis is more likely to be detected through formal reporting channels such as the traditional PV system than to be mentioned on social media platforms. Similarly, myasthenia gravis is difficult to diagnose, as weakness is a common symptom of many neurological disorders. The symptomatology of both these signals was identified in the Yellow Card data.
Social stigma/confidentiality issues: some patients would consider the events confidential and not discuss them on social platforms. For instance, no tweets pertaining to gynecomastia or its associated symptomatology were found for the drug Febuxostat during the review period. Although the event is common in young men, gynecomastia leads to social distancing and low self-esteem issues, further leading to anxiety and depression [36].
One important finding from this study was that AEs belonging to the SOCs nervous system disorders (n = 19), cardiac disorders (n = 10) and psychiatric disorders (n = 8) were also identified in tweets (refer to Table 6). These findings contradict two other studies which concluded that social media posts and patient blogs primarily reported AEs pertaining to general disorders and administration site conditions [37,38].
Serious and unlabelled AEs (based on SOC) received from tweets
In this study, no AEs pertaining to product issues or to surgical and medical procedures were received from Twitter data. These results seem to be consistent with other research which also found no AEs pertaining to device or product quality issues [39].
Out of the 35 drugs of interest chosen for this study, 17 belonged to the drug class anti-neoplastic agents and 5 belonged to the drug class nervous system. Consistent with this proportion, serious and unlabelled AEs were mainly reported for the oncology (n = 48) and neurological (n = 22) drug classes. These results further support the idea of integrating social media into modern oncology practice and research. In the oncology setting, Twitter data can complement information from health records to enhance understanding of patients’ experiences and improve health outcomes [40].
The trend analysis comparing the AEs identified in Twitter posts with the AEs reported through the classical PV system showed that, in both data sources, serious and unlabelled AEs were most commonly reported for the SOC nervous system disorders, while the top drug class for which AEs were received was anti-neoplastic agents.
The most interesting finding was that AEs pertaining to psychiatric disorders were reported more frequently on the social media platform than through traditional reporting systems. These results were in line with previous research [41] in which AEs frequently reported on social media also belonged to psychiatric disorders. Another finding of the current study was that AEs pertaining to metabolic disorders were also reported more frequently on Twitter than in Yellow Card. This result is consistent with an analysis conducted to quantify glucocorticoid-related adverse events [42]. In that analysis, the most commonly reported PTs were weight increased (8.2%) and increased appetite (7.5%). This illustrates that Twitter data are a source of information on non-serious adverse effects frequently experienced by patients.
The current study also found that AEs pertaining to congenital abnormalities were identified at a higher rate on Twitter than in Yellow Card. Hence, tweets may serve as a novel method of observing pregnancies and associated drug-related congenital abnormalities. This is supported by statistics showing that nearly 52% of Twitter users belong to young age groups (18–34 years) [43]. It is also in line with an epidemiological analysis [22] which proposes gathering valuable information on pregnancy from Twitter, given that a high proportion of users are of childbearing age.
The proportion of serious AEs received from Twitter and Yellow Card data was consistent between the two data sources (p = 0.518). On the contrary, there was a significant difference (p < 0.001**) in the proportion of labelled AEs received from the two data sources. However, the labelling of events does not play a critical role in signal detection, as some AEs which do not appear in the product label are still considered for further monitoring as part of risk management activities [44].
This study had a few limitations. The focus was only on English tweets, and the brand names used for searching the medicinal products on Twitter belonged only to the European and United States markets. The PRAC assessment is performed by the EMA (European Medicines Agency), which collects safety data from all European countries. Safety data received by the EMA in non-English European languages are translated into English. As the list of signals assessed by PRAC was used for our study, the chances of missing signals originating from non-English languages seemed minimal. Additionally, social media introduces an inherent bias: younger patients, or those belonging to certain cultural and socioeconomic groups, might have greater access to social media platforms, while older or sicker patients who experience more severe ADRs may not access social media as frequently, thus limiting the reporting of severe ADRs [38]. Furthermore, the geographical location of the user cannot be determined from Twitter feeds, and hence users cannot be followed up for validation and confirmation of ADRs. Twitter also has a technical limit of 280 characters per tweet, thus providing less information than classical PV sources [5].
This study echoes the conclusion of the IMI WEB-RADR pilot study, which found that the stand-alone use of social media for pharmacovigilance cannot be recommended at this stage considering the time and effort taken to map the adverse events. The use of social media channels can have added value in specific areas such as drug abuse and pregnancy-related outcomes [45].
Conclusion
It was found that Twitter cannot be used in isolation for signal detection activity but can be used in combination with traditional PV systems (i.e., spontaneous reporting systems) for early signal detection, as it can provide a holistic drug safety profile. Although AEs pertaining to safety signals and their symptomatology were identified on Twitter, a standardised process needs to be set up for causality assessment. Overall, there were more similarities than dissimilarities in the proportion and distribution of AEs between Twitter and Yellow Card. Twitter provides the patient’s perspective on, and behaviour towards, the use of marketed drugs. Thus, Twitter data can also be used in accordance with the suggestions published by Sinnenberg et al. [46], where Twitter can be used for content analysis of specific safety issues, surveillance using statistical signal detection and recruitment of research survey participants in clinical studies.
Footnotes
Acknowledgement
Conflict of interest
The authors have no competing interests to report.
Funding
This research received support from the Innovative Medicines Initiative Joint Undertaking under grant agreement #115014, resources of which are composed of financial contribution from the European Union’s Seventh Framework Programme (FP7/2007-2013; for more info, see
). In addition, the Health Research Centre at the University of Almeria (Spain) also provided funding for this research.
Data availability
The data that support the findings of this study are available upon reasonable request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.
