Abstract
Drug adverse events (AEs) are a major health threat to patients seeking medical treatment and a significant barrier in drug discovery and development. AEs are now required to be submitted during clinical trials and can be extracted from
Introduction
Adverse events (AEs) are unintended and undesirable effects that result from the use of a drug or other medical product in a patient. AEs represent a significant barrier in drug development. Serious adverse drug effects are the fourth leading cause of death in the United States, accounting for over 100,000 deaths each year. 1 Approximately 30% of failures in drug clinical trials are due to the intensity of adverse side effects. 2 We hypothesize that learning about drug-related AEs from clinical trial data will provide new insights and reveal unexpected relationships between drugs, AEs, and drug targets for future drug development and biomedical research.
Recently, many research efforts have focused on understanding the relationships between drug targets and AEs, with the aim of elucidating the molecular mechanisms underlying drug adverse effects for better development and repurposing of drugs. Two important data sources, drug target databases and AE databases (AEDBs), are needed to investigate the relationships between drugs, drug targets, and AEs. There are many databases and repositories of drug-target relationships, such as PubChem, 3 ChemBank, 4 DrugBank, 5 BindingDB, 6 and DSigDB. 7 The U.S. Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) is one of the most popular AEDBs for FDA-approved drugs. FAERS is designed to support the FDA's postmarketing safety surveillance program for drugs and therapeutic products. FAERS collects AEs and medication error reports that manufacturers, healthcare professionals, and consumers submit to the FDA. Another popular source for extracting drug AEs is FDA package inserts for products.
Computational tools and approaches such as machine learning and text mining have been developed and used to construct resources and predictors of drug-AE relationships. SIDER, 8 the side effect resource, represents one of the earliest computational approaches to extracting drug AEs from FDA package inserts. Currently, SIDER contains 5,868 AEs, 1,430 drugs, and 139,756 drug-AE pairs. 9 In contrast, the OFFSIDES database contains AEs not listed on the FDA's official drug label. Currently, this database contains 1,332 drugs, 10,097 AEs, and 438,801 drug-AE pairs, and this information could be used for polypharmacology research. 10 MetaADEDB is a recently developed database of adverse drug events that combines SIDER, OFFSIDES, and the Comparative Toxicogenomics Database. 11 Machine learning approaches have been used to build predictors of drug AEs from properties such as chemical structures, drug targets, structural information, and drug–protein interaction networks. 12 –15
Although research tools and resources have been developed for predicting drug-AE relationships, knowledge and prediction of drug-AEs are still far from perfect. One of the main limitations of the current approaches is that they are limited to FDA-approved drugs, 9,10 as the primary source of AEs. Other experimental compounds tested in clinical trials were not captured by the current approaches. These experimental compound AEs in clinical trials represent an untapped resource, and could potentially provide new knowledge of drug-AE relationships. Collections of trials data and results are now available in various clinical trials registries, such as
To understand the patterns of AEs reported in clinical trials, we performed “big data mining” on the published results from the
Materials and Methods
ClinicalTrials.gov and HTML Contents
Data Extraction and Text Mining
To extract AEs and other related trial information from the

Overview of the research strategy.
Drug List
We extracted drug information from the clinical trial results using a drug dictionary. This not only results in fewer false-positive drug identifications in the cohort descriptions but also restricts drug and AE relationships to a list of known drug names for further analysis. In this study, we used the FDA-approved drugs and experimental compounds obtained from DSigDB. 7 DSigDB currently holds 17,389 unique compounds and 19,531 drug target genes, and is freely available at
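The dictionary-based drug matching described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names, the whole-word matching rule, and the four-entry dictionary are hypothetical stand-ins for the DSigDB compound list.

```python
import re


def build_drug_pattern(drug_names):
    """Compile one case-insensitive regex matching any dictionary name as a whole token."""
    names = {n.strip().lower() for n in drug_names if n.strip()}
    if not names:
        raise ValueError("drug dictionary is empty")
    # Longest names first, so "imatinib mesylate" wins over "imatinib".
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(n) for n in ordered)
    # The lookarounds reject matches embedded in a longer alphanumeric token.
    return re.compile(
        r"(?<![A-Za-z0-9])(?:" + alternation + r")(?![A-Za-z0-9])",
        re.IGNORECASE,
    )


def find_drugs(text, pattern):
    """Return the sorted set of dictionary drug names mentioned in a cohort description."""
    return sorted({m.group(0).lower() for m in pattern.finditer(text)})


# Hypothetical mini-dictionary for illustration only.
DRUGS = ["imatinib", "imatinib mesylate", "dasatinib", "nilotinib"]
pattern = build_drug_pattern(DRUGS)

print(find_drugs("Imatinib 400 mg daily, then nilotinib-based therapy", pattern))
```

Restricting matches to dictionary entries is what keeps free-text cohort descriptions from producing spurious drug names; a description that mentions no dictionary entry simply yields an empty list.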
AE Database, Analysis and Visualization
The extracted drugs and AEs were uploaded to our adverse event database (AEDB). Our AEDB contains nine tables, focusing on drugs, AEs, cohorts, and drug targets. The AEDB schema is illustrated in Figure 2. The AEDB was developed using the open-source database MySQL.

Entity-relationship model of the AEDB. AEDB, adverse event database.
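To make the relational layout concrete, the sketch below shows how drugs, AEs, and cohorts might be linked. It is an assumption-laden illustration and not the actual AEDB schema: every table name, column name, and data row is hypothetical, only four tables are shown rather than nine, and Python's built-in sqlite3 module stands in for MySQL so that the example is self-contained.

```python
import sqlite3

# Hypothetical, simplified schema; the real AEDB has nine tables (Figure 2).
SCHEMA = """
CREATE TABLE drug   (drug_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE ae     (ae_id INTEGER PRIMARY KEY, term TEXT UNIQUE NOT NULL);
CREATE TABLE cohort (cohort_id INTEGER PRIMARY KEY,
                     trial_id TEXT NOT NULL,
                     drug_id INTEGER NOT NULL REFERENCES drug(drug_id),
                     n_patients INTEGER NOT NULL);
CREATE TABLE cohort_ae (cohort_id INTEGER NOT NULL REFERENCES cohort(cohort_id),
                        ae_id INTEGER NOT NULL REFERENCES ae(ae_id),
                        n_affected INTEGER NOT NULL,
                        PRIMARY KEY (cohort_id, ae_id));
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up rows, for illustration only.
conn.execute("INSERT INTO drug VALUES (1, 'drug_a')")
conn.execute("INSERT INTO ae VALUES (1, 'nausea')")
conn.execute("INSERT INTO cohort VALUES (1, 'TRIAL-0001', 1, 50)")
conn.execute("INSERT INTO cohort_ae VALUES (1, 1, 5)")

# Aggregate affected and exposed patients per drug-AE pair across cohorts.
rows = conn.execute("""
    SELECT d.name, a.term, SUM(ca.n_affected), SUM(c.n_patients)
    FROM cohort_ae ca
    JOIN cohort c ON c.cohort_id = ca.cohort_id
    JOIN drug d ON d.drug_id = c.drug_id
    JOIN ae a ON a.ae_id = ca.ae_id
    GROUP BY d.name, a.term
""").fetchall()
print(rows)
```

Keeping per-cohort counts in a linking table, rather than storing precomputed drug-AE totals, lets the same data be regrouped by drug, condition, or trial phase at query time.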
PRR Analysis
We used the proportional reporting ratio (PRR), which summarizes the extent to which a certain AE is reported for patients taking a particular drug compared with the frequency at which the same AE is reported for other drugs. 17 The PRR has been used to detect AE signals in drug safety reporting. 10,17,18 A PRR greater than one implies that the drug of interest had a higher reported frequency of the AE than the rest of the drugs.
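As a minimal sketch, the PRR is commonly computed from a 2x2 contingency table as [a/(a+b)] / [c/(c+d)], where a and b count reports of the AE of interest and of all other AEs for the drug of interest, and c and d count the same for all other drugs. The function name, the handling of undefined cases, and the example counts below are illustrative assumptions, not values from this study.

```python
def prr(a, b, c, d):
    """Proportional reporting ratio from a 2x2 contingency table.

    a: drug of interest, AE of interest
    b: drug of interest, all other AEs
    c: all other drugs, AE of interest
    d: all other drugs, all other AEs

    Returns None when the ratio is undefined (an empty row, or no
    reports of the AE among the other drugs).
    """
    if a + b == 0 or c + d == 0 or c == 0:
        return None
    rate_drug = a / (a + b)      # AE frequency for the drug of interest
    rate_others = c / (c + d)    # AE frequency for all other drugs
    return rate_drug / rate_others


# Made-up counts: 20 of 100 reports for the drug versus 50 of 1,000 for the rest.
print(prr(20, 80, 50, 950))
```

In this example the AE makes up 20% of the reports for the drug and 5% of the reports for all other drugs, so the ratio exceeds one and the pair would be flagged for closer review.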
Results
Summary of the AEs Extracted from ClinicalTrials.gov
We downloaded 18,567 trials with results reported in the
Data Extracted from
Table 2 summarizes the statistics of the data in our AEDB. We have extracted 8,161 trials from
Summary Statistics of the Database
FDA, U.S. Food and Drug Administration.
Statistics of the AEs
We extracted and grouped the AEs from

Statistics of the AEs.
The 10 most common conditions investigated in these clinical trials were Type 2 diabetes mellitus, breast cancer, chronic obstructive pulmonary disease, hypertension, rheumatoid arthritis, asthma, schizophrenia, non-small cell lung cancer, hepatitis C, and prostate cancer (Fig. 3C). To explore the disease-AE relationships, we performed PRR analysis on the top 20 conditions. Supplementary Figure S4 shows the disease-AE relationships in a heatmap. For example, auditory hallucination is strongly correlated with schizophrenia, major depressive disorder, Parkinson's disease, epilepsy, and Alzheimer's disease.
AEs in Different Phases of Clinical Trials
Next, we investigated the AEs recorded in the different phases of clinical trials. Figure 4A shows the breakdown of the clinical trial phases in this study. The top three phases with the most complete AE results were Phase 3 (34.4%), Phase 2 (32.6%), and Phase 4 (15.4%). We found that Phase 1/2 patients experienced the highest number of AEs, followed by Phase 1 and Phase 2 patients. This is not surprising, as these early trial phases are enriched with experimental compounds, and the main objective of these trials is to determine the toxicity of these compounds in patients. Accordingly, Phase 3 and 4 patients experienced the fewest AEs; these are late-stage trials whose main objectives are to establish drug efficacy (Phase 3) and to conduct postmarketing surveillance (Phase 4). Figure 4B shows the average number of AEs per patient in these different phases of trials.

AEs in different phases of clinical trials.
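The per-phase comparison above can be illustrated with a short aggregation. This sketch assumes that the average is pooled, that is, total AE events divided by total patients within each phase; the text does not state how the average was computed, and the function name and example counts are hypothetical.

```python
from collections import defaultdict


def mean_aes_per_patient(cohorts):
    """Pooled AEs per patient by phase.

    cohorts: iterable of (phase, n_ae_events, n_patients) tuples.
    Phases with zero patients are omitted from the result.
    """
    events = defaultdict(int)
    patients = defaultdict(int)
    for phase, n_events, n_patients in cohorts:
        events[phase] += n_events
        patients[phase] += n_patients
    return {p: events[p] / patients[p] for p in events if patients[p] > 0}


# Made-up cohort counts, for illustration only.
example = [
    ("Phase 1", 30, 10),
    ("Phase 1", 10, 10),
    ("Phase 3", 5, 50),
]
print(mean_aes_per_patient(example))
```

Pooling weights each cohort by its size; averaging the per-cohort ratios instead would give small early-phase cohorts more influence on the result.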
Statistics of the Drug-AE Relationships
Next, we investigate the drug-AE relationships that were extracted from

Drug-AE relationships.
Case Study: Kinase Inhibitor-AE Relationships
To illustrate one application of AEDB, we performed PRR analysis to compare selected small molecule kinase inhibitors and their AEs. Protein kinases play a key role as regulators and transducers of signaling in eukaryotic cells, and represent one of the largest and best-studied “druggable” families in the human genome. 19 Many kinases are mutated in cancer genomes, and cancer cells depend on these mutated kinases for proliferation, growth, and survival signaling. Therefore, small molecule inhibitors that target kinases in either wild-type or mutated forms are actively studied in the pharmaceutical industry and academia. However, because of the conserved sequence similarity between kinases, many kinase inhibitors have off-target effects, which can ultimately lead to AEs in patients.
The majority of chronic myelogenous leukemia (CML) cases are driven by the oncogenic kinase fusion BCR-ABL. Imatinib is a small molecule kinase inhibitor that specifically inhibits the activity of BCR-ABL and dramatically improves the survival of CML patients. 20 Imatinib was the first FDA-approved kinase inhibitor for treating CML; four additional kinase inhibitors (dasatinib, nilotinib, bosutinib, and ponatinib) have since been approved by the FDA for this disease. However, these newer kinase inhibitors cause more serious AEs. To study these AEs, we focused on the five kinase inhibitors approved for the treatment of CML: imatinib, nilotinib, dasatinib, bosutinib, and ponatinib. We used PRR to evaluate the AEs reported for each of the kinase inhibitors. Figure 6 shows the top 10 AEs found for each kinase inhibitor in the database compared with placebo. From this heatmap, it is clear that ponatinib has more AEs and a different AE profile compared with the other kinase inhibitors.

Kinase inhibitor-AE relationships. Heatmap of the PRR of the top 10 AEs of imatinib, dasatinib, nilotinib, bosutinib, ponatinib, and placebo. The PRR is normalized per AE, where red and blue colors indicate high and low frequencies, respectively. PRR, proportional reporting ratio. Color images available online at
We further investigated selected vascular-related AEs associated with these kinase inhibitors. These vascular AEs have emerged as a serious consequence of treatment with kinase inhibitors. 21,22 From the analysis, we found that these kinase inhibitors have a higher PRR score for peripheral arterial occlusive disease, embolism, hypertension, platelet dysfunction, hyperglycemia, and hair loss, compared with the other drugs in the database (Table 3). Specifically, ponatinib has the highest PRR score for peripheral arterial occlusive disease. This finding is supported by the FDA warnings for ponatinib, which include serious AEs related to life-threatening blood clots and severe narrowing of the blood vessels. As a result, the FDA issued a temporary marketing suspension of ponatinib in October 2013 and began to require extra safety measures for ponatinib in December 2013, before the company resumed marketing. 23,24 This suggests that performing data mining on our database may reveal new knowledge about drug-AE relationships.
Vascular Event Proportional Reporting Ratios for the Five Kinase Inhibitors Commonly Used to Treat Chronic Myelogenous Leukemia Patients
NA, not applicable due to no data.
Discussion
We have performed “big data” mining and pattern analysis of drug AEs in
Current drug-AE databases such as SIDER focus only on FDA-approved drugs, as most of their AEs are obtained from FAERS or FDA drug labels. In contrast, AEDB covers both FDA-approved drugs and experimental compounds extracted from clinical trial data. These experimental compound-AE relationships have not been fully studied and are an untapped resource for mining new drug-AE relationships. We believe that our database provides a unique opportunity to learn and extract drug-AE relationships, and that it is complementary to the existing AE resources. Our database can easily be scaled up to capture new data deposited to
Trial data reported in
In the future, we would like to investigate the drug target-AE relationships in our database to elucidate the molecular mechanisms of drug actions and improve personalized medicine using previously published methodologies. 12 –15,27 We would like to use AEs to predict novel drug–target interactions for drug repurposing and repositioning. We would also like to develop an interactive web portal, similar to those of existing resources, 8,9,28,29 that users can use to query, retrieve, and analyze the data collected in this database. The database would be searchable by drug, drug target, AE, condition, and/or clinical trial. A limitation of our study is that it does not consider different drug dosages and their related AEs, which we plan to address in future work.
In conclusion, we have extracted clinical drug trial data from
Acknowledgments
We would like to acknowledge the Tan Lab members for their constructive comments on this project. We thank Susan Kim for suggestions and editing of the article. This work is partly supported by the National Institutes of Health (P50CA058187, P30CA046934), the Cancer League of Colorado, and the David F. and Margaret T. Grohne Family Foundation.
Disclosure Statement
No competing financial interests exist.
