Abstract
The original contribution of the adverse outcome pathway (AOP) concept to predictive toxicology is that the use of a range of assays covering the main events of the whole toxicity pathway would improve the prediction of the apical endpoints with alternative tests. This would permit more solid, mechanistically based predictions. However, explicit and systematic quantitative analysis of the AOPs and their integration into larger schemes is at times lacking. This leaves room for subjective implementations. We have investigated the quantitative modeling aspects of AOPs related to important apical endpoints, that is, skin sensitization, endocrine disruption, and carcinogenicity. A common trend is that the major contribution to the final predictivity is given by in vitro or in silico models of molecular initiating events (MIEs), which appear to be the rate-limiting steps of the toxicity pathways, whereas models for intermediate events have poor correlations with apical endpoints. Since MIEs are in general amenable to quantitative structure–activity relationship (QSAR) analysis, it can be anticipated that the integration of a few in vitro assays with QSAR models is going to provide rapid and inexpensive approaches for the detection of toxins for many endpoints. This evidence agrees with the general experience of modeling: in different fields such as ecology, systems biology, and macroeconomics, grossly simplified models capture important features of the behavior of incredibly complex interacting systems and permit successful predictions. This work also stresses the need to apply quantitative analysis to the toxicity pathways.
Introduction
However, the need for totally or partially replacing the animal toxicity assays with shorter term, animal-free toxicological methods has become more and more compelling in recent decades. To fulfill this goal, a wide range of initiatives have been undertaken. An important one is the development of the adverse outcome pathway (AOP) concept by the Organisation for Economic Co-operation and Development (OECD). AOPs delineate the documented, plausible, and testable processes by which a chemical induces molecular perturbations (molecular initiating events [MIEs]) and the associated biological responses that describe how the molecular perturbations cause effects at the subcellular, cellular, tissue, organ, whole-animal, and population levels of observation. Adverse effects observed in vivo are the result of a biological cascade initiated by the chemical structure of the toxicant.1–5
The AOP concept provides a framework that scientists can use to (1) identify critical events of the toxicological mechanisms; (2) devise experimental tests for the various events; (3) generate data; and (4) lastly use the data to test the original hypothesis (i.e., check if the AOP model actually predicts the apical endpoint).
The OECD has provided scientific support and technological infrastructure to collect and organize the knowledge on AOPs (www.oecd.org/chemicalsafety/testing/adverse-outcome-pathways-molecular-screening-and-toxicogenomics.htm). The AOPs proposed and discussed number about 140 and span a wide range of toxicological endpoints, as evidenced in the dedicated OECD AOP Portal (http://aopkb.org). 5 Seven of them have been officially endorsed. A complementary OECD initiative is that of facilitating the integration of the different pieces of evidence from AOPs into a larger scheme called Integrated Approaches to Testing and Assessment (IATA) (www.oecd.org/chemicalsafety/risk-assessment/iata-integrated-approaches-to-testing-and-assessment.htm#AOP).
In the process of developing and implementing AOPs, a critical issue is how to go from the identification of AOPs to their practical use within alternative approaches for predicting toxicity.
While the AOP framework continues to evolve as a basis to increase transparency and defensibility of decisions, explicit and systematic quantitative analysis of the AOPs and their integration into larger schemes is at times lacking. This leaves room for subjective implementations in, for example, qualitative weight of evidence approaches. To overcome such subjectivity and reduce the uncertainty in practical applications (such as regulatory applications), the AOP initiative has to progress further toward a quantitative dimension.
Results and Discussion
In recent years, in our laboratory, we have investigated the quantitative modeling aspects of a number of AOPs. This article reviews these case studies, together with studies from other investigators, and attempts to identify trends and perspectives. In Appendix A1, the sources of data used in each analysis are provided: all data are freely available.
Skin sensitization
A first case study is the research on skin sensitization. The skin sensitization ability of chemicals can be tested with established animal assays. 6 Since the chemical and biological pathways involved are relatively well characterized, the search for alternative methods is aimed at mimicking the main events of the hypothesized toxicological mechanism with in vitro assays. Within this AOP, four key events (KEs) are considered necessary for the acquisition of skin sensitization: the covalent binding to skin proteins, also considered to be the MIE, the activation of keratinocytes, the maturation of dendritic cells, and the activation and proliferation of memory T cells. Several analyses have been published on how to best combine the in vitro assays within an optimized strategy. The overall predictivity of combinations of in vitro assays is quite good (80%–90%) with respect to both animal and human skin sensitization results. 6
In our laboratory, we have investigated the weight, or quantitative contribution, of the individual assays to the final predictivity. This analysis has shown that one in vitro assay alone (direct peptide reactivity assay [DPRA], which mimics the initiating event [i.e., reaction with skin proteins, haptenation]) accounts for around 80% of predictivity. The combination of DPRA with other in vitro tests (e.g., KeratinoSens) that model intermediate KEs improves the predictivity by only another 5%–10%, up to a final 90% accuracy. 7
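The kind of comparison described above can be illustrated with a minimal sketch: the accuracy of a single MIE assay is compared with that of a combination of assays. The ten binary calls below are hypothetical and are not taken from the cited analyses; the names `mie`, `ke2`, `ke3` and the majority-vote rule are assumptions chosen only for illustration, not the integration strategy used in the published work.

```python
from typing import Sequence


def accuracy(predicted: Sequence[bool], observed: Sequence[bool]) -> float:
    """Fraction of chemicals whose predicted class matches the in vivo class."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == o for p, o in zip(predicted, observed))
    return hits / len(observed)


def majority_vote(*assays: Sequence[bool]) -> list[bool]:
    """Call a chemical positive when more than half of the assays are positive."""
    return [sum(calls) * 2 > len(calls) for calls in zip(*assays)]


# Hypothetical calls for 10 chemicals (True = sensitizer); NOT real data.
in_vivo = [True, True, True, True, True, False, False, False, False, False]
mie = [True, True, True, True, False, False, False, False, True, False]
ke2 = [True, True, False, True, True, False, True, False, False, False]
ke3 = [True, False, True, True, True, False, False, False, True, True]

mie_alone = accuracy(mie, in_vivo)
combined = accuracy(majority_vote(mie, ke2, ke3), in_vivo)
gain = combined - mie_alone
print(f"MIE alone: {mie_alone:.0%}, combined: {combined:.0%}, gain: {gain:+.0%}")
```

The quantity of interest is `gain`, that is, how much the intermediate-event assays add once the MIE assay is already in the model.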
The importance of the MIE for the final predictivity was emphasized in other studies as well. In a recent article, data from the in vitro assays DPRA, KeratinoSens™, and h-CLAT were used together with a number of in silico structural descriptors to build different classification trees to predict skin sensitization hazard using in vivo LLNA results as reference. 8 The modeling exercise showed that skin sensitization was better predicted by models incorporating in silico parameters: the approach with the highest predictivity (characterized by 93% accuracy) consisted of a consensus of two classification trees, which were based on structural descriptors that account for protein reactivity and other structural features. Other quantitative structure–activity relationship (QSAR) analyses are in line with the above article.9–11
Overall, the above evidence suggests that the interaction of the chemical with the proteins (MIE) is the rate-limiting step that dictates the overall sequence of events, and is the basis for good predictive models.
Endocrine disruptors
Evidence to date indicates that hormone nuclear receptors are a major target of endocrine disrupting chemicals (EDCs) because these receptors are designed to bind small, lipoidal molecules (i.e., steroid hormones) that can be mimicked by many environmental chemicals. These nuclear receptors, once activated by their ligand, regulate the transcription of target genes, whose products in turn initiate a cascade of events eventually leading to organismic effects.
A number of in vivo assays were indicated by the OECD as suitable for screening potential EDCs (www.oecd.org/env/ehs/testing/oecdworkrelatedtoendocrinedisrupters.htm) namely:
Uterotrophic bioassay in rodents (a short-term screening assay for oestrogenic properties), repeated dose 28-day oral toxicity study in rodents, Daphnia magna reproduction test, and 21-day fish assay (a short-term screening for oestrogenic and androgenic activity, and aromatase inhibition).
Exploiting the knowledge on the mechanisms of action, new initiatives aimed at establishing alternative methods are arising. Within the ToxCast program, the U.S. Environmental Protection Agency (EPA) has implemented 18 in vitro high-throughput screening (HTS) assays that probe the estrogen receptor (ER) pathway in mammalian systems. Results from the 18 screening assays were integrated into a computational model that can discriminate bioactivity from assay-specific interference and cytotoxicity. The EPA is accepting ToxCast ER model data for 1812 assayed chemicals as alternatives for EDSP Tier 1 ER binding, ER transactivation, and uterotrophic assays. 12 The suite of HTS assays measures the MIE (i.e., receptor binding), as well as other KEs (e.g., receptor dimerization, DNA binding, transactivation, gene expression, and cell proliferation) in an AOP.
In our laboratory, we have investigated the contributions of the individual HTS assays to the AOP and to the prediction of the apical endpoint. 13 The HTS results from 1821 chemicals were analyzed with principal component analysis (PCA), which showed that almost 40% of the information (variance) carried by the 18 HTS assays is highly intercorrelated and can be coded with only one summary variable, or principal component (PC1). PC1 summarizes information from the HTS assays that specifically measure receptor binding capacity and receptor dimerization, that is, the MIE of the process. In addition, it was shown that the results of in vivo assays (in particular, the uterotrophic and 21-day fish assays) were correlated only with MIE information (as coded by PC1). This permitted the generation of predictive models for the uterotrophic and 21-day fish assays using a limited selection of ToxCast assays, which were related specifically to the MIE.
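As an illustration of how the share of variance coded by PC1 is obtained, the following minimal sketch runs a PCA through a singular value decomposition with NumPy. The matrix is synthetic (200 chemicals, 6 assays) and only stands in for an HTS result table; the function name `first_pc_share`, the autoscaling step, and the noise levels are assumptions, not the procedure of the cited study.

```python
import numpy as np


def first_pc_share(data: np.ndarray) -> tuple[float, np.ndarray]:
    """Return the variance fraction explained by PC1 and the PC1 loadings.

    `data` has one row per chemical and one column per assay.
    """
    # Autoscale each assay so that no column dominates through its units.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Squared singular values of the scaled matrix are proportional
    # to the variances of the principal components.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return float(explained[0]), vt[0]


# Synthetic stand-in for an HTS matrix (NOT ToxCast data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))                     # shared "MIE-like" signal
correlated = latent + 0.3 * rng.normal(size=(200, 3))  # three assays tracking it
independent = rng.normal(size=(200, 3))                # three unrelated assays
hts = np.hstack([correlated, independent])

share, loadings = first_pc_share(hts)
print(f"PC1 explains {share:.0%} of the variance")
print("PC1 loadings:", np.round(loadings, 2))
```

In this toy matrix the three intercorrelated assays carry the large PC1 loadings, which is the pattern that allows PC1 to be read as a summary of one underlying event.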
Whereas for skin sensitization there have been systematic attempts to build QSAR models able to predict directly the apical endpoint, QSAR work for the identification of EDCs has been largely focused on models for predicting the binding affinity in vitro to ERs (namely ERα and/or ERβ), which is recognized as the MIE. The published QSAR models are reported to be characterized by high predictivity.14–16 Since the binding is noncovalent in nature, besides molecular size and shape, the biological half-life of the chemical can be an important factor. It is hypothesized that QSAR may be used to predict the half-life.
Given the availability of a newly curated database on the uterotrophic assay, 17 it would be interesting to see to what extent the above QSAR models that incorporate mechanistic information on MIEs are able to predict the apical endpoint.
Carcinogenicity
Historically, the genotoxicity short-term tests have taken the pivotal role in the practice of prescreening of carcinogenicity. However, there is evidence that this strategy is not sensitive enough to detect all genotoxic carcinogens, and—obviously—cannot detect nongenotoxic carcinogens. Compared to the AOPs for skin sensitization and endocrine disruption, an AOP for carcinogenicity seems remarkably more difficult to formalize and implement; this is because there are several carcinogenicity mechanisms, all characterized by a high degree of complexity and not fully understood yet. However, a partial AOP for carcinogenicity can be tentatively sketched.
In this context, the MIEs are reactions of the chemicals with the DNA (and subsequent generation of DNA adducts), as well as non-DNA-reactive events such as indirect DNA damage and nongenotoxic mechanisms. A variety of MIEs are actually known, with a rich representation in databases of tested chemicals and of structural alerts (SA).18,19 The intermediate events in the carcinogenicity process are less clearly understood; however, crucial events such as gaps in intercellular communications, or tumor promotion, have been the subject of many studies, so that experimental data exist for various chemicals. It should be emphasized as well that cell transformation (while being a model for MIE) can also work as a model for these intermediate effects. Thus, partial AOP(s) for carcinogenicity can be sketched, including the apical effect as well as the MIE and intermediate effects that reinforce the mechanistic transparency and reliability of the predictions.
Building on a partial AOP, in our laboratory we have developed a 2-Tier approach for predicting chemical carcinogenicity. The strategy consists of the in vitro Ames and Syrian hamster embryo (SHE) cell transformation assays, combined with structure–activity relationships (SAR). The Ames test identifies DNA-reactive carcinogens that can also be identified through appropriate SAs. The chemicals negative in Tier 1 are supposed to be devoid of DNA reactivity and are studied in Tier 2 for their ability to induce cancer through epigenetic/non-DNA-reactive mechanisms. Since the cell transformation assays are sensitive to both genotoxic and nongenotoxic carcinogens, SHE is used in Tier 2 together with SAs specific for nongenotoxic mechanisms. The tiered approach was able to identify both genotoxic and nongenotoxic carcinogens, with an estimated 90%–95% sensitivity for rodent carcinogens. In addition, almost all IARC human carcinogens (326/329, 99%) were correctly identified.20,21 The take-home lesson from this case study is that a complex toxicological endpoint (carcinogenicity) can be predicted with simplified approaches that model the rate-limiting steps of the mechanisms of action.
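The decision flow of the tiered strategy can be sketched as follows. The `Evidence` record, its field names, and the choice to combine the assay result and the structural alert with a logical OR within each tier are assumptions made for illustration; the text above does not specify how the two lines of evidence are weighed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Hypothetical record of the results available for one chemical."""
    ames_positive: bool
    genotoxic_alert: bool      # structural alert for DNA reactivity
    she_positive: bool         # SHE cell transformation assay
    nongenotoxic_alert: bool   # structural alert for nongenotoxic mechanisms


def classify(e: Evidence) -> str:
    """Apply the two tiers in order and report which one flagged the chemical."""
    # Tier 1: DNA-reactive carcinogens (Ames test or matching structural alert).
    if e.ames_positive or e.genotoxic_alert:
        return "tier 1: putative DNA-reactive carcinogen"
    # Tier 2: only chemicals negative in Tier 1 reach this point.
    if e.she_positive or e.nongenotoxic_alert:
        return "tier 2: putative non-DNA-reactive carcinogen"
    return "no alert raised"


print(classify(Evidence(False, False, True, False)))
```

The ordering matters: Tier 2 is evaluated only for Tier 1 negatives, so a chemical positive in both the Ames and SHE assays is reported under Tier 1.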
QSAR and the concept of rate-limiting step
In the three case studies above, simplified approaches, based on a few parameters, were shown to be useful to predict complex in vivo endpoints. This can be rationalized through the concept of rate-limiting step. In a metabolic pathway or series of chemical reactions, the rate-limiting step is the slowest step, which determines the overall rate of the other reactions in the pathway. In an AOP, the rate-limiting step is the step that, once passed, is followed in sequence by all the other necessary steps.
A field where the concept has special relevance is that of QSAR, which is also an important area of alternative methods. This includes a variety of approaches with different characteristics (from quantitative models using physical/chemical parameters to more qualitative approaches based on the recognition of SA). The underlying idea is that the properties of the molecules (including toxicity) derive from their chemical structure, and that such structural features can be recognized and used to make predictions of activity.22,23
One issue that has sometimes discouraged investigators from applying QSAR analysis has been the idea that a biological process is too complicated and involves too many steps to be successfully modeled. However, despite the expected difficulties, in the last 50 years, thousands of successful QSAR analyses of biological activity have been generated.
The complexity of the situation has been thoroughly discussed by Hansch et al. in the case of skin carcinogenicity induced by polycyclic aromatic hydrocarbons (PAHs). 24 In the first step, the carcinogen must move through the skin (either through or around the cells) and into the cell where it is to be activated by the P-450 metabolic enzymes. This step is likely to be highly dependent on hydrophobicity, with little dependence on electronic or steric properties. In the following step, binding to the P-450 metabolic system is again dependent on hydrophobic and possibly steric factors. The next steps, involving oxidation of the P-450-bound carcinogen, are expected to be sensitive to the carcinogen's electronic and steric factors. After activation, the product must move within the cell, or into a nearby cell, to the DNA: this step will depend on hydrophobic and steric factors, and also on electronic factors, since the electronic characteristics of the diol epoxide will determine the lifetime of this intermediate and its tendency to undergo deactivating side-reactions before reaching the DNA. In the generation of reaction products with DNA, all three physicochemical factors (steric, electronic, hydrophobic) are likely to be important. Of course, reaction of the carcinogen with DNA is far from being the last step in the carcinogenesis problem; the potential for the organism to repair the DNA damage must also be taken into account. Moreover, the “promotion” by other chemicals (in which preneoplastic cells are expressed into individual tumor cells) and “progression” (involving progress to malignancy by histopathologic criteria) have to be considered. Thus, many steps are likely to occur, with different weights for the hydrophobic, electronic, and steric factors.
For a final QSAR model of some value to emerge, one must assume that only one step is rate limiting for all the congeners under consideration, or that the different weights of the physicochemical factors are linearly combined. It must be assumed also that all congeners are acting via the same mechanism. At present, this is largely unknowable and in the QSAR practice is usually dealt with by discarding very poorly fit congeners. Despite all these expected difficulties, a large number of successful QSAR analyses of chemical mutagens and carcinogens have been generated, including skin carcinogenicity induced by PAHs. In this case, the QSAR model was based on (1) hydrophobicity (as probe for induction of P-450); (2) a structural parameter coding for the presence of substituents in the K and L regions (which affect metabolism); and (3) the energy of the highest occupied molecular orbital, rationalized in terms of the first oxidation step in P-450 activation yielding an epoxide. Overall, the QSAR model points to the metabolic transformation as the rate-limiting step that rules the dynamics of the whole phenomenon; from its quantification, it is possible to predict the final toxicological effect. 24
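The assumption that the physicochemical factors combine linearly can be made concrete with a minimal least-squares sketch. The three descriptor columns loosely mirror the types of parameters listed above (hydrophobicity, a substituent indicator, HOMO energy), but every value, including the coefficients in `true_coeffs`, is synthetic and arbitrary; nothing here reproduces the published PAH equation.

```python
import numpy as np


def fit_linear_qsar(descriptors: np.ndarray, activity: np.ndarray) -> np.ndarray:
    """Least-squares fit of activity = descriptors @ coefficients + intercept.

    Returns the descriptor coefficients followed by the intercept.
    """
    design = np.column_stack([descriptors, np.ones(len(descriptors))])
    coeffs, *_ = np.linalg.lstsq(design, activity, rcond=None)
    return coeffs


# Synthetic congeners (NOT literature values).
rng = np.random.default_rng(1)
n = 30
x = np.column_stack([
    rng.uniform(4.0, 7.0, n),     # hydrophobicity term
    rng.integers(0, 2, n),        # 0/1 indicator for a substituent pattern
    rng.uniform(-9.0, -7.0, n),   # HOMO-energy term
])
true_coeffs = np.array([0.5, -1.0, 2.0, 10.0])  # arbitrary; last is the intercept
y = x @ true_coeffs[:3] + true_coeffs[3] + rng.normal(0.0, 0.01, n)

fitted = fit_linear_qsar(x, y)
residuals = y - (x @ fitted[:3] + fitted[3])
print("fitted coefficients:", np.round(fitted, 2))
print("largest absolute residual:", np.round(np.abs(residuals).max(), 3))
```

In practice, the residuals are where poorly fit congeners show up, which is the point at which the single-mechanism assumption is usually questioned.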
The above findings are confirmed, for example, in the case of aromatic amines. Here, too, the QSAR analysis was able to demonstrate that metabolic activation is the rate-limiting step, and that modeling this first step alone permits the prediction of the mutagenic and carcinogenic activity of the aromatic amines with good accuracy.25,26
The overall lesson (coming from thousands of successful QSAR equations) is that fortunately the above assumptions are often realistic, and that QSAR analysis is able to identify and model the rate-limiting step(s) of biological mechanisms, thus providing models that are at the same time mechanistically valid and practically suitable for making predictions of the apical endpoint.
Intermediate KEs and ToxCast assays
All toxicological endpoints treated above are characterized by AOPs with relatively well-defined MIEs. It appears that models that explicitly include the MIEs correlate well with, or are good predictors of, the apical endpoint.
It is of interest to study cases where the MIEs are less characterized, but assays for putative intermediate events are available. Such an opportunity is provided by the ToxCast experimentation. ToxCast™ Phases I and II are testing a combined total of about 2000 chemicals, with around 900 assays. 27 The ToxCast HTS assays are mostly related to phenomena such as cell-to-cell interactions and signaling, and not to, for example, covalent interactions with DNA or proteins: in this sense, they largely code for putative intermediate events in toxicological pathways.
In a study performed in our laboratory, 28 the ToxCast Phases I and II results were compared with those of a large series of established toxicological endpoint tests. The HTS assays did not correlate with in vitro and in vivo mutagenicity, or with rodent carcinogenicity. In addition, the HTS assays had relatively low correlations with acute, repeated dose, skin sensitization, and reproductive/developmental toxicity test results (average correlation coefficient = 0.36), with no clear relationships to specific mechanistic patterns. In contrast, the correlations with endocrine disruption endpoints were higher (average correlation coefficient = 0.50) and specifically related to estrogen/androgen receptors, thus pointing to mechanistically based relationships. The above result is in agreement with the study discussed in Ref. 12. The limited predictive ability of ToxCast assays for most apical endpoints has been pointed out in a previous study as well. 29
Several technical difficulties of the ToxCast assays (e.g., poor ability to perform in vitro metabolic activation) are yet to be solved, and improvements are in progress. 30 Thus, at this stage, only the overall pattern of results, and not the details, is important. This pattern seems to agree with indications from the skin sensitization analysis reported above, in particular the evidence that intermediate KEs are difficult to mimic and/or that their models (in vitro tests) make a relatively minor contribution to the final predictivity. It should be emphasized that the in vitro tests are conceived and used in isolation, whereas in vivo mechanisms are a continuum, with feedbacks. In such conditions, in vitro assays for intermediate events may be a poor representation of reality. Another factor to be considered is that some systemic toxicity endpoints (e.g., repeated dose, reproductive toxicity) include disparate effects and are not as mechanistically well defined as carcinogenicity or skin sensitization; thus, it is more difficult to identify KEs of general relevance.
Conclusions
The original contribution of the AOP concept to predictive toxicology is that using a range of assays that cover the main events of the whole toxicity pathway can improve the prediction of the apical endpoints with alternative tests. This would permit more solid, mechanistically based predictions.
A central issue, both scientific and regulatory, is: to what extent are the proposed alternative approaches able to provide correct predictions of the apical endpoints? The apical endpoints constitute the backbone of the present regulations, so novel, alternative approaches must prove able to replace the traditional assays satisfactorily, or even to be superior. This is also a scientific issue. Since biology is an experimental science, any proposed predictive approach can be considered scientifically sound only if it provides correct predictions of the toxicity of chemicals.
In this context, in our laboratory, we have applied quantitative analysis methods to a series of cases where sufficient experimental results are available both for the apical endpoint and the predictive assays. The process of exploratory statistics and mathematical modeling provides new information on biological pathways (e.g., by pointing to the events that contribute most to the final predictivity). In addition, statistics and mathematics provide a framework to build quantitative predictive models with fewer uncertainties than the qualitative applications of, for example, the mode-of-action concept.
The cases analyzed include some of the endpoints with major regulatory importance, for example, skin sensitization, endocrine disruption, carcinogenicity, and repeated-dose toxicity. From the analyses, some general trends seem to appear.
The skin sensitization AOP framework includes an in chemico test (DPRA) that mimics the first phase of interaction (interaction with proteins), and a number of in vitro tests that mimic intermediate events (KeratinoSens, h-CLAT). The analysis of data shows that protein interaction is the rate-limiting step, and that DPRA alone permits the prediction of 80% of skin sensitization, whereas the “intermediate events” tests together add a further 5%–10% to the prediction of the in vivo results.
Regarding endocrine disruption, the ToxCast program has implemented 18 in vitro HTS assays that probe the ER pathway in mammalian systems. 12 The suite of HTS assays measures the MIE (i.e., receptor binding), as well as other KEs (e.g., receptor dimerization, DNA binding, transactivation, gene expression, and cell proliferation), in an AOP. Our analysis shows that the results of in vivo assays (in particular, the uterotrophic and 21-day fish assays) were correlated mostly with the MIE information (receptor binding capacity and receptor dimerization). It was also possible to build predictive models for the uterotrophic and 21-day fish assays using a limited selection of ToxCast assays, related specifically to the MIE.
In an analysis of the carcinogenicity endpoint, the integration of models for two critical events (DNA reactivity and disruption of tissue microarchitecture) permitted the correct identification of almost all recognized human carcinogens, and of 90–95% of rodent carcinogens. 28
Additional evidence comes from the QSAR field. Here the importance of the rate-limiting step (most often the MIE) as a key tool for successful predictions emerges with clarity, and several apical endpoints can be predicted with models that explicitly include parameters for the MIE.
In contrast, the ToxCast Phases I and II experimentation provides important information on the problems related to modeling the intermediate KEs. The suite of ToxCast HTS assays is mostly related to phenomena such as cell-to-cell interactions and signaling, and not to, for example, covalent interactions with DNA or proteins: in this sense, they mainly code for putative intermediate events in toxicological pathways. The results from the ToxCast analysis point to a poor correlation of HTS assays with many apical endpoints. An exception is the promising association with in vitro endocrine disruption markers.
However, the evidence that complex endpoints can be modeled/predicted with simplified models is not a novelty. It agrees with general experience of modeling: in different fields such as ecology, systems biology, and macroeconomics, grossly simplified models capture important features of the behavior of incredibly complex interacting systems, and permit successful predictions. 31 More specifically, since MIEs are in general amenable to QSAR analysis,32,33 it can be anticipated that the integration of a few in vitro assays with QSAR models is going to provide rapid and inexpensive approaches for the detection of toxins for many endpoints.
On the other hand, the types of toxicological pathways are very different in nature, so it is not possible to apply a common simplified linear chain framework to their study. Empirical analysis of data is the only guide in the evolution of our understanding of how chemicals affect living systems. The formulation of biological hypotheses is aimed at identifying KEs and devising ad hoc in vitro tests. However, the biological narration, even if a challenging step, is only a starting point. A further crucial passage is when the entire set of results is analyzed with rigorous data analysis methods to (1) describe the type and nature of the pathway; (2) ascertain the presence of a solid correlation with the gold standards represented by the apical toxicological endpoints; and (3) build models that integrate the various types of evidence.
Footnotes
Author Disclosure Statement
No competing financial interests exist.
Appendix A1: Data Availability
All data used in the various analyses described in this article are freely available. The sources are the following.
The data used in the skin sensitization analysis can be retrieved from the Supplementary Material to Urbisch et al. 6
Regarding the endocrine disruptors, the curated database of rodent uterotrophic bioactivity is in the Supplementary Material to Kleinstreuer et al. 17 The ToxCast high-throughput screening (HTS) results are provided by Browne et al. 12 A compilation of in vivo data, together with the principal component analysis models of the HTS, is in the Supplementary Material to Benigni et al. 13
Regarding the carcinogenicity predictive model, data are reported in Benigni et al. 21
HTS data for ToxCast Phases I and II were retrieved from the file: ToxCast ResultMatrix E1K AC50 level8 2013 12 10.txt, at http://epa.gov/ncct/toxcast/dataarchive.html
