A Critical Review of Adverse Outcome Pathway-Based Concepts and Tools for Integrating Information from Nonanimal Testing Methods: The Case of Skin Sensitization

Abstract

Integrating information from in vitro, in silico, and in chemico methods into toxicity testing strategies is widely considered the way to phase out animal testing. At the same time, testing strategies using new approaches and methods must provide adequate and relevant information about chemicals' hazardous properties. We review objectives and requirements for guiding the process of data integration that are suggested in the scientific literature. Based on the existing approaches, we develop criteria for resource-efficient testing strategies, and we evaluate existing testing strategies for skin sensitization hazard and risk assessment under these criteria. We conclude that existing testing strategies, with two exceptions, still focus predominantly on maximizing toxicity information but largely ignore resource efficiency criteria. Balancing the information gained from a testing strategy against its direct and indirect costs (including welfare losses for society in case of unintended health or environmental damage) is a necessary condition for transparent comparisons of resource efficiency. Therefore, developing approaches for balancing information gains and costs should become an explicit part of the development process of nonanimal testing strategies to ensure that phasing out animal testing complies not only with regulatory information requirements but also with available resources.

Introduction

Nonanimal testing strategies aim at a reduction of the number of animal tests used and, ultimately, at a full replacement of animal testing.^1–4 At the same time, they seek to improve human hazard characterization by mechanism-based toxicological information.⁵ Nonanimal testing strategies have been more generally termed “new approach methods” (NAMs).⁶ In addition to developing individual methods such as in vitro, in silico, and in chemico methods, attention has been given to combining NAMs into toxicity testing strategies.⁷ Basically, a testing strategy is a combination of different testing methods (ideally NAMs) for achieving an adequate assessment of the hazardous properties of substances addressing human health endpoints^8–11 as well as environmental endpoints.¹² Different conceptual approaches for combining information from NAMs into testing strategies have been proposed in the scientific toxicological literature. The suggested approaches have been termed “Integrated Approach to Testing and Assessment” (IATA), “Weight of Evidence” (WoE), “Defined Approach” (DA), “Integrated Testing Strategy” (ITS), and “Sequential Testing Strategy” (STS).¹¹ Still, existing definitions of these approaches overlap terminologically, which hampers a clear delineation of the corresponding concepts for constructing testing strategies.
In general, IATAs have been described as “structured approaches that integrate and weigh different types of data for the purposes of performing hazard identification (…), hazard characterization (…) and/or safety assessment”.⁷ Even the construction of read-across and grouping arguments may require new experimental data generated by a testing strategy, which is guided by the respective grouping concept.^13–15 This has been utilized for the hazard assessment of nanomaterials.^16–18 An IATA can include one or several DAs, denoting “fixed data interpretation procedures (DIP) used to interpret data generated with a defined set of information sources.”¹⁹ Following Sauer et al.,¹¹ a DA can consist of a set of methods and predefined prediction models in the form of ITSs or STSs. The WoE concept has been closely related to the concept of ITS, with the difference that “in WoE there is no formal integration, usually no strategy, and often no testing,”²⁰ whereas the concept of ITS “enables an integrated and systematic approach to guide testing such that the sequence […] is tailored to the chemical-specific situation, adapted and optimized for meeting specific information targets.”^21,22 A summary of recent definitions regarding IATA, DA, ITS, STS, and WoE, as suggested in a recent OECD guidance document,¹⁹ is given in Table 1. In this article the term “testing strategy” will be used to encompass all of these terms.

Table 1.

Existing Concepts for Integrating Information from Nonanimal Approaches in the Context of Hazard and Risk Assessment of Chemicals

Acronym	Definition
IATA	“An Integrated Approach to Testing and Assessment is an approach based on multiple information sources used for the hazard identification, hazard characterization and/or safety assessment of chemicals. An IATA integrates and weights all relevant existing evidence and guides the targeted generation of new data, where required, to inform regulatory decision-making regarding potential hazard and/or risk.”
WoE	“A Weight of Evidence determination means that expert judgement is applied on an ad hoc basis to the available and scientifically justified information bearing on the determination of hazard or risk.”
DA	“A Defined Approach to testing and assessment consists of a fixed data interpretation procedure used to interpret data generated with a defined set of information sources, that can either be used on its own, or together with other information sources within an IATA.”
DA	“Defined Approaches to testing and assessment can be designed in different ways, and may take for example the form of a Sequential Testing Strategy (STS) or an Integrated Testing Strategy (ITS).”
ITS	“An Integrated Testing Strategy is an approach in which multiple sources of data or information are assessed at the same time by applying a variety of specific methodologies to convert inputs from the different information sources into a prediction.”
STS	“A Sequential Testing Strategy is a fixed stepwise approach for obtaining and assessing test data, involving interim decision steps, which, depending on the test results obtained, can be used on their own to make a prediction or to decide on the need to progress to subsequent steps. At each step, information from a single source/method is typically used by applying a prediction model associated with that source/method.”

Source: OECD (2016).¹⁹
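
The difference between the STS and ITS designs defined in Table 1 can be illustrated with a minimal Python sketch. The threshold-based prediction models, the numerical results, and the majority-vote rule used for integration are hypothetical assumptions chosen for illustration only; they are not taken from the OECD guidance or from any specific defined approach.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical per-method prediction model: maps a raw result to
# True (hazard), False (no hazard), or None (inconclusive).
Model = Callable[[float], Optional[bool]]


def threshold_model(cutoff: float, margin: float) -> Model:
    """Build an illustrative model with an inconclusive band around the cutoff."""
    def model(value: float) -> Optional[bool]:
        if value >= cutoff + margin:
            return True
        if value <= cutoff - margin:
            return False
        return None
    return model


def sequential_strategy(results: List[float], models: List[Model]) -> Tuple[Optional[bool], int]:
    """STS: assess one source per step and stop at the first conclusive call."""
    for step, (value, model) in enumerate(zip(results, models), start=1):
        call = model(value)
        if call is not None:
            return call, step
    return None, len(results)


def integrated_strategy(results: List[float], models: List[Model]) -> bool:
    """ITS: assess all sources at the same time (assumed rule: strict majority of positives)."""
    calls = [model(value) for value, model in zip(results, models)]
    positives = sum(1 for call in calls if call is True)
    return positives * 2 > len(calls)


# Hypothetical results from three information sources for one substance.
models = [threshold_model(0.5, 0.2)] * 3
results = [0.55, 0.9, 0.1]

sts_call, sts_steps = sequential_strategy(results, models)
its_call = integrated_strategy(results, models)
```

With these invented inputs the sequential design stops after the second source with a positive call, whereas the majority-vote integration of all three sources yields a negative call. The divergence is an artifact of the assumed rules, but it shows why the data interpretation procedure has to be fixed in advance for a DA.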

The development of NAMs and testing strategies has been driven by multiple objectives. Besides replacing or reducing animal testing, a key purpose has been the need to acquire sufficient and relevant information about chemicals' hazardous properties in less time and at lower cost than traditional animal tests.^20,23–25 For specific human endpoints such as skin sensitization, available mechanistic information about the adverse outcome pathway (AOP),^26,27 that is, the pathway from a molecular initiating event (MIE) through the sequence of key events that ultimately cause an adverse effect, has been considered a guiding principle and component for the construction of testing strategies.^19,28–30

Clearly, these multiple objectives are not necessarily complementary. Maximizing the information outcomes from testing strategies, decreasing the time and the number of test substances needed to attain hazard or risk information, reducing costs of testing, and minimizing or even avoiding the use of animals are likely to be competing objectives.³¹ A crucial challenge for the development of testing strategies is, therefore, to define criteria that allow balancing conflicting objectives to ensure that the outcome of an IATA, that is, information for hazard and/or risk assessment purposes, is generated in an efficient manner.

Resource efficiency—the use of resources to either maximize information outcomes at a given resource endowment or to minimize costs for achieving a given outcome target—is a fundamental economic paradigm.³² Although criteria and conceptual requirements for developing resource-efficient testing strategies have been proposed in the toxicological literature,^20,21,25,33 a comprehensive review of these criteria, and an evaluation of existing testing strategies under these criteria, has not been provided.

The aim of this article is, therefore, to review criteria proposed in the scientific toxicological literature for constructing testing strategies to be used for hazard and risk assessment of chemicals. Next, we propose conceptual and informational criteria that can guide resource-efficient data integration. We then evaluate existing testing strategies for assessing skin sensitization potential (i.e., hazard identification) and potency (i.e., subcategorization into weak, moderate, strong, and extreme sensitizers) with respect to these criteria.

Methods: Survey of the Scientific Literature and Selection of Studies

For identifying requirements and conceptual criteria that are considered relevant for combining different types of information into testing strategies, we conducted a systematic survey of the toxicological literature published between 2000 and 2017. This period was chosen because it covers course-setting developments for chemicals' risk management, for example, the development and enforcement of the European REACH legislation,³⁴ or the publication of the path-breaking report of the U.S. National Research Council (NRC) “Toxicity Testing in the 21st Century: A Vision and a Strategy.”^33,35

The aim of the survey was to select scientific studies that address and discuss general objectives, requirements, and criteria for constructing testing strategies. Given the different terminologies used for describing the overall process of combining different types of information into testing strategies, and to ensure that the literature analysis captured a sufficiently broad spectrum of studies, the search was based on individual and composed search terms as shown in Figure 1.

FIG. 1.

Keywords and composed search terms for identifying scientific studies addressing general objectives, requirements, and criteria integrating information into testing strategies.

The search was conducted in two online literature databases, Scopus (https://www-scopus-com-s.web.bisu.edu.cn/search/form.uri?zone=TopNavBar&origin=recordpage) and PubMed (https://www.ncbi.nlm.nih.gov/pubmed?otool=inlwurlib). Search terms were identified from the title, the abstract, and the keywords of a study. Since the focus of the analysis was on identifying general, that is, endpoint-independent, criteria and requirements, we did not consider studies that referred to specific endpoints, endpoint categories, or substance groups (acute toxicity, skin irritation and sensitization, reproductive toxicity, repeated dose toxicity, etc.). Furthermore, policy reports or guideline documents, and studies that did not explicitly discuss criteria and requirements for integrating information, were not included in the analysis.

Results

Objectives, criteria, and requirements for integrating information into testing strategies

The individual and composed term-based literature search revealed a total of 47 studies, 15 of which discussed objectives and criteria or requirements for integrating information into testing strategies and were thus considered sufficiently informative for the evaluation (Table 2). For the 15 studies we also checked the titles, keywords, and abstracts of the quoted references for the search terms. The search term “defined approach” could not be identified in the title, abstract, or keywords of publications in the Scopus and the PubMed databases. Consequently, this search term did not retrieve results.

Table 2.

Scientific Studies Addressing Objectives of, and Criteria/Requirements for Integrating Information into Testing Strategies for Hazard and Risk Assessment of Chemicals

Authors (year of publication)	Title of the study	Source	Objectives of integrating information into testing strategies/DAs	Criteria and requirements for integrating information into testing strategies
SEARCH TERM “TESTING STRATEGY”
Jaworska et al. (2010)²¹	Towards optimization of chemical testing under REACH: A Bayesian Network approach	Regulatory Toxicology and Pharmacology 57, 157–167.	Development of a structured, transparent, hypothesis-driven approach to testing	Use of a probabilistic inference framework (e.g., BNs)
			Use of all existing data	Minimization of uncertainty regarding the final outcome
			Ability to revise inference when new information has become available
			Handling conflicting evidence in a consistent way
			Reason in a consistent manner
Jaworska and Hoffmann (2010)²²	Integrated Testing Strategy (ITS)—Opportunities to better use existing data and guide future testing in toxicology	ALTEX 27, 231–242	Provide a transparent, structured, consistent, and causal methodological approach to testing, as postulated by the concept of evidence-based toxicology	Maximization of evidence
				Minimization of uncertainty regarding the final outcome
			Synthesize information in a cumulative manner such that information gains are maximized	Use of probabilistic methods for reasoning to ensure objectivity of knowledge presentation, handling conflicting data, and incomplete information
				Use of optimization parameters such as predictive performance, uncertainty reduction, costs, feasibility, and animal welfare
				Final decision is based on a WoE approach
Combes and Balls (2011)³⁶	Integrated testing strategies for toxicity employing new and existing technologies	Altern Lab Anim 39, 213–225	Reduction of testing costs and testing time	Combination of exposure information with a set of nonanimal tools that provide a mechanistic understanding of effects
			Minimization/avoidance of animal use
De Wever et al. (2012)^39,a	Implementation challenges for designing integrated in vitro testing strategies (ITS) aiming at reducing and replacing animal experimentation	Toxicology In Vitro 26, 526–534	Allow for appropriate decision-making	Formulation of a clear goal of the assessment (e.g., classification and labeling)
			Generate a consistent, transparent and hypothesis-driven approach to testing that follows the principles of the 3Rs (reduction, refinement, replacement of animal testing)	Use of exposure information, and of data gathering, sharing and read-across methods
				Use of a battery of toxicity tests
				Assure applicability of tests across laboratories
				Flexibility, that is, ability to adjust the ITS to different assessment levels
Hartung et al. (2013)²⁰	Food for thought: Integrated testing strategies for safety assessments	ALTEX 30(1), 3–18	Tool that combines different types of information for safety assessments of chemicals	Flexibility regarding the combination of different ITS components
				Optimal combination of ITS components (e.g., through minimizing the number of components, use of those components that have maximum predictive capacity)
				Characterization of the applicability domain of the entire ITS and of its components
				Efficiency of the assessment in terms of costs, time, and technical requirements
Vermeire et al. (2013)³⁸	OSIRIS, a quest for proof of principle for integrated testing strategies of chemicals for four human health endpoints	Regulatory Toxicology and Pharmacology 67(2), 136–145	Integration of human data and data from in vivo, in vitro, and nontesting methods, physicochemical properties, and exposure studies as efficiently as possible to reach a satisfactory conclusion on the safe use of chemicals	Gathering of all substance-specific information
				Weighing of different types of information using statistical methods and/or expert knowledge
				Determination of the validity and adequacy of the methods used
				If possible, use of exposure-based waiving or threshold-of-toxicological concern methods
				Use of animal test as a last resort
Rovida et al. (2015)^25,a	Workshop report integrated testing strategies for safety assessment	ALTEX 32(1), 25–40	Tool to efficiently combine different information sources in a quantifiable manner to satisfy an information need, for example, a regulatory safety assessment	Information target identification
				Systematic exploration of knowledge
				Choice of relevant inputs
				Methodology to synthesize disparate evidence and to guide follow-up testing
			Combination of different “building blocks” (i.e., nontest and test methods), with the final decision about the safety assessment of a substance based on information from more than one type of source	Results must have quantifiable confidence levels
				ITS should allow for balancing the applicability domain of the tests, sufficient information, costs, and experimental feasibility
				Flexibility regarding the selection of information types (e.g., tests)
				Adaptivity, that is, possibility to omit or add new tests or methods
				Reasoning for test selection must be transparent and objective
SEARCH TERM “INTEGRATED APPROACHES TESTING AND ASSESSMENT”
Tollefsen et al. (2014)^7,a	Applying adverse outcome pathways (AOPs) to support integrated approaches to testing and assessment (IATA)	Regulatory Toxicology and Pharmacology 70, 629–640	Gather and weigh all available information (from testing and nontesting approaches) to arrive at a conclusion on a hazard or a risk of a chemical	Use nontesting approaches as building blocks
			Guide the generation of information about a chemical with a hypothesis-driven approach	Use AOPs to make predictions about molecular initiating or key events
			Allow for a more mechanistically based risk assessment process	Define confidence factors for in silico methods
			Shift toward more mechanistically based alternative approaches	Define a set of data integration strategies (e.g., Boolean approaches, scoring approaches, decision trees, deterministic and probabilistic approaches, and prediction-based machine learning approaches)
			Provide insights into the biological relevance, reliability, and uncertainty associated with the results from in silico, in chemico, and in vitro approaches	Use a set of evaluation and documentation criteria (e.g., define the endpoint of concern of the IATA, define the purpose for which the IATA is proposed, describe the rationale and the mechanistic basis, describe the individual information sources, and characterize the predictive performance of the entire IATA and its subcomponents)
Worth and Patlewicz (2016)^10,a	Integrated approaches to testing and assessment	Advances in Experimental Medicine and Biology 856, Chapter 13, 317–342	Use of different methodologies and data in a scientifically transparent and sound manner for the purpose of priority setting, hazard identification/profiling, risk assessment, hazard classification, and labeling, PBT, and vPvB assessment	Use of battery and tiered approaches, or of a combination thereof
				Combine components in an IATA on a rational basis, for example, using machine learning approaches
				Use of optimization criteria, for example, ability to generate reliable and relevant results, costs, time, regulatory acceptance
COMPOSED SEARCH TERM “TESTING STRATEGY” AND “PARADIGM SHIFT”
Settivari et al. (2015)^40,a	Predicting the future: opportunities and challenges for the chemical industry to apply 21st-century toxicity testing	Journal of the American Association for Laboratory Animal Science 54(2), 214–223	Allow for a mechanistic-based risk assessment in a time- and resource-efficient manner	Prediction of hazard using read across
			Combine a series of computational, biochemical, and in vitro tools to predict perturbations in key events leading to an adverse outcome	Incorporate read across into ITSs
			Reduction of animal use in comparison with standalone guideline tests	Assess the uncertainty of read across by means of probabilistic methods (e.g., BNs)
			Decrease of testing time and costs
COMPOSED SEARCH TERM “TOXICITY TESTING” AND “21ST CENTURY”
Krewski et al. (2010)³³	Toxicity testing in the 21st century: A vision and a strategy	Journal of Toxicology and Environmental Health Part B 13, 51–138	Transform toxicity testing from a system based on high-dose testing in laboratory animals to one using primarily nonanimal methods	Develop a more robust scientific basis of risk assessment by providing detailed mechanistic and dosimetry information and by encouraging the integration of toxicological and population-based data
			Become able to test large number of existing and new chemicals
			Evaluate potential adverse effects with respect to all critical endpoints and life stages
			Evaluate potential toxicity in the most vulnerable members of the human population
			Reduce the cost and time of testing, increase efficiency and flexibility, and make it possible to reach a decision more quickly.
			Minimize animal use; cause minimal suffering to animals that are used
			Acquire detailed mechanistic and tissue-dosimetry data needed to assess human risk quantitatively and to aid in regulatory decision-making
			Provide broader coverage of chemicals and their mixtures, endpoints, and life-stage vulnerabilities
Locke et al. (2017)⁷⁷	Implementing toxicity testing in the 21st century: challenges and opportunities	International Journal of Risk Assessment and Management 20 (1/2/3), 199–225	Change risk assessment of chemicals from a chemical-single outcome paradigm to a multichemical, multifaceted analysis	Devote resources to examine toxicity pathways to enable the development of relevant and reliable tests that describe these pathways
			Implement a system of integrated, smaller step nonanimal tests that evaluate molecular and cellular changes	International harmonization of testing methods and cross-border acceptance of methodologies
			Allow for cost-effective testing
Thomas et al. (2013)³⁷	Incorporating new technologies into toxicity testing and risk assessment: Moving from 21st century vision to a data-driven framework	Toxicological Sciences 136(1), 8–18	Transition toxicity testing and risk assessment from an outdated, inefficient, costly, and animal-centric process to one that is more efficient, economical, less animal intensive, and more relevant to human health by utilizing new technologies that provide a better understanding of the underlying biological system	Use of high-throughput in vitro assays to separate chemicals into selective and nonselective modes of action
				Use a data-driven framework that invokes successive tiers of testing with the calculation of MOE
COMPOSED SEARCH TERM “TESTING STRATEGY” AND “NON-ANIMAL APPROACHES”
Kopp-Schneider et al. (2013)⁴²	Design of a testing strategy using non-animal-based test methods: Lessons learnt from the ACuteTox project	Toxicology in Vitro 27, 1395–1401	Incorporate information from different sources to obtain a prediction about a characteristic of the compound under study.	Evaluate the quality of assays, and their appropriateness to be incorporated in a testing strategy, by means of statistical methods
			Use simultaneously information on all endpoints
			Reduction of testing costs
COMPOSED SEARCH TERM “TESTING STRATEGY” AND “ADVERSE OUTCOME PATHWAY”
Perkins et al. (2015)³⁰	Adverse outcome pathways for regulatory applications: examination of four case studies with different degrees of completeness and scientific confidence	Toxicological Sciences 148(1), 14–25	Use of “high confidence” AOPs for classification and labeling, chemical prioritization, hazard, and risk assessment of chemicals and in IATAs	Documentation of the scientific confidence in, and uncertainties of the events and pathways in a semiquantitative and quantitative manner (e.g., WoE, multicriteria analysis, Bayesian modeling)
				Consider not only high-confidence AOPs but also incomplete AOPs with high confidence between specific events in specific applications requiring prediction of chemical effects using high-throughput assays or for predicting an outcome based on an early KE that is easily assayable
				Account for multiple interacting pathways or networks of AOPs

a The study was also identified under other composed search terms.

KE, key event; MOE, margin of exposure.

As already mentioned in the introduction, most studies pointed to multiple objectives underlying the development of testing strategies. These comprise outcome-based objectives (e.g., reduction of animals and costs and high protection of human health), conceptual objectives (e.g., combination of different pieces of information and tools), and procedural objectives (e.g., cost efficiency of the assessment, generation of adequate information at low costs and with minimal or no animal use, and prioritization of testing). Regarding the criteria and requirements that were considered important for the development of testing strategies, we found a broad range of aspects that can be grouped into (i) criteria related to generating information (e.g., about a hazard or a risk) and (ii) criteria guiding the conceptual process of combining different types of information and data integration.

Criteria for generating information about chemical hazards or risks by means of a testing strategy

Several studies emphasized that exploiting all available information from different sources, including testing (e.g., in vitro and high throughput in vitro methods) and nontesting approaches (e.g., in chemico methods and in silico methods), is a key requirement for constructing testing strategies.^7,20,21,25,36,37 Although there seems to be general agreement that the use of animal tests should be minimized, some studies still suggest including in vivo tests in a testing strategy “as a last resort.”³⁸ A testing strategy should start with carefully evaluating all existing information, for example, from in vivo or in vitro approaches and existing exposure information, to decide whether remaining data gaps need to be filled with additional testing, how data gaps can be filled most efficiently, or whether exposure-based waiving approaches can be applied.^36,38,39 Furthermore, it was pointed out that the uncertainty of outcomes from individual building blocks in a testing strategy, and of the conclusion that is ultimately adopted, should be assessed and minimized.^21,33,40

Criteria for guiding the conceptual process of information and data integration in a testing strategy

Corresponding to the use and combination of different pieces of information, several authors underlined that testing strategies should remain flexible regarding the selection of methods and the order of steps in the assessment.^20,25,39 In general, the integration of information was characterized as a dynamic process that progresses along with the development of testing methods in combination with exposure information, and the exploitation of mechanistic information.^7,41 Some studies explicitly pointed to the potential of AOPs to prioritize and guide testing, but also to the use of AOPs as prediction tools (e.g., of MIEs) within testing strategies.^7,30 However, the confidence in the events of an AOP, and of the AOP as a whole, should be documented by means of semiquantitative and quantitative methods.³⁰ In addition, it was repeatedly acknowledged that the development of testing strategies requires criteria and tools addressing how the combination of information from different sources can be done in an “optimal” way. Several studies pointed to the urgent need to structure testing strategies more efficiently.^20,21,25,33 This also included the recommendation to use quantitative performance parameters and statistical methods for evaluating data quality and the (inter- and intralaboratory) variability of the data,^25,42 but also information on testing costs, time, and regulatory acceptance.¹⁰ In this context, the terms “efficiency,” “resource-efficiency,” or “cost-effectiveness” were frequently used to characterize the process of balancing different types of information toward an overall result of the strategy that is considered reliable, robust, and relevant.³³

Defining “resource-efficient” testing

The studies listed in Table 2 emphasize the need to combine different types of data about a chemical's properties with information about the resource use for generating these data. Also in the earlier literature on chemicals' testing and safety evaluation, “resource-efficiency”⁴³ or “cost-effectiveness”^44,45 were frequently proposed as targets of a modern approach to toxicity testing and safety evaluation. However, the meaning of these terms in the context of the development of testing strategies has not been made concrete. The term “efficiency” or “resource efficiency” is a key economic decision criterion for guiding the allocation of scarce resources. In the economics literature, “resource efficiency” denotes an allocation of resources that allows achieving a given outcome target with a minimum of resources.³² Assuming that the ultimate goal of toxicity testing is to allow for better-informed decisions on the use of chemicals, toxicological testing requires a variety of resources, in particular appropriate laboratory equipment or computational capacities, manpower, laboratory animals, and time. Clearly, resource use depends on the toxicological effect of interest (the so-called endpoint). The challenge is, therefore, to distribute available resources such that a maximum of output (e.g., hazard information) can be achieved, or, to use a dual formulation, that a certain information outcome can be achieved with a minimum of resources.⁴⁶ Thus, efficient or optimal testing can be characterized as a process that either maximizes the information obtained at a given cost or minimizes the cost of reaching a given information target. In the context of chemicals testing, we propose to base efficiency evaluations on the following operational criteria:

Efficiency evaluations of individual testing methods and testing strategies require, first, specifying information gains and costs. Then we need to select appropriate quantifiable parameters for both components. Information gains can be quantified by different metrics. For example, a testing method's information outcome can be characterized in terms of its predictive accuracy, describing “the closeness of agreement between test method results and accepted reference values.”⁸⁵ Common metrics, in the simple case of a dichotomized outcome, are sensitivity (the proportion of hazardous substances correctly classified as hazardous by a testing method) and specificity (the proportion of nonhazardous substances correctly classified as nonhazardous by the testing method). In addition, information gains from testing can be characterized by a testing method's reliability, denoting the extent to which its results can be reproduced within and between laboratories over time, usually expressed in terms of the method's intra- and inter-laboratory reproducibility. Finally, information about the coverage of specific key events in the AOP of a particular in vivo adverse outcome by a specific testing method is important in order to quantify the informational gains from testing.
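
As a minimal illustration of the predictive metrics for a dichotomized outcome, the following Python sketch computes sensitivity, specificity, and overall accuracy from paired test and reference classifications. The function name and the ten classifications are hypothetical and serve only to show the arithmetic.

```python
from typing import Dict, Sequence


def predictive_metrics(predicted: Sequence[bool], reference: Sequence[bool]) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy for a dichotomized outcome.

    True means 'classified as hazardous'; reference holds the accepted
    reference classification for the same substances, in the same order.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # hazardous, called hazardous
    tn = sum(1 for p, r in pairs if not p and not r)  # nonhazardous, called nonhazardous
    fp = sum(1 for p, r in pairs if p and not r)      # nonhazardous, called hazardous
    fn = sum(1 for p, r in pairs if not p and r)      # hazardous, called nonhazardous
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference set needs both hazardous and nonhazardous substances")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical reference set: 6 hazardous and 4 nonhazardous substances.
reference = [True] * 6 + [False] * 4
# Hypothetical test-method calls with one false negative and one false positive.
predicted = [True, True, True, True, True, False, False, False, False, True]

metrics = predictive_metrics(predicted, reference)
```

For this invented data set the sensitivity is 5/6, the specificity is 3/4, and the accuracy is 8/10.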

Testing costs can be distinguished into direct and indirect costs. Direct costs consist of (i) laboratory equipment or computational capacities for conducting a testing method, (ii) laboratory animal welfare loss (in case of an animal test), and (iii) testing time. Indirect testing costs include, for example, expenditures, resources, and time needed for the validation of a (nonanimal) testing method, or switching costs in cases wherein new technologies have to be adopted.⁴⁶

The evaluation of resource efficiency can be based on five key criteria. First, the purpose of the assessment, for example, classification and labeling or a hazard or risk prediction, should be spelled out. Second, efficiency evaluations require defining a mechanism to balance information gains from testing against the costs of testing. This, third, requires specifying how information gains and costs are valued. Basically, two possibilities exist: a monetary and a nonmonetary valuation. In case of a nonmonetary valuation, information or cost parameters are expressed in terms of their natural units (e.g., the proportion of positive chemicals correctly classified or the number of animals used in a test). A monetary assessment requires converting information or cost parameters into monetary (e.g., dollar or euro) values. Although direct costs, that is, expenditures for conducting a test, are usually expressed in monetary terms, a monetary valuation of other cost components (e.g., animal welfare loss) is less common and often not wanted due to, for example, ethical concerns.⁴⁷ Likewise, a monetization of test outcomes, in particular the expected gains and losses arising from decisions that are based on these outcomes (e.g., health and environmental benefits and costs from a continued use of a substance), is often not straightforward due to the absence of market-based values.
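
One way to compare strategies under a nonmonetary valuation, without converting animal use into money, is a simple dominance screen in which every parameter stays in its natural unit. The Python sketch below is an illustration under stated assumptions: the strategy names and all figures are hypothetical, and the dominance screen is one possible balancing mechanism, not a method proposed by the studies reviewed here.

```python
from typing import List, NamedTuple


class Strategy(NamedTuple):
    name: str
    accuracy: float     # information gain: proportion of substances correctly classified
    direct_cost: float  # expenditure per substance, in an arbitrary currency unit
    animals: int        # animals used per substance (natural unit, not monetized)


def dominates(a: Strategy, b: Strategy) -> bool:
    """a dominates b if it is no worse on every criterion and better on at least one."""
    no_worse = (a.accuracy >= b.accuracy
                and a.direct_cost <= b.direct_cost
                and a.animals <= b.animals)
    better = (a.accuracy > b.accuracy
              or a.direct_cost < b.direct_cost
              or a.animals < b.animals)
    return no_worse and better


def efficient_set(strategies: List[Strategy]) -> List[str]:
    """Names of strategies that are not dominated by any other strategy."""
    return [s.name for s in strategies
            if not any(dominates(other, s) for other in strategies)]


# Entirely hypothetical figures, chosen only to illustrate the comparison.
candidates = [
    Strategy("animal_test", accuracy=0.80, direct_cost=5000.0, animals=20),
    Strategy("strategy_A", accuracy=0.85, direct_cost=3000.0, animals=0),
    Strategy("strategy_B", accuracy=0.90, direct_cost=6000.0, animals=0),
    Strategy("strategy_C", accuracy=0.82, direct_cost=3500.0, animals=0),
]

frontier = efficient_set(candidates)
```

With these invented numbers only strategy_A and strategy_B remain. Choosing between them still requires an explicit trade-off between additional accuracy and additional cost, which is exactly the valuation question raised above.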

Fourth, evaluating the resource efficiency of testing strategies must account for different types of uncertainty underlying information gains and costs. Any testing method, including the animal test, has limited precision and accuracy, since it is merely a model representation of a human or environmental toxicological endpoint. Hence the information outcomes from these methods are uncertain. Ideally, if different (nonanimal) testing methods are combined into a testing strategy, uncertainty will be reduced throughout the strategy. Again, different options exist for assessing the uncertainty of test information and costs. A general distinction can be made between frequentist and Bayesian approaches. Frequentist approaches, for example the calculation of confidence intervals of statistical measures, assess the variability of testing outcomes due to the variability of input data. Applying frequentist approaches requires the underlying data sets to be of a sufficient size. Bayesian inference methods, in contrast, adopt a more comprehensive concept of uncertainty by explicitly accounting for a decision-maker's subjective beliefs, for example about a substance's properties. This provides a means for updating information if new data become available. Finally, given that nonanimal testing strategies combine individual nonanimal testing methods, integrating data from individual testing methods requires determining a stopping rule for testing.^24,48 Ideally, this stopping rule should be endogenous, that is, it should result from the process of combining different types of information. An endogenous stopping rule is conditional on the results from testing following the sequential steps of a testing strategy. This contrasts with exogenous stopping rules, for example information targets that are defined prior to testing.
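
The Bayesian updating described above can be illustrated with a minimal Python sketch. It applies Bayes' rule to revise the probability that a substance is hazardous after each dichotomous test result. The function names and all numerical values are hypothetical, and the sketch assumes that test results are conditionally independent given the true hazard status, a simplification that published Bayesian network strategies do not necessarily make.

```python
from typing import Iterable, Tuple


def update_posterior(prior: float, positive: bool, sens: float, spec: float) -> float:
    """Posterior probability of hazard after one dichotomous test result."""
    if positive:
        like_hazard, like_safe = sens, 1.0 - spec
    else:
        like_hazard, like_safe = 1.0 - sens, spec
    evidence = like_hazard * prior + like_safe * (1.0 - prior)
    return like_hazard * prior / evidence


def sequential_posterior(prior: float,
                         results: Iterable[Tuple[bool, float, float]]) -> float:
    """Update the prior with a sequence of (positive, sens, spec) results.

    Assumes conditional independence of the tests (illustrative assumption).
    """
    p = prior
    for positive, sens, spec in results:
        p = update_posterior(p, positive, sens, spec)
    return p


# Hypothetical example: prior belief 0.5, one positive result from a test
# with sensitivity and specificity of 0.8 each.
posterior_after_one = update_posterior(0.5, True, 0.8, 0.8)
```

In this sketch a positive result raises the probability of hazard, and a subsequent negative result from an equally accurate test returns it to the prior value.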

Evaluating testing strategies addressing skin sensitization according to resource efficiency criteria

Following the identification of general criteria for resource-efficient testing in the previous section, this section offers a detailed evaluation of these criteria for existing testing strategies used for skin sensitization hazard and risk assessment. Skin sensitization is the clinically relevant endpoint for assessing allergic contact dermatitis (ACD).^49,50 Approximately 15%–20% of the human population suffer from an ACD incident once in their life.⁵¹ Assessing chemicals' ability to cause ACD, that is, their skin sensitization potential (hazard), is a key information requirement for the safety assessment of chemicals falling under the European chemicals legislation REACH³⁴ and the European Cosmetics Regulation.⁵² It is the first complex toxicological endpoint for which an AOP has been defined and is used in regulatory practice.⁵³ Hence, skin sensitization can be used as a case to investigate progress on the challenge of resource-efficient toxicity testing using NAMs.

To date, none of the existing NAMs is considered to provide sufficient information to fully replace the animal tests used for skin sensitization hazard identification and potency assessment.⁵³ Instead, a combination of in vitro, in chemico, and in silico methods has been considered a promising way forward to generate sufficient information and eventually replace in vivo tests.^43,91 During the past years, several testing strategies have been proposed for the assessment of skin sensitization potential and potency.^19,41,54,84

In general, testing strategies for skin sensitization potential and potency assessment use different conceptual and methodological approaches to combine information from the individual NAMs. Hence, they are presented in different ways, for example, as qualitative flowcharts,^8,55–57 as quantitative probabilistic approaches (machine learning) applying artificial neural networks (ANNs)^58–60 or Bayesian networks (BNs),^21,61,62 as deterministic approaches based on a “majority vote” decision rule for batteries of NAMs,^63–66 or as score-based batteries of NAMs.^67–69 In addition, a regression analysis model⁷⁰ and a quantitative model using toxicokinetics and toxicodynamics modeling⁷¹ have been proposed. Based on the criteria defined in Table 3, Table 4 offers a comparative evaluation of existing testing strategies for assessing skin sensitization hazard and potency suggested in the scientific literature and in the recent OECD guidance document.¹⁹ The testing strategies, therefore, comply with the definition of IATA proposed in the guidance document recently published by the OECD.¹⁹

Table 3.

Criteria for Evaluating Resource Efficiency of Testing Strategies for Hazard and Risk Assessment of Chemicals

Informational criteria	Possible assessment parameters
Specification of information gain/outcome of a test or a testing strategy	Accuracy parameters (e.g., sensitivity/specificity) for characterizing the ability of a method to assess hazard/potency classes
Specification of information gain/outcome of a test or a testing strategy	Reliability parameters (e.g., intra- and inter-laboratory reproducibility, confidence parameters, precision)
Specification of information gain/outcome of a test or a testing strategy	Mechanistic information (e.g., mode of action and coverage of key events in the AOP)
Specification of costs	Direct testing costs (laboratory equipment or computational capacities, animal welfare loss, testing time, labor costs)
Specification of costs	Indirect testing costs (e.g., validation costs)

Criteria for evaluating efficiency	Possible methods/approaches
Definition of the purpose of the assessment	Classification and labeling, hazard identification, potency subcategorization
Approaches to balance information gains and costs	Qualitative approaches (e.g., multicriteria analysis)
Approaches to balance information gains and costs	Quantitative approaches (e.g., cost-effectiveness analysis and cost–benefit analysis)
Method for the valuation of information gains and costs	Monetary valuation
Method for the valuation of information gains and costs	Nonmonetary valuation
Method for the assessment of uncertainties related to information gains and testing costs	Frequentist statistical approaches (e.g., calculation of confidence intervals and multicriteria analysis)
Method for the assessment of uncertainties related to information gains and testing costs	Bayesian inference methods
Method for the assessment of uncertainties related to information gains and testing costs	Approaches for assessing a testing method's precision
Definition of an (endogenous) stopping rule for testing	Decision-theoretic approaches (e.g., machine learning approaches and Value-of-Information [VOI] analysis)
Definition of an (endogenous) stopping rule for testing	Mechanistic relevance-driven approaches (e.g., the AOP)

Table 4.

Evaluation of Testing Strategies for Assessing Skin Sensitization Hazard or Potency According to Informational and Efficiency Criteria

Sources: Own collection of information from the OECD Annex I on the case studies proposed as DAs (OECD, 2016) and from individual publications. “2 out of 3” ITS approach^63,64,85; Kao ITS and Kao STS^68,69; RIVM STS^65,66; Stacking meta model⁸⁰; IDS^81,82; BN ITS^21,61,62; ANN ITS^58–60; EC-JRC^72,73; Global and local regression models^57,70,75; SARA.⁷¹

ACD, allergic contact dermatitis; ANN, artificial neural network; BNs, Bayesian networks; GPMT, Guinea pig maximization test; LLNA, local lymph node assay; MI, mutual information; MIE, molecular initiating event; n.a., not available; NPV, negative predictive value; PLS, partial least squares; PPV, positive predictive value; RIVM, Dutch National Institute for Public Health and the Environment; RMS, root mean square; SVM, Support Vector Machine.

Reliability assessments of individual testing methods used in a strategy, accounting for intra- and inter-laboratory reproducibility of the testing methods, are usually provided. Exceptions are the RIVM STS and the Stacking meta model, for which the variability of individual data sources was not explicitly taken into account.¹⁹

Different methods are used for assessing the reliability of testing strategies. For deterministic strategies, these include an assessment of individual methods' reproducibility, applicability domain, or predictive accuracy parameters such as positive predictive value and negative predictive value. Within probabilistic approaches, additional quantitative assessments based on, for instance, Bayes factors and regression analyses are used. Each source of information in a testing strategy (i.e., each individual testing method) addresses a specific key event within the skin sensitization AOP. In general, it is assumed that the first three key events of the AOP (i.e., covalent binding of the electrophilic substance to skin proteins; release of proinflammatory cytokines and induction of cytoprotective cellular pathways; and activation and maturation of dendritic cells and their migration to the local lymph nodes) must be considered in a WoE approach to meet the information requirements of REACH Annex VII and to allow for conclusions on a substance's skin sensitization potential.⁵³ This is the case for most strategies presented in Table 4.^61,62,64,65 However, some testing strategies focus on selected key events only. For example, the Kao DA^68,69 covers the first and third key events, whereas the EC-JRC DA covers the MIE only,^72–74 which is considered to provide the final conclusion on the skin sensitization potential.⁷² The IATA suggested by Patlewicz et al.⁷⁵ and the skin allergy risk assessment (SARA) strategy⁷¹ cover the fourth key event.

Considering direct and indirect testing costs, we observe that only a few testing strategies report direct and/or indirect testing cost estimates. These are the “2 out of 3” ITS⁷⁶ and the RIVM STS.⁶⁵ Probabilistic testing strategies^58–60,70 are assumed to always save costs because unnecessary testing can be avoided. A transparent assessment of the resource use is, however, lacking. Furthermore, the BN ITS⁶¹ and the “2 out of 3” ITS^63,78 suggest that additional testing costs can be avoided if the information collected at a certain step of the strategy is sufficient to conclude on the skin sensitization potential. Again, this is not underpinned by any form of assessment. In a few cases, for example, the nontesting pipeline approach,^57,75 the animal test local lymph node assay (LLNA) is proposed as a “last resort.”

Regarding the valuation of information and cost metrics, all selected strategies document the information outcomes in a nonmonetary way. Furthermore, direct or indirect testing costs are reported in only two cases, the “2 out of 3” ITS and the RIVM STS. However, these strategies do not explicitly include cost information in the building process of the strategy. In the case of the “2 out of 3” ITS, a follow-up study⁴⁸ applied a decision-theoretic Value-of-Information (VOI) approach. That assessment incorporates information about direct testing costs and health damage costs related to ACD occurrence into an efficiency assessment of STSs consisting of selected nonanimal prediction methods for skin sensitization assessment.
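
To make the logic of a Value-of-Information assessment concrete, the following Python sketch computes the net value of a single dichotomous test for a simple two-action decision (restrict or allow a substance). It is a generic textbook-style illustration and not the model used in the cited follow-up study; the function names, the two-action framing, and all monetary values are hypothetical assumptions.

```python
def expected_loss(p_hazard: float, damage_cost: float, foregone_benefit: float) -> float:
    """Minimum expected loss over the two actions.

    Allowing a hazardous substance incurs damage_cost (e.g., health damage);
    restricting a nonhazardous substance incurs foregone_benefit.
    """
    loss_if_allowed = p_hazard * damage_cost
    loss_if_restricted = (1.0 - p_hazard) * foregone_benefit
    return min(loss_if_allowed, loss_if_restricted)


def net_value_of_test(prior: float, sens: float, spec: float,
                      damage_cost: float, foregone_benefit: float,
                      test_cost: float) -> float:
    """Expected reduction in loss from testing, minus the direct test cost."""
    p_pos = sens * prior + (1.0 - spec) * (1.0 - prior)
    p_neg = 1.0 - p_pos
    # Posterior hazard probabilities via Bayes' rule; fall back to the
    # prior if an outcome has zero probability.
    post_pos = sens * prior / p_pos if p_pos > 0 else prior
    post_neg = (1.0 - sens) * prior / p_neg if p_neg > 0 else prior
    loss_with_test = (p_pos * expected_loss(post_pos, damage_cost, foregone_benefit)
                      + p_neg * expected_loss(post_neg, damage_cost, foregone_benefit))
    loss_without_test = expected_loss(prior, damage_cost, foregone_benefit)
    return (loss_without_test - loss_with_test) - test_cost


# Hypothetical values for illustration only.
net_value = net_value_of_test(prior=0.5, sens=0.8, spec=0.8,
                              damage_cost=100.0, foregone_benefit=100.0,
                              test_cost=10.0)
```

Under these assumptions, a test is worth conducting only if its net value is positive. A test that cannot change the decision has a net value equal to minus its cost.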

None of the testing strategies included in the evaluation (Table 4) proposes or uses a mechanism for balancing information gains and costs. Furthermore, none incorporates an endogenous stopping rule for testing. Instead, the decision to stop testing is based on exogenous rules, for example, a predefined number and order of testing and nontesting methods or AOP coverage.^65,66,68 Other deterministic testing strategies such as the “2 out of 3” ITS apply a majority vote rule, in which the decision follows the outcome of two concordant test results.^63,64 The RIVM STS⁶⁵ uses a Bayesian QSAR approach, described by Rorije et al.,⁷⁹ as a first step, which is followed by tiers of testing methods. The overall conclusion is also based on a majority vote over test results from the sequential steps in the strategy. In probabilistic strategies such as the BN ITS,^20,61,62 the stopping rule is exogenously determined by defining information targets, for example, the prediction of the skin sensitization potential using the LLNA results as a reference.
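
The majority vote rule described above can be sketched in a few lines of Python. The sketch runs up to three dichotomous tests in a fixed order and stops as soon as two results agree, so the third test is skipped when the first two are concordant. The function name and the use of callables as stand-ins for individual testing methods are illustrative assumptions, not part of any published strategy.

```python
from typing import Callable, List, Sequence, Tuple


def two_out_of_three(tests: Sequence[Callable[[], bool]]) -> Tuple[bool, int]:
    """Majority vote over three dichotomous tests.

    Each element of `tests` is a callable returning True (positive) or
    False (negative). Returns the overall call and the number of tests run.
    """
    if len(tests) != 3:
        raise ValueError("exactly three tests are required")
    results: List[bool] = []
    for run_test in tests:
        results.append(bool(run_test()))
        positives = sum(results)
        negatives = len(results) - positives
        if positives == 2:
            return True, len(results)
        if negatives == 2:
            return False, len(results)
    # Three binary results always contain two that agree.
    raise AssertionError("unreachable")


# Hypothetical example: the first two tests agree, so the third is not run.
call, n_run = two_out_of_three([lambda: True, lambda: True, lambda: False])
```

The number and order of tests are fixed in advance here, so the rule remains exogenous in the sense used above even though testing may end after two results.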

The uncertainty of relevant parameters for information outcomes is assessed in a variety of ways. The BN ITS^20,61,62 offers an elaborate uncertainty assessment with regard to the predictive accuracy of each individual method and its precision, that is, the ability of a method to produce concordant results from repeated testing. Uncertainty is assessed based on Bayesian inference and mutual information theory. In deterministic approaches, the majority vote is frequently applied to test results without explicitly assessing uncertainties. Individual testing methods' reproducibility, interchangeability, and reliability are assessed for methods used in, for example, the “2 out of 3” ITS^63,64,85 and the RIVM STS.^65,66 For the “2 out of 3” ITS, Leontaridou et al.⁷⁶ introduced a statistical approach for quantifying precision, defined as a testing method's or testing strategy's ability to correctly classify substances in repeated applications. By determining the pooled standard deviation of a testing method's results, a range around the classification threshold of a nonanimal testing method's prediction model, the so-called borderline range, is quantified. Within the borderline range, test results are nonconclusive due to a testing method's biological and technical variability, that is, test results falling in this range cannot be considered unambiguous.
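
The idea of a borderline range can be sketched in Python as follows. The sketch computes a pooled standard deviation from replicate results and flags results that fall within one pooled standard deviation of the classification threshold as borderline. The symmetric "threshold plus or minus one pooled standard deviation" construction, the rule that values above the threshold are positive, and all names and numbers are simplifying assumptions for illustration; the cited approach may construct the range differently.

```python
import math
from typing import Sequence, Tuple


def pooled_sd(groups: Sequence[Sequence[float]]) -> float:
    """Pooled standard deviation across groups of replicate results."""
    sum_sq = 0.0
    dof = 0
    for group in groups:
        n = len(group)
        if n < 2:
            continue  # a single replicate carries no variance information
        mean = sum(group) / n
        sum_sq += sum((x - mean) ** 2 for x in group)
        dof += n - 1
    if dof == 0:
        raise ValueError("need at least one group with two or more replicates")
    return math.sqrt(sum_sq / dof)


def borderline_range(threshold: float, sd: float) -> Tuple[float, float]:
    """Assumed symmetric range around the classification threshold."""
    return threshold - sd, threshold + sd


def classify(value: float, threshold: float, sd: float) -> str:
    """Return 'positive', 'negative', or 'borderline' for one test result."""
    low, high = borderline_range(threshold, sd)
    if low <= value <= high:
        return "borderline"
    return "positive" if value > threshold else "negative"


# Hypothetical replicate data for two substances.
example_sd = pooled_sd([[1.0, 3.0], [2.0, 4.0]])
```

A result classified as borderline in this sketch would be treated as nonconclusive rather than counted toward a positive or negative call.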

We observe differences between deterministic and probabilistic approaches regarding the integration of different types of information toward a final decision (i.e., a conclusion on hazard or potency or whether further testing is required). Whereas deterministic strategies use qualitative criteria such as the majority vote rule applied in the “2 out of 3” ITS,^63,64 probabilistic testing strategies use quantitative methods offering endogenous data integration mechanisms. Examples are the ANN ITS^58–60 and the BN ITS.^20,21,61,62 In the Stacking meta model,⁸⁰ the IDS,^81,82 and the ANN ITS,^58–60,83 machine learning approaches are used, which encompass computational algorithms developed to predict hazardous properties of substances and to reduce uncertainties underlying the assessment of the hazardous properties. They include Support Vector Machines and Classification and Regression Trees.^81,82 The EC-JRC DA uses the Classification Trees machine learning approach based on in silico information to predict skin sensitization potential.^72,73

Using machine learning approaches is considered a suitable approach to optimize testing strategies because they allow for a quantification of uncertainties at any stage of the testing strategy, and they allow for learning (i.e., updating the assessment) if new information (e.g., about the molecular structure of a substance) is received. Still, all testing strategies using machine learning approaches focus on maximizing the information outcomes from testing. Hence, information outcomes (expressed, e.g., in terms of the predictive accuracy metrics) are not balanced with the costs for generating this information.

Conclusions and Recommendations

We reviewed the state of the art of testing strategies regarding the combination of information from different testing and nontesting methods (i.e., NAMs) to determine the hazardous properties of chemicals. Based on a systematic literature search and evaluation, we identified objectives, requirements, and criteria that are considered relevant for the development of testing strategies. Our analysis revealed that one of these requirements is resource efficiency. To make this requirement operational, we propose criteria that facilitate the evaluation of resource efficiency of testing strategies. We applied these criteria to existing testing strategies suggested for the assessment of skin sensitization potential (i.e., hazard assessment) and potency (i.e., risk assessment), which were described in the scientific literature and discussed in a recent OECD guidance document.¹⁹

Our findings illustrate that most of the testing strategies were dedicated to maximizing the information outcome of testing. Only the “2 out of 3” ITS and the RIVM STS considered information about direct or indirect testing costs. Still, none of the testing strategies incorporated costs or resource use as a complementary piece of information into the construction of the strategy. As a consequence, a key requirement for efficiency evaluations, that is, generating information about the gains and costs of the assessment process, is not met. Furthermore, tools that allow balancing information outcomes with the resources required to generate this information are not incorporated into these testing strategies. Consequently, the (information) gains per unit of cost remain unclear, and conclusions on the resource efficiency of testing strategies for assessing skin sensitization are, therefore, not possible.

If resources and time for testing are limited, an optimal allocation of scarce resources is considered highly desirable. Still, developers and users of nonanimal testing strategies (and toxicological testing methods in general) have only started to become aware that optimization includes balancing potentially competing objectives rather than merely maximizing the information gains from testing. There is thus a need for incorporating information and data about resource use (direct and indirect costs, animal welfare, and time) into the construction process of testing strategies.^23,24 This requires developing and testing approaches that allow combining information on resource use with toxicological information. Depending on the specific context at hand, different tools can be used for this purpose. For example, in a recent article by Leontaridou et al.,⁴⁸ a decision-theoretic approach using Bayesian Value-of-Information analysis was applied to guide the optimization of testing strategies. Applying the approach to nonanimal testing methods used for assessing skin sensitization potential (i.e., hazard) demonstrated that testing strategies can be more resource efficient than the animal test LLNA. Furthermore, (sequential) combinations of nonanimal testing methods usually perform better than individual methods, a result that was also confirmed by Roberts and Patlewicz.⁹² In addition, their results indicate that full coverage of all key events in the sensitization AOP is neither a necessary nor a sufficient condition for optimal, resource-efficient testing and safety evaluations. Rather, depending on the available information, it may be preferable to start a sequence with a testing method (or a combination of methods) that refers to a key event occurring “late” in the AOP or, conversely, to the MIE.

As an alternative to incorporating optimization methods in the construction process of testing strategies, their resource efficiency can be evaluated ex post, that is, after the strategy was developed. This can be done, for example, by applying cost–benefit or cost-effectiveness analysis. Both methods have been widely used for efficiency assessments of medical treatments, wherein the conceptual challenge of identifying the best performing alternative is similar to the challenge of toxicity testing.^86–89 Applications to the field of toxicity testing were provided by Nordberg et al.,⁹⁰ Norlén et al.,⁴⁶ and Gabbert and van Ierland.⁴⁷ Despite the rich set of available tools, practical applications are required to understand their applicability and their implications for chemicals risk assessment, and to ensure that phasing out animal testing complies not only with regulatory information requirements but also with available resources. This, in turn, requires strengthening the inter- and transdisciplinary collaboration of toxicologists and economists.
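
As a simple illustration of an ex post cost-effectiveness comparison, the following Python sketch computes the cost per unit of effect for a testing strategy and the incremental cost-effectiveness ratio between two strategies. Effect is measured here in natural units (the proportion of substances correctly classified), which is one possible nonmonetary choice. All names and figures are hypothetical and are not drawn from the cited studies.

```python
def cost_effectiveness_ratio(cost: float, effect: float) -> float:
    """Cost per unit of effect (e.g., per proportion correctly classified)."""
    if effect <= 0:
        raise ValueError("effect must be positive")
    return cost / effect


def incremental_cost_effectiveness_ratio(cost_a: float, effect_a: float,
                                         cost_b: float, effect_b: float) -> float:
    """Additional cost of strategy A per additional unit of effect over B."""
    delta_effect = effect_a - effect_b
    if delta_effect == 0:
        raise ValueError("strategies have identical effect")
    return (cost_a - cost_b) / delta_effect


# Hypothetical strategies: A is more accurate but more expensive than B.
ratio_b = cost_effectiveness_ratio(cost=1000.0, effect=0.8)
icer_a_vs_b = incremental_cost_effectiveness_ratio(3000.0, 0.9, 1000.0, 0.8)
```

Whether the additional cost of the more accurate strategy is acceptable depends on how much a decision-maker is willing to spend per additional unit of effect, which such a ratio does not settle by itself.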

Footnotes

Author Disclosure Statement

No competing financial interests exist.

References

1. Adler, Basketter, Creton, et al. Alternative (non-animal) methods for cosmetics testing: Current status and future prospects-2010. Arch Toxicol, 2011:85; 367–485.

2. Ahlers, Stock, Werschkun. Integrated testing and intelligent assessment—New challenges under REACH. Environ Sci Pollut Res Int, 2008:15; 565–572.

3. Hartung. Toxicology for the twenty-first century. Nature, 2010:460; 208–212.

4. Reisinger, Hoffmann, Alépée, et al. Systematic evaluation of non-animal test methods for skin sensitisation safety assessment. Toxicol In Vitro, 2015:29; 259–272.

5. Tice, Austin, Kavlock, et al. Improving the human hazard characterization of chemicals: A Tox21 update. Environ Health Perspect, 2013:121; 756.

6. ECHA. New Approach Methodologies in Regulatory Science Proceedings of a Scientific Workshop Helsinki, held on April 19–20, 2016. ECHA-16-R-21-EN. 2016.

7. Tollefsen, Scholz, Cronin, et al. Applying adverse outcome pathways (AOPs) to support integrated approaches to testing and assessment (IATA). Regul Toxicol Pharmacol, 2014:70; 629–640.

8. ECHA. The Use of Alternatives to Testing on Animals for the REACH Regulation. Second Report Under Article 117(3) of the REACH Regulation. European Chemicals Agency, Helsinki; 2014.

9. Gocht, Berggren, Ahr, et al. The SEURAT-1 approach towards animal free human safety assessment. ALTEX, 2015:32; 9–24.

10. Worth, Patlewicz. Integrated approaches to testing and assessment. Adv Exp Med Biol, 2016:856; 317–342.

11. Sauer, Hill, Curren, et al. Local tolerance testing under REACH: Accepted non-animal methods are not on equal footing with animal tests. Altern Lab Anim, 2016:44; 281–299.

12. Nendza, Müller, Wenzel. Discriminating toxicant classes by mode of action: 4. Baseline and excess toxicity. SAR QSAR Environ Res, 2014:25; 393–405.

13. Patlewicz, Ball, Booth, et al. Use of category approaches, read-across and (Q) SAR: General considerations. Regul Toxicol Pharmacol, 2013:67; 1–12.

14. Teubner, Landsiedel. Read-across for hazard assessment: The ugly duckling is growing up. Altern Lab Anim, 2015:43; P67–P71.

15. Casati. Contact hypersensitivity: Integrated approaches to testing and assessment. Curr Opin Toxicol, 2017:5; 1–5.

16. Burden, Aschberger, Chaudhry, et al. The 3Rs as a framework to support a 21st century approach for nanosafety assessment. Nano Today, 2016, DOI: 10.1016/j.nantod.2016.06.007.

17. Landsiedel, Ma-Hock, Wiench, et al. Safety assessment of nanomaterials using an advanced decision-making framework, the DF4nanoGrouping. J Nanopart Res, 2017:19; 171.

18. Oesch, Landsiedel. Genotoxicity investigations on nanomaterials. Arch Toxicol, 2012:86; 985–994.

19. OECD. Guidance Document on the Reporting of Defined Approaches to Be Used Within Integrated Approaches to Testing and Assessment. Task Force on Hazard Assessment. France: OECD Publishing France; 2016.

20. Hartung, Luechtefeld, Maertens, et al. Food for thought: Integrated testing strategies for safety assessments. ALTEX, 2013:30; 3–18.

21. Jaworska, Gabbert, Aldenberg. Towards optimization of chemical testing under REACH: A Bayesian network approach to integrated testing strategies. Regul Toxicol Pharmacol, 2010:57; 157–167.

22. Jaworska, Hoffmann. Integrated Testing Strategy (ITS)—Opportunities to better use existing data and guide future testing in toxicology. ALTEX, 2010:27; 231–242.

23. Gabbert, Weikard. A theory of chemicals regulation and testing. Nat Resour Forum, 2010:34; 155–164.

24. Gabbert, Weikard. Sequential testing of chemicals when costs matter: A value of information approach. Hum Ecol Risk Assess, 2013:19; 1067–1088.

25. Rovida, Alépée, Api, et al. Integrated testing strategies (ITS) for safety assessment. ALTEX, 2015:32; 25–40.

26. Benigni, Bossa, Tcheremenskaia. A data-based exploration of the adverse outcome pathway for skin sensitisation points to the necessary requirements for its prediction with alternative methods. Regul Toxicol Pharmacol, 2016:78; 45–52.

27. Villeneuve, Garcia-Reyero, et al. Adverse Outcome Pathway (AOP) development I: Strategies and principles. Toxicol Sci, 2014:142; 312–320.

28. Burden, Sewell, Andersen, et al. Adverse outcome pathways can drive non-animal approaches for safety assessment. J Appl Toxicol, 2015:35; 971–975.

29. Edwards, Tan, Villeneuve, et al. Adverse outcome pathways—Organizing toxicological information to improve decision making. J Pharmacol Exp Ther, 2016:356; 170–181.

30. Perkins, Antzak, Burgoon, et al. Adverse outcome pathways for regulatory applications: Examination of four case studies with different degrees of completeness and scientific confidence. Toxicol Sci, 2015:148; 14–25.

31. Crawford, Hartung, Hollert, et al. Green toxicology: A strategy for sustainable chemical and material development. Environ Sci Eur, 2017:29; 16.

32. Clemen, Reilly. Making Hard Decisions with Decision Tools. Belmont CA: Duxbury Press; 2001.

33. Krewski, Acosta, Anderson, et al. Toxicity testing in the 21st century: A vision and a strategy. J Toxicol Environ Health B Crit Rev, 2010:13; 51–138.

34. EC. Regulation (EC) No 1907/2006 of the European Parliament and the Council of 18 December 2006 concerning the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH), establishing a European Chemicals Agency, amending Directive 1999/45/EC and repealing Council Regulation (EEC) No. 793/93 and Commission Regulation (EC) No. 1488/94 as well as Council Directive 76/769/EEC and Commission Directives 91/155/EEC, 93/67/EEC, 93/105/EC and 2000/21/EC; 2006.

35. Krewski, Andersen, Mantus, et al. Toxicity testing in the 21st century: Implications for human health risk assessment. Risk Anal, 2009:29; 474–479.

36. Combes, Balls. Integrated testing strategies for toxicity employing new and existing technologies. Altern Lab Anim, 2011:39; 213–225.

37. Thomas, Philbert, Auerbach, et al. Incorporating new technologies into toxicity testing and risk assessment: Moving from 21st century vision to a data-driven framework. Toxicol Sci, 2013:136; 4–18.

38. Vermeire, Aldenberg, Buist, et al. OSIRIS, a quest for proof of principle for integrated testing strategies of chemicals for four human health endpoints. Regul Toxicol Pharmacol, 2013:67; 136–145.

39. De Wever, Fuchs, Gaca, et al. Implementation challenges for designing integrated in vitro testing strategies (ITS) aiming at reducing and replacing animal experimentation. Toxicol In Vitro, 2012:26; 526–534.

40. Settivari, Ball, Murphy, et al. Predicting the future: Opportunities and challenges for the chemical industry to apply 21st-century toxicity testing. J Am Assoc Lab Anim Sci, 2015:54; 214–223.

41. Jaworska. Integrated testing strategies for skin sensitization hazard and potency assessment—State of the art and challenges. Cosmetics, 2016:3; 16.

42. Kopp-Schneider, Prieto, Kinsner-Ovaskainen, et al. Design of a testing strategy using non-animal based test methods: Lessons learnt from the ACuteTox project. Toxicol In Vitro, 2013:27; 1395–1401.

43. Kinsner-Ovaskainen, Akkan, Casati, et al. Overcoming barriers to validation of non-animal partial replacement methods/integrated testing strategies: The report of an EPAA-ECVAM workshop. Altern Lab Anim, 2009:37; 437–444.

44. Hoffmann, Saliner, Patlewicz, et al. A feasibility study developing an integrated testing strategy assessing skin irritation potential of chemicals. Toxicol Lett, 2008:180; 9–20.

45. Lewis, Kazantzis, Fishtik, et al. Integrating process safety with molecular modeling-based risk assessment of chemicals within the REACH regulatory framework: Benefits and future challenges. J Hazard Mater, 2007:142; 592–602.

46. Norlén, Worth, Gabbert. A tutorial for analysing the cost-effectiveness of alternative methods for assessing chemical toxicity: The case of acute oral toxicity prediction. Altern Lab Anim, 2014:42; 115–127.

47. Gabbert, van Ierland. Cost-effectiveness analysis of chemical testing for decision-support: How to include animal welfare? Hum Ecol Risk Assess, 2010:16; 603–620.

48. Leontaridou, Gabbert, Van Ierland, et al. Evaluation of non-animal methods for assessing skin sensitisation hazard: A Bayesian value-of-information analysis. Altern Lab Anim, 2016:44; 255–269.

49. Macdonald, Davidson. Dose and cycle of insecticide applications in the control of malaria. Bull World Health Organ, 1953:9; 785–812.

50. UNECE. Chapter 3.4 Respiratory or skin sensitization. Globally Harmonized System of Classification and Labelling of Chemicals (GHS), Fourth revised version. United Nations New York and Geneva: UNECE; 2011.

51. Thyssen, Linneberg, Menné, et al. The epidemiology of contact allergy in the general population—Prevalence and main findings. Contact Dermatitis, 2007:57; 287–299.

52. EC. Regulation (EC) No. 1223/2009 of the European Parliament and of the Council of 30 November 2009 on Cosmetic Products. 2009.

53. ECHA. Guidance on information requirements and Chemical Safety Assessment. Chapter R.7a: Endpoint Specific Guidance Draft Version 5.0. 2016.

54. Clouet, Kerdine-Römer, Ferret. Comparison and validation of an in vitro skin sensitization strategy using a data set of 33 chemical references. Toxicol In Vitro, 2017. [Epub ahead of print]; DOI: 10.1016/j.tiv.2017.05.014.

55. Grindon, Combes, Cronin MTD, et al. An integrated decision-tree testing strategy for skin sensitisation with respect to the requirements of the EU REACH legislation. Altern Lab Anim, 2008:36(Suppl 1); 75–89.

56. Mekenyan, Patlewicz, Dimitrova, et al. Use of genotoxicity information in the development of integrated testing strategies (ITS) for skin sensitization. Chem Res Toxicol, 2010:23; 1519–1540.

57. Patlewicz, Simon, Rowlands, et al. Proposing a scientific confidence framework to help support the application of adverse outcome pathways for regulatory purposes. Regul Toxicol Pharmacol, 2015:71; 463–477.

58. Hirota, Fukui, Okamoto, et al. Evaluation of combinations of in vitro sensitization test descriptors for the artificial neural network-based risk assessment model of skin sensitization. J Appl Toxicol, 2015:35; 1333–1347.

59. Hirota, Kouzuki, Ashikaga, et al. Artificial neural network analysis of data from multiple in vitro assays for prediction of skin sensitization potency of chemicals. Toxicol In Vitro, 2013:27; 1233–1246.

60.

Tsujita-Inoue

, Hirota

, Ashikaga

, et al. Skin sensitization risk assessment model using artificial neural network analysis of data from multiple in vitro assays. Toxicol In Vitro, 2014:28; 626–639.

61.

Jaworska

, Dancik

, Kern

, et al. Bayesian integrated testing strategy to assess skin sensitization potency: From theory to practice. J Appl Toxicol, 2013:33; 1353–1364.

62.

Jaworska

, Natsch

, Ryan

, et al. Bayesian integrated testing strategy (ITS) for skin sensitization potency assessment: A decision support system for quantitative weight of evidence and adaptive testing strategy. Arch Toxicol, 2015:89; 2355–2383.

63.

Bauch

, Kolle

, Ramirez

, et al. Putting the parts together: Combining in vitro methods to test for skin sensitizing potentials. Regul Toxicol Pharmacol, 2012:63; 489–504.

64.

Urbisch

, Mehling

, Guth

, et al. Assessing skin sensitization hazard in mice and men using non-animal test methods. Regul Toxicol Pharmacol, 2015:71; 337–351.

65.

van der Veen

, Rorije

, Emter

, et al. Evaluating the performance of integrated approaches for hazard identification of skin sensitizing chemicals. Regul Toxicol Pharmacol, 2014:69; 371–379.

66.

van der Veen

, Soeteman-Hernandéz

, Ezendam

, et al. Anchoring molecular mechanisms to the adverse outcome pathway for skin sensitization: Analysis of existing data. Crit Rev Toxicol, 2014:44; 590–599.

67.

Ellison

, Madden

, Judson

, et al. Using in silico tools in a weight of evidence approach to aid toxicological assessment. Mol Inform, 2010:29; 97–110.

68.

Nukada

, Miyazawa

, Kazutoshi

. Data integration of non-animal tests for the development of a test battery to predict the skin sensitizing potential and potency of chemicals. Toxicol In Vitro, 2013:27; 609–618.

69. Takenouchi, Fukui, Okamoto, et al. Test battery with the human cell line activation test, direct peptide reactivity assay and DEREK based on a 139 chemical data set for predicting skin sensitizing potential and potency of chemicals. J Appl Toxicol, 2015:35; 1318–1332.

70. Natsch, Emter, Gfeller, et al. Predicting skin sensitizer potency based on in vitro data from KeratinoSens and kinetic peptide binding: Global versus domain-based assessment. Toxicol Sci, 2015:143; 319–332.

71. MacKay, Davies, Summerfield, et al. From pathways to people: Applying the adverse outcome pathway (AOP) for skin sensitization to risk assessment. ALTEX, 2013:30; 473–486.

72. Asturiol, Casati, Worth. Consensus of classification trees for skin sensitisation hazard prediction. Toxicol In Vitro, 2016:36; 197–209.

73. Dimitrov, Low, Patlewicz, et al. Skin sensitization: Modeling based on skin metabolism simulation and formation of protein conjugates. Int J Toxicol, 2005:24; 189–204.

74. Dearden, Hewitt, Roberts, et al. Mechanism-based QSAR modeling of skin sensitisation. Chem Res Toxicol, 2015:28; 1975–1986.

75. Patlewicz, Kuseva, Kesova, et al. Towards AOP application—Implementation of an integrated approach to testing and assessment (IATA) into a pipeline tool for skin sensitization. Regul Toxicol Pharmacol, 2014:69; 529–545.

76. Leontaridou, Urbisch, Kolle, et al. The borderline range of toxicological methods: Quantification and implications for evaluating precision. ALTEX Online first, https://doi.org/10.14753/altex.1606271.

77. Locke, Westphal, Tischler, et al. Implementing toxicity testing in the 21st century: Challenges and opportunities. Int J Risk Assess Manag, 2017:20; 198–225.

78. Urbisch, Mehling, Guth. Regulatory use of non-animal test methods in chemical industry: The example of skin sensitization. In: Towards the Replacement of in vivo Repeated Dose Systemic Toxicity Testing. SEURAT-1 Initiative and the European Cosmetics Association (Cosmetics Europe) (eds); pp. 42–54. France: Imprimerie Mouzet; 2015.

79. Rorije, Aldenberg, Buist, et al. The OSIRIS weight of evidence approach: ITS for skin sensitisation. Regul Toxicol Pharmacol, 2013:67; 146–156.

80. Gomes, Noçairi, Thomas, et al. Stacking prediction for a binary outcome. COMPSTAT, 20th International Conference on Computational Statistics. Limassol 2012, pp. 271–282.

81. Matheson, Zang, Strickland, et al. ICCVAM integrated decision strategy for skin sensitization. The Toxicologist, 2015:144; 90.

82. Strickland, Zang, Kleinstreuer, et al. Integrated decision strategies for skin sensitization hazard. J Appl Toxicol, 2016:36; 1150–1162.

83. Tsujita-Inoue, Atobe, Hirota. In silico risk assessment for skin sensitization using artificial neural network analysis. J Toxicol Sci, 2015:40; 193–209.

84. OECD. Test No. 442C: In Chemico Skin Sensitisation. Direct Peptide Reactivity Assay (DPRA), OECD Guidelines for the Testing of Chemicals. Paris: OECD Publishing; 2015.

85. Natsch, Ryan, Foertsch, et al. A dataset on 145 chemicals tested in alternative assays for skin sensitization undergoing prevalidation. J Appl Toxicol, 2013:33; 1337–1352.

86. Bergstrom, Varian. Intermediate Microeconomics: Instructor's Manual. New York: W.W. Norton & Company; 2003.

87. Claxton. Bayesian approaches to the value of information: Implications for the regulation of new pharmaceuticals. Health Econ, 1999:8; 269–274.

88. Claxton, Ginnelly, Sculpher, et al. A pilot study on the use of decision theory and value of information analysis as part of the NHS Health Technology Assessment programme. Health Technol Assess, 2004:8; 1–103.

89. Cunningham. An introduction to economic evaluation of health care. J Orthod, 2001:28; 246–250.

90. Nordberg, Rudén, Hansson. Towards more efficient testing strategies—Analyzing the efficiency of toxicity data requirements in relation to the criteria for classification and labelling. Regul Toxicol Pharmacol, 2008:50; 412–419.

91. Casati, Worth, Amcoff. EURL ECVAM Strategy for Replacement of Animal Testing for Skin Sensitisation Hazard Identification and Classification. EUR—Scientific and Technical Research Series. Luxembourg: Publications Office of the European Union; 2013.

92. Roberts, Patlewicz. Non-animal assessment of skin sensitization hazard: Is an integrated testing strategy needed, and if so what should be integrated? J Appl Toxicol, 2017. [Epub ahead of print]; DOI: 10.1002/jat.3479.