Abstract
Introduction:
Systematic reviews (SR) collect and integrate data corpora into consistent, computable, and comparable datasets. The adverse outcome pathway (AOP) framework facilitates the linking of data describing molecular initiating events, through one or more key events (KEs), to adverse biological outcomes. To explore the potential application of SR data to the AOP framework, a case study was conducted on mapping SR data to existing AOP KEs.
Methods:
SR data consisted of in vitro and in vivo androgen receptor (AR) toxicity information from nonmammalian vertebrate species collected as described by the authors, limiting data comparability. Data were standardized and mapped to terms for Level of Biological Organization, Object, Process, and Action using existing KEs in the AOP-Wiki as a source for endpoint terms.
Results:
In vitro SR data had 131 of 264 records that mapped to AR transactivation, while in vivo data had 226 of 1891 records directly mappable to 31 different KEs (e.g., increased vitellogenin messenger RNA). When no appropriate terms existed in the AOP-Wiki, standardized terms were proposed for future use. For unstructured data, mapping and standardization required additional interpretation.
Conclusions:
This study highlights the difficulties in aligning heterogeneously extracted SR data with a structured framework. It also underscores the need for language standardization and the adoption of clear data collection guidance prior to, and during, the SR to enhance data comparability and computability. Such efforts can advance the ability of resulting data to be reused and applied to frameworks such as AOPs. Lessons learned in this case study are applicable to similar efforts examining the use of automation in data extraction and evaluation.
Introduction
Regulatory toxicology involves collecting, processing, and evaluating epidemiological and experimental toxicology data to make toxicologically informed decisions for safeguarding human and environmental health against the harmful effects of chemical substances. Frequently, this information is obtained by performing activities such as systematic evidence mapping and systematic literature reviews, collectively referred to as systematic reviews (SR) in the remainder of this article. SRs are labor-intensive and time-consuming, and data extractions can generate heterogeneous data based on individual author-derived language, which may not be initially comparable across studies.1–3 For example, a recent SR of androgen receptor (AR) effects in nonmammalian species generated data for 42 species (data from which is used in subsequent work), with effects ranging from degeneration of testicular germ cells to a complete lack of female offspring, 4 and took 5 years from project initiation to dataset completion. Given the extensive resources required for large-scale SRs, there is interest in maximizing the use of SR data across existing databases and leveraging the data for other scientific efforts. However, to ensure maximum data reusability, the level-of-effort necessary for data translation must be understood, and processes for downstream comparison and analysis of SR data need to be established.
The adverse outcome pathway (AOP) framework, originally proposed by Ankley and Bennett, 5 was developed to provide a conceptual framework illustrating the connection between a direct molecular initiating event (MIE) and an adverse outcome (AO) through key events (KEs).5,6 Since it emerged, the use of the AOP framework has found broad applicability in both scientific research and decision making and has been used for applications including the development of Integrated Approaches to Testing and Assessment, 7 Quantified Structure–Activity Relationships,8,9 pharmaceutical safety evaluations, 10 and the prioritization of testing strategies and screening level hazard or risk assessments. 11 The AOP-Wiki tool 6 has since been developed to facilitate collaborative AOP development by collecting and linking expert-curated AOP information through a controlled vocabulary (Supplementary Table S1).4,12,13 Due to the widespread use of the AOP framework, mapping data generated from SR efforts to the AOP-Wiki has the potential to greatly increase the impact, scope, and utility of such data.
To ensure interoperability between SR-generated datasets and the AOP-Wiki, however, the amenability of SR data exports to the AOP framework should be investigated, and the processes necessary for such data sharing identified. AOP development has expanded over the years to include more computable approaches for characterizing the biomechanistic details of a KE or key event relationship (KER), such as biological Process, Object, and Action, 4 which facilitate comparison across diverse data collection methods. Most SRs, however, require the capture and organization of additional contextual details around an observation that are not included in the standard KE data fields but are typically collected when assembling lines of evidence to support KERs. This is particularly true when the SR aims to compare data across species or uses diverse methodological data. The structure of the AOP framework lends itself to the use of both semantic and ontological tools that can help identify and harmonize important concepts resulting from SRs. For such tools to be effective, however, information must be represented in a consistent, comparable, and computable format. In the current study, we aimed to assess the feasibility of directly integrating data exported from a recently conducted SR into the AOP-Wiki framework, to determine the level-of-effort needed to map the data, and to identify any challenges in the process.
The US EPA developed an AR pathway model, which incorporates data from several high-throughput assays, interrogating multiple nodes of the AR signaling pathway. However, the AR high-throughput assays, which form the basis of the AR pathway model, only utilize mammalian receptors. In 2023, Vliet and Markey 14 published their study where they performed two SRs to collect existing in vitro and in vivo data on AR perturbation in nonmammalian species. The aim of Vliet and Markey 14 was to use SR to build lines of evidence toward AR-modulated pathway conservation across species and to provide a basis for extrapolating human AR-based data to nonhuman vertebrate species. This study used the data extracted from Vliet and Markey 14 to explore the potential application of SR data to the AOP framework and highlight the need for structured and systematic approaches to SRs.
Materials and Methods
This study used data extracted from Vliet and Markey. 14 Exact search strings, databases searched, and inclusion/exclusion criteria are documented in the Vliet and Markey 14 Supplementary Data. The SRs did not have date limits, and the in vitro search was initiated in December 2019, while the in vivo search was initiated in February 2020. No filters were applied to the searches. To explain the implementation of terms used in this article, including vocabulary, terminology, ontology, and taxonomy, a list of definitions is provided in Supplementary Table S1. Prior to the initiation of the SRs in Vliet and Markey, 14 a multidisciplinary team of toxicologists, SR experts, endocrine subject matter experts (SMEs), and computational scientists established guidelines for data extraction based on the species to be evaluated, exposure types to be collected (e.g., the exposure of interest was to a defined chemical or chemical mixture), and the general endpoints to be collected (e.g., reproductive behaviors, sexual differentiation, fertility). Screening forms were developed in DistillerSR®, with individual questions representing a mix of pre-populated answer options and free-text fields. During the SR, SMEs reviewed both abstracts and full-text articles, searching for data relevant to the project as outlined in the extraction guidelines. SMEs were directed to collect endpoints, chemicals, and contextual information as described by the authors and, when free-text fields were utilized, did not attempt to standardize them in the interest of minimizing reviewer workload. For the mapping efforts described subsequently, raw data were downloaded from DistillerSR software and converted to spreadsheet files for initial processing.
As the collection of data during the SR was not fully standardized against external models at the time of extraction, and data were collected across diverse data sources, the resulting raw dataset was complex and highly heterogeneous. To identify the level-of-effort necessary to map SR data to the AOP-Wiki as-is (i.e., without any data cleanup or processing), all manipulation of the raw SR data was done by the authors prior to AOP-Wiki mapping efforts. Examples of challenges include chemical names being represented in many ways, misspelled or inconsistently spelled words, data describing the same endpoint reported in two different fields (i.e., results of the study were spread between the results and comments), and data that were reported in a compounded fashion (i.e., more than one data point in an individual field) with inconsistent or missing delimiters to allow for separation of the data. Due to the complexity and heterogeneity of the data, manual curation was determined to require less effort than automated text processing, as all variations needed to be mapped to a controlled vocabulary (Supplementary Table S1). Given the small size of the datasets, the effort required to achieve complete human or machine readability through automated processing was predicted to match or exceed that of manual curation. A workflow describing manual data processing is included in Figure 1.

Initial data processing workflow: Due to the complexity of raw SR data, manual processing was required. SR, systematic review.
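The two cleanup problems described above (variant chemical names and compound entries) can be illustrated with a minimal Python sketch. It is not the curation procedure used in this study, which was manual. The synonym table, the delimiter set, and the function names `normalize_chemical` and `split_compound_entry` are all assumptions made for illustration.

```python
import re
from typing import List, Optional

# Hypothetical synonym table; a real one would come from a curated chemical registry.
CHEMICAL_SYNONYMS = {
    "17a-methyltestosterone": "17alpha-methyltestosterone",
    "17α-methyltestosterone": "17alpha-methyltestosterone",
    "17-alpha-methyltestosterone": "17alpha-methyltestosterone",
}


def normalize_chemical(raw: str) -> Optional[str]:
    """Return the preferred name for a raw chemical string, or None if unknown."""
    # Lowercase and collapse runs of whitespace before the lookup.
    key = " ".join(raw.lower().split())
    return CHEMICAL_SYNONYMS.get(key)


def split_compound_entry(raw: str) -> List[str]:
    """Split a free-text field on semicolons, pipes, or line breaks.

    The delimiter set is an assumption; entries with missing delimiters
    come back as a single item and still need manual review.
    """
    parts = re.split(r"[;|\n]+", raw)
    return [part.strip() for part in parts if part.strip()]
```

Names that return `None` and entries that come back unsplit would be routed to a subject matter expert, which is consistent with the observation that every variation ultimately had to be mapped to a controlled vocabulary.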
Following initial processing, data were manually organized and binned to facilitate standardization prior to AOP-Wiki mapping (Fig. 2). Data were organized based on concepts developed by Ives and Campia, 4 who subdivided aspects of MIEs and KEs into event components (ECs). These ECs contain short words or phrases that were mapped to ontology terms. ECs consisted of Object, Process, and Action terms. Within this study and parallel efforts, we have introduced Phenotype as a fourth EC concept. For those ECs (Fig. 2A):

Data were organized into event components (ECs), occurring at the same level of biological organization, based on concepts developed by Ives and Campia. 4
Mapping groups common to both datasets
As a KE/KER occurs at a single level of biological organization, all ECs for a given KE/KER must occur at the same level of biological organization. For example, if the KE being described occurs at the population level, such as “skewed sex ratio,” the Object being described cannot be a molecular target such as the AR.
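The same-level constraint can be expressed as a simple check. The sketch below is illustrative only: the `EventComponent` class, the `share_level` function, and the list of level names are assumptions and do not reproduce the AOP-Wiki data model.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumed level names, for illustration only.
LEVELS = ("molecular", "cellular", "tissue", "organ", "individual", "population")


@dataclass(frozen=True)
class EventComponent:
    kind: str   # "Object", "Process", "Action", or "Phenotype"
    term: str
    level: str  # level of biological organization


def share_level(components: Iterable[EventComponent]) -> bool:
    """Return True when every component sits at the same, recognized level."""
    levels = {component.level for component in components}
    unknown = levels - set(LEVELS)
    if unknown:
        raise ValueError(f"unrecognized level(s): {sorted(unknown)}")
    return len(levels) <= 1
```

For the example in the text, pairing a population-level "skewed sex ratio" with a molecular-level AR Object makes `share_level` return `False`, which flags the record for correction before mapping.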
In addition to the data organized into ECs, additional contextual information was collected for each event. Some information was applicable to both the in vivo and in vitro datasets and was collected for all experiments (Fig. 2B). These categories include:
Further categories were needed to describe information unique to different study types. Bins specific to in vitro data (Fig. 2C) were:
While data bins specific to in vivo data (Fig. 2D) were:
The mapping sequence was as follows: (1) take each record in the SR and map it to relevant EC terms, as determined by SMEs; (2) determine whether the EC terms mapped to KEs in the AOP-Wiki; 6 and (3) provide further SME review and discussion for EC terms that did not map to the AOP-Wiki to determine whether a different EC term might be more appropriate. In total, 264 rows of in vitro and 1891 rows of in vivo data were manually mapped to KEs using standard spreadsheet software. The finalized annotated texts were then used to compare between and across datasets. A summary of the mapping process is included in Figure 3.

A conceptual overview of the mapping process: The figure shows that endpoint data are first extracted and created, then the data are reviewed to identify text for mapping. The final step is mapping the observations into specific endpoints.
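Steps 2 and 3 of the mapping sequence amount to a lookup followed by a review queue. The sketch below assumes that step 1 (SME assignment of EC terms) has already produced a tuple of terms per record. The mapping itself was performed manually in spreadsheet software, so the code is only a conceptual restatement. The `KE_INDEX` keys and the function name `map_records` are hypothetical; the two event numbers are the ones cited in the Results.

```python
from typing import Dict, List, Tuple

ECTerms = Tuple[str, str, str]  # (Object, Process, Action), as assigned by an SME

# Hypothetical index from EC terms to AOP-Wiki event numbers.
KE_INDEX: Dict[ECTerms, int] = {
    ("17beta-estradiol", "plasma concentration", "decreased"): 219,
    ("vitellogenin", "plasma concentration", "decreased"): 221,
}


def map_records(
    records: List[Tuple[str, ECTerms]], ke_index: Dict[ECTerms, int]
) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Split records into (record id, KE id) pairs and ids needing SME review."""
    mapped: List[Tuple[str, int]] = []
    needs_review: List[str] = []
    for record_id, ec_terms in records:
        ke_id = ke_index.get(ec_terms)
        if ke_id is None:
            # Step 3: no KE found, so the record goes back to SMEs.
            needs_review.append(record_id)
        else:
            mapped.append((record_id, ke_id))
    return mapped, needs_review
```

Keeping the unmapped record ids in a separate list mirrors the practice of retaining unmapped endpoints so they can be revisited when the AOP-Wiki gains new KEs.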
Endpoints that could not be mapped to an existing KE in the AOP-Wiki were excluded from this analysis but were categorized separately and documented for future ontology development. We first identified the endpoints that could not be mapped to the AOP-Wiki and set them aside while we worked on the endpoints that could be mapped. Two experts then developed internally consistent terms for the unmappable endpoints. When there were disagreements, a third reviewer mediated the discussion to achieve a final designation. For fidelity and future use, we maintained all previous versions of the data, along with a column indicating which terms were mapped and which were not, so that the data can be reevaluated as additional endpoints are added to the AOP-Wiki.
Results
To increase the reuse and applicability of resource-intensive SR data, this study explored the mapping of SR data exports directly from a completed SR into lines of evidence organized via the AOP framework. For the in vitro data, KEs for AR agonism and antagonism were already present; however, the AOP-Wiki contained multiple entries for AR transactivation that could be applicable, including AR activation, altered transcription of genes by the AR, AR agonism, and AR antagonism (Supplementary Table S2). AR binding could be used to support all the AR transactivation KEs; however, since receptor binding does not always lead to transactivation, the argument can be made that binding should be presented as an MIE independent of transactivation. In the AOP-Wiki, however, AR binding data are used solely as supporting evidence for AR transactivation KEs and are not considered an independent KE. Thus, all 264 rows of the in vitro data were annotated accordingly. Nevertheless, given that receptor binding does not necessarily lead directly to receptor transactivation, the authors recommend that the development of one or more new KEs for AR binding be considered, such as “increased AR binding,” which reflects a change in binding but does not necessarily imply a change in receptor activity.
Regarding in vivo data, some endpoints (e.g., increased plasma vitellogenin) mapped directly to existing KEs and were systematically categorized (Supplementary Table S3). However, some endpoints were not present in the AOP-Wiki, having not been described by any contributors to date (e.g., endpoints such as “undifferentiated gonads” or “sex-reversal”) (Supplementary Table S4). The endpoints presented in Supplementary Table S4 do not directly map to KEs in the AOP-Wiki. Some, such as “Cloacal gland movements—Decreased,” could, after discussion with SMEs, potentially be mapped to a KE like #1390, “Sexual behavior, decreased.” Others, such as “Spiggin production—decreased,” could potentially be developed into KEs. Several “no effect” endpoints were also recorded in the SR data (Supplementary Table S5). Given that AOP-Wiki KEs specify the direction of effects, “no effect” measurements cannot be directly mapped. It is therefore recommended that null data be taken into consideration when describing the domain of applicability or discussing modulating factors for a given AOP.
Overall, in vivo data were complex, with only 226 of 1891 records directly mapping to existing KEs, due to either an incompatible data entry or simply because the AOP-Wiki is not yet comprehensive. Some extracted data mapped directly to one or more existing KEs or KERs. For instance, “decreased serum 17β-estradiol” maps to event 219, “Reduction, Plasma 17beta-estradiol concentrations,” while “decreased vitellogenin in plasma” maps to event 221, “Reduction, Plasma vitellogenin concentrations.” Other measurements, with further SME interpretation, may map to existing KEs or support KERs. For example, “increased dorsal fin spots in females” could potentially map to event 674, “Reduced, Ability to attract spawning mates,” if properly interpreted by an SME. It is worth noting that a potential benefit of the current work is the identification of KEs that are new or not currently represented in the AOP-Wiki. Benefits of these novel KEs include identifying KEs that can be considered for future AOPs with an AR activation MIE and serving to add evidence to existing AOPs with additional KEs. Observations such as “increased number of intersex individuals,” for instance, could potentially be developed into new KEs.
As mentioned previously, raw SR data were both complex and heterogeneous, posing challenges for automated text processing. We determined that computational methods would not be suitable for data cleaning or interpretation, as the number of steps needed for complete human or machine readability through computational curation was predicted to meet or exceed the efforts required for manual curation at this time. Computational methods struggled to distinguish when a data field should be divided, necessitating individual processing of each free-text column. Computational methods were tailored to the specific needs of the provided data using a customized KNIME data curation workflow. For columns with standardized information, such as sex, we were able to programmatically harmonize the data into a consistent format due to the limited number of possible variations. However, columns requiring more detailed explanations, such as results, lacked consistent formatting and language, making it challenging for computational methods to extract accurate interpretations. The inconsistent structure and absence of standardized formatting prevented the computer from accurately capturing the intended meaning. While humans can understand the context and placement of negation within a sentence, computational approaches rely on the exact position of specific terms, making accurate interpretation difficult without uniform data presentation. This was particularly true for results, observations, and comments, where formatting and text not easily comprehended by computers (due to arbitrary placement of negation, for example) were common (Fig. 4). These factors made computational efforts a case of diminishing returns. Situations like this, however, may present good opportunities for the use of large language models [i.e., artificial intelligence (AI) models that can understand, generate, and process human language] or other more flexible computational approaches. These advanced methods might be able to perform standardizations like the ones shown in Figure 4, with human review needed only for ensuring accuracy.
Multiple ways of stating “No Effect.”
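The contrast between a constrained column such as sex and free-text results can be made concrete with a short Python sketch. It does not reproduce the KNIME workflow used in this study. The variant spellings in `SEX_MAP`, the negation patterns, and both function names are assumptions, and the patterns are not the phrasings shown in Figure 4.

```python
import re
from typing import Optional

# Hypothetical variants; a constrained column has few enough to enumerate.
SEX_MAP = {
    "m": "male", "male": "male", "males": "male",
    "f": "female", "female": "female", "females": "female",
}

# Hypothetical phrasings of a null result.
NO_EFFECT_PATTERNS = [
    r"\bno (significant )?(effect|change|difference)s?\b",
    r"\bnot (significantly )?(affected|altered|changed)\b",
    r"\b(unaffected|unchanged)\b",
]
NO_EFFECT_RE = re.compile("|".join(NO_EFFECT_PATTERNS), re.IGNORECASE)


def harmonize_sex(raw: str) -> Optional[str]:
    """Map a raw sex entry to a standard term, or None if unrecognized."""
    return SEX_MAP.get(raw.strip().lower())


def looks_like_no_effect(text: str) -> bool:
    """Flag text containing a null-result phrase; position in the sentence is ignored."""
    return NO_EFFECT_RE.search(text) is not None
```

Because the check ignores where the negation sits, a sentence such as "no effect at low dose, decreased at high dose" is flagged as a null result even though it also reports a decrease. This is the kind of misreading that keeps a human reviewer in the loop.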
Although numerous issues hindered the current mapping effort, the challenges encountered during the data mapping process will prove useful in informing future SR efforts such that data exports are more suitable for direct interoperability with the AOP-Wiki. Collectively, these data challenges underscore the importance of consistency in determining methods used, data endpoints collected, and how endpoints relate to each other across SMEs engaged in SR, as well as across SR efforts. Noted challenges include.
Discussion
When conducting SRs, it is essential to evaluate and synthesize consistent and computable data. This not only facilitates data comparisons within the defined research questions of the SR but also adheres to the FAIR data principles (Findable, Accessible, Interoperable, Reusable), ultimately resulting in data that can connect to and be reused in broader frameworks (e.g., AOPs). Ideally, similar study types would result in comparable extracted data, enabling researchers to (1) easily understand the methods used, (2) clearly identify observed endpoints, (3) make effective comparisons with other research efforts, and (4) conduct further computational analysis and reuse data where appropriate. However, using current practices, substantial manual effort is required to extract and transform heterogeneous data from similarly designed studies to ensure effective, computable data synthesis. It can be expected that data extraction, transformation, and evaluation workflows will continue to move toward automation using AI models. 7 Therefore, to improve the process of annotating SR data for widespread computational analysis, future efforts should focus on designing workflows that enable extracted data to be more easily transformed and linked to broader frameworks. Ideally, such a workflow would include pre-mapped, expert-provided concepts (with limited use of unstructured text fields) and interoperability with existing ontologies and controlled vocabularies (Supplementary Table S1). This case study suggests a high return on investment in two key areas: (1) training SMEs and data extractors on the data characteristics needed for further computable evaluation and (2) designing review and data collection forms to encourage consistent, comparable, and computable results to be extracted. In addition to form structure, extraction instructions should emphasize reporting endpoints as unique entities, and effort should be made to implement consistent data reporting across document types, form structures, and individual extractions.
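A form built on pre-mapped, expert-provided concepts can reject free-text variation at entry time rather than during later cleanup. The sketch below is a hypothetical illustration: the field names, the pick-list options, and the function `validate_entry` are assumptions and are not taken from the DistillerSR forms used in the original SR.

```python
from typing import Dict, List

# Hypothetical pick lists; real options would be supplied by domain experts.
PICK_LISTS = {
    "object": {"androgen receptor", "vitellogenin", "17beta-estradiol"},
    "action": {"increased", "decreased", "no change"},
    "sex": {"male", "female", "mixed", "not reported"},
}


def validate_entry(entry: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the entry is acceptable."""
    problems: List[str] = []
    for field, allowed in PICK_LISTS.items():
        value = entry.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif value not in allowed:
            problems.append(f"{field}: '{value}' is not a pick-list option")
    return problems
```

Returning every problem at once, instead of stopping at the first, lets an extractor correct a whole entry in a single pass.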
Depending on the type of data to be extracted, the degree of interpretation necessary by the extractor will vary. For example, well-defined data (e.g., data from standard receptor transactivation assays) are often consistent enough across studies to allow for the use of expert-provided concepts presented as answer pick lists. Further expansion of the AOP EC work from Ives and Campia, 4 with defined concepts and related ontologies has the potential to support list entries and add greater cross-effort interoperability, as highlighted in a number of recent publications.15–20 Heterogeneous data (e.g., from in vivo and observational studies), in contrast, often require interpretation to properly partition individual endpoints. For example, a data extractor would need sufficient expert knowledge to recognize that “significant increase in nuptial tubercle number” and “the presence of a dorsal fat pad” represent two separate endpoints. To ensure data reusability, extracted data should be consolidated to one field in which extractors use a unique, consistent, and computer-readable delimiter (e.g., line break or tab) to separate different endpoints. In addition, consistent interpretation of language is needed. For example, interpreting “directional change in response” as distinct from “change was statistically significant.”
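One way to picture the single-field, delimiter-based recommendation is a field in which each line holds one endpoint and tabs separate the endpoint name, the direction of change, and whether the change was statistically significant. This layout, the `EndpointObservation` class, and the function `parse_endpoint_field` are assumptions made for illustration. Keeping direction and significance in separate slots reflects the distinction drawn above.

```python
from dataclasses import dataclass
from typing import List, Optional

ALLOWED_DIRECTIONS = {"increased", "decreased", "no change"}
SIGNIFICANCE = {"yes": True, "no": False, "": None}  # blank means not reported


@dataclass
class EndpointObservation:
    endpoint: str
    direction: str
    significant: Optional[bool]


def parse_endpoint_field(field: str) -> List[EndpointObservation]:
    """Parse one endpoint per line: endpoint<TAB>direction<TAB>significance."""
    observations: List[EndpointObservation] = []
    for line in field.splitlines():
        if not line.strip():
            continue  # skip blank lines
        parts = [part.strip() for part in line.split("\t")]
        parts += [""] * (3 - len(parts))  # pad missing trailing columns
        endpoint, direction, significance = parts[:3]
        direction = direction.lower()
        significance = significance.lower()
        if direction not in ALLOWED_DIRECTIONS:
            raise ValueError(f"unrecognized direction: {direction!r}")
        if significance not in SIGNIFICANCE:
            raise ValueError(f"unrecognized significance: {significance!r}")
        observations.append(
            EndpointObservation(endpoint, direction, SIGNIFICANCE[significance])
        )
    return observations
```

A blank significance column is stored as `None` rather than `False`, so an unreported test is never confused with a non-significant result.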
The current mapping effort identified challenges to the integration of SR data into the AOP framework, as well as critical aspects of the SR process that could be improved to facilitate mapping of SR data to KEs in the AOP-Wiki. Raw SR data required substantial initial processing prior to AOP data mapping. To avoid such pre-processing challenges and improve efficiency, plans for data cleaning and formatting should be conceptualized and, when possible, developed and implemented prior to initiation of the SR. When mapping to AOPs is among the initial goals of an SR, utilizing categories that will directly translate to the AOP framework (e.g., Level of Biological Organization, Object, Process, and Action) is recommended to format free-text data in a manner that will expedite mapping processes. Ongoing efforts to integrate structured methods documentation for AOPs (e.g., Methods2AOP Collaboration) may facilitate this recommendation by providing more structured documentation of method information. 21 Most importantly, taking steps to ensure the computability of extracted SR data such that it can connect to, and be reused in, broader frameworks will be essential as scientific efforts move toward increased data interoperability. Specific ways in which SRs can be improved upfront to maximize and expedite data reuse include: (1) incorporating pre-mapped, expert-provided concepts and utilizing controlled vocabularies when available; (2) implementing review and extraction workflows that encourage reporting endpoints as unique entities; and (3) providing adequate training to SMEs and ensuring consistent understanding. A curation workflow, as described in Angrish and Burns, 22 which utilizes controlled vocabularies and implements rule-based matching, will allow SRs to be more amenable to AOP mapping. While computational text processing coupled with “human in the loop” manual reviews is currently necessary, establishing data standards for controlled vocabularies and ontologies through an iterative process with domain expert communities should support the ultimate goal of using generative AI for fully automated development of AOPs. Scientific workgroups (e.g., AI4AOPs, 23 Environmental Health Language Collaborative, 24 Evidence-based Toxicology Collaboration, 25 and the Monarch Initiative 26 ) are addressing SR data integration and workflows, terminology harmonization, and the use of AI and ontologies in AOPs.
In summary, the reuse of SR data is essential to efficiently maximize the utilization of scientific resources. Therefore, understanding the level-of-effort, feasibility, and challenges associated with such data interoperability is of the utmost importance. The application of raw SR data exports to the AOP framework highlights the need for improved, consistent SR design that results in easily comparable and computable data. Future directions focused on the incorporation of computational data mapping strategies, supported through the leveraging of standardized vocabularies and ontologies, have the potential to greatly expedite the incorporation of data into different tools and platforms and increase knowledge interoperability as a whole.
Footnotes
Acknowledgments
The authors would like to acknowledge the efforts of the data extraction team, Neepa Choksi, Amber Daniel, Jon Hamm, AtLee Watson, and Andrew Ewans at Inotiv for extracting the datasets used in this effort. The authors would also like to acknowledge the efforts of Daniel Villeneuve [U.S. Environmental Protection Agency (U.S. EPA)] and Erin Yost (U.S. EPA) for their invaluable review of previous versions of the article. The authors would like to acknowledge George Woodall, Anand Mudambi, and Michelle Angrish at the U.S. EPA and Gail Hodge at the Alliant Alliance LLC for providing information from their unpublished work on environmental health language terminology definitions.
Disclaimer
Contractor’s roles did not include establishing Agency policy. All authors received their typical and usual salaries from their respective institutions for the development of the research and writing of the article. The views expressed in this article are those of the authors and do not necessarily reflect the views or policies of the U.S. EPA nor does the mention of trade names or commercial products indicate endorsement by the federal government. This article was not reviewed by and does not reflect the view of 3M or Underwriters Laboratories Research Institutes.
Authors’ Contributions
P.C.: Conceptualization, data curation (lead), methodology (lead), and writing—original draft (lead). J.A.: Data curation, writing—original draft, and writing—review and editing. S.B.: Project administration, writing—original draft, and writing—review and editing. B.C.: Data curation and writing—review and editing. S.E.: Supervision and methodology. V.H.: Writing—review and editing. S.G.L.: Supervision, funding acquisition, and writing—review and editing. K.M.: Conceptualization, methodology, funding acquisition, and writing—review and editing. S.M.F.V.: Conceptualization, data curation, methodology, and writing—review and editing.
Author Disclosure Statement
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding Information
The original systematic review for this project 14 was funded under contract number 68HE0H18D0009 to Battelle Memorial Institute from the Office of Chemical Safety and Pollution Prevention in the U.S. EPA, Washington, DC, USA. The work for this specific manuscript was funded under contract number 68HE0H18D0008 to RTI International from the Office of Chemical Safety and Pollution Prevention in the U.S. EPA, Washington, DC, USA.
Supplemental Material
References
