Elsevier

Computational Toxicology

Volume 25, February 2023, 100261
Exploring genetic influences on adverse outcome pathways using heuristic simulation and graph data science

https://doi.org/10.1016/j.comtox.2023.100261

Highlights

  • AI methods can be leveraged to uncover patterns of genetic regulation in AOPs using observational data.

  • The AOP framework and AI methods can be implemented to gain novel insights into toxicity-mediated mechanisms and outcomes.

  • This is the first study into how genetic programming (GP) can be used to understand the genetic mechanisms underlying toxicity-mediated disease.

  • The simultaneous association observed for variants in the AHR and ABCB11 genes indicates increased risk for liver cancer.

  • Our socioeconomic deprivation approach provides a tool for improving social justice in environmental health studies.

Abstract

Adverse outcome pathways (AOPs) provide a powerful tool for understanding the biological signaling cascades that lead to disease outcomes following toxicity. The framework outlines downstream responses known as key events, culminating in a clinically significant adverse outcome as a final result of the toxic exposure. Here we use the AOP framework combined with artificial intelligence (AI) methods to gain novel insights into genetic mechanisms that underlie toxicity-mediated adverse health outcomes. Specifically, we focus on liver cancer (LC) as a case study with diverse underlying mechanisms that are clinically significant. Our approach uses two complementary AI techniques: generative modeling via automated machine learning and genetic algorithms, and graph machine learning. We used data from the US Environmental Protection Agency’s Adverse Outcome Pathway Database (AOP-DB; aopdb.epa.gov) and the UK Biobank’s genetic data repository. We use the AOP-DB to extract disease-specific AOPs and build graph neural networks used in our final analyses. We use the UK Biobank to retrieve real-world genotype and phenotype data, where genotypes are based on single nucleotide polymorphism (SNP) data extracted from the AOP-DB, and phenotypes are case/control cohorts for the disease of interest (liver cancer) corresponding to those adverse outcome pathways. We also use propensity score matching to appropriately sample based on important covariates (demographics, comorbidities, and social deprivation indices) and to balance the case and control populations in our machine learning training/testing datasets. Finally, we describe a novel putative risk factor for LC that depends on genetic variation in both the aryl-hydrocarbon receptor (AHR) and ATP binding cassette subfamily B member 11 (ABCB11) genes.

Introduction

Informatics and computational methods have revolutionized biomedical research and enabled scientists to explore questions that are either infeasible or impossible through traditional experimentation alone [32]. In environmental health and toxicology, common computational tasks include building and training models that predict various chemical properties, conducting statistical analysis of observational and epidemiological data to better understand exposure-related health outcomes, and performing network analyses to discover key processes in biochemical pathways, among others [17], [38]. Despite the successes of these methods, two key deficiencies have become apparent in toxicological research: a lack of richly structured, multimodal biomedical data describing chemicals and the biological systems that respond to chemical exposure [31], and a paucity of novel methods for discovering new knowledge from these complex data resources [36]. In this paper, we address both, combining richly structured data resources with novel analytical methods to gain new insights into a phenomenon of growing interest: the influence of genetics on susceptibility to an adverse outcome following specific chemical exposures.

Adverse Outcome Pathways (AOPs) are pathway-like descriptions that outline the mechanistic associations between molecular exposure events and higher-order clinical and population-level outcomes that may arise from the exposure [2], [26]. AOPs consist of molecular initiating events (MIEs), key events (KEs), and adverse outcomes (AOs). By definition, a KE is any internal step within an AOP at some level of biological organization, and an MIE is a particular kind of KE that both initiates an AOP and consists of a molecular interaction between a toxicant and a body component. AOPs are classified according to their respective health outcomes, and AOPs associated with similar outcomes often overlap to create an ‘AOP network.’ An AOP’s set of KEs can include genetic polymorphisms that are associated with higher risk of the adverse outcome. For example, colon cancer AOPs include 53 unique single nucleotide polymorphism (SNP) associations originally derived from genome-wide association studies (GWAS) [37]. This study examines the influence of genetic variation on susceptibility to adverse outcomes after specific chemical exposures, using AOPs as a reference framework.
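To make the structure described above concrete, the following minimal Python sketch (Python 3.9+) models an AOP as an ordered chain from MIE through KEs to an AO, and shows how two AOPs that share events merge into a small AOP network. The class, the function names, the event labels, and the SNP identifier are all hypothetical illustrations, not the AOP-DB schema or real pathway data.

```python
from dataclasses import dataclass, field


@dataclass
class AOP:
    """Toy adverse outcome pathway: MIE -> key events -> adverse outcome."""
    name: str
    mie: str                      # molecular initiating event
    key_events: list[str]         # intermediate key events, in order
    adverse_outcome: str
    snps: set[str] = field(default_factory=set)  # associated variants (hypothetical IDs)

    def events(self) -> list[str]:
        """All events in pathway order, from MIE to adverse outcome."""
        return [self.mie, *self.key_events, self.adverse_outcome]

    def edges(self) -> set[tuple[str, str]]:
        """Directed edges between consecutive events."""
        ev = self.events()
        return set(zip(ev, ev[1:]))


def shared_events(a: AOP, b: AOP) -> set[str]:
    """Events that appear in both pathways (where the AOPs overlap)."""
    return set(a.events()) & set(b.events())


def aop_network(aops: list[AOP]) -> set[tuple[str, str]]:
    """Union of all pathway edges: a simple 'AOP network'."""
    edges: set[tuple[str, str]] = set()
    for aop in aops:
        edges |= aop.edges()
    return edges


# Hypothetical example pathways; labels are illustrative only.
aop_a = AOP("A", "receptor activation",
            ["altered gene expression", "cell proliferation"],
            "liver tumor", {"rs_hypothetical_1"})
aop_b = AOP("B", "transporter inhibition",
            ["bile acid accumulation", "cell proliferation"],
            "liver tumor")

overlap = shared_events(aop_a, aop_b)
network = aop_network([aop_a, aop_b])
```

In this toy case the two pathways overlap at "cell proliferation" and "liver tumor", so the merged network has five distinct edges rather than six.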

Methodologically, one area that has experienced rapid growth, and holds great promise in all areas of biomedicine, is artificial intelligence (AI). AI broadly aims to construct computational systems that make intelligent decisions based on available data, knowledge, and/or human input. The scope of AI is broad and often loosely defined. In this paper, we explore two areas within AI: evolutionary algorithms and graph data science. Evolutionary algorithms are a family of algorithms that imitate processes found in biological evolution to optimize a system (e.g., a predictive model, a symbolic mathematical equation, or even another algorithm). Unsurprisingly, evolutionary computation is often used in computational biology, for example, in the context of simulating natural systems or processes [10], [11], [22] and building machine learning classifiers that perform well on a specific task [16], [23], [28]. Graph data science refers to the quantitative analysis of graphs, sometimes known as networks (e.g., biological networks), which are composed of a set of nodes connected by a set of edges that define relationships between those nodes [7], [27]. Tasks within graph data science include community detection [9], identification of the shortest paths linking two nodes in a graph [12], determining ‘hub nodes’ that play critical roles in the global structure of a graph [7], [41], and using computational algorithms that yield quantitative understandings of the behavior and characteristics of a given graph [1], [15]. Since AOPs can be represented as graphs, graph data science provides a powerful set of tools for discovering properties of AOPs that are not obvious through manual inspection.
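As an illustration of two of the graph tasks listed above, the sketch below finds a shortest path with breadth-first search and identifies hub nodes by total degree on a small directed graph. It uses only the Python standard library; the edge list is a hypothetical toy network, and degree is just one simple way to define a hub, not necessarily the measure used in the study.

```python
from collections import defaultdict, deque

# Hypothetical directed edges (upstream event -> downstream event).
EDGES = [
    ("receptor activation", "altered gene expression"),
    ("altered gene expression", "cell proliferation"),
    ("transporter inhibition", "bile acid accumulation"),
    ("bile acid accumulation", "cell proliferation"),
    ("cell proliferation", "liver tumor"),
]


def build_adjacency(edges):
    """Map each node to the list of nodes it points to."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
    return adj


def shortest_path(edges, start, goal):
    """Breadth-first search on an unweighted directed graph.

    Returns the list of nodes from start to goal, or None if unreachable.
    """
    adj = build_adjacency(edges)
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:   # walk back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:     # first visit gives the shortest route
                parent[nxt] = node
                queue.append(nxt)
    return None


def hubs(edges):
    """Nodes with the highest total (in + out) degree."""
    deg = defaultdict(int)
    for src, dst in edges:
        deg[src] += 1
        deg[dst] += 1
    top = max(deg.values())
    return sorted(n for n, d in deg.items() if d == top)


path = shortest_path(EDGES, "transporter inhibition", "liver tumor")
hub_nodes = hubs(EDGES)
```

Here "cell proliferation" is the only hub because both upstream branches converge on it before the adverse outcome, which is the kind of structural property that is easy to miss when pathways are inspected one at a time.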

Here, we propose a novel approach, leveraging these two areas of AI, to understand the mechanisms underlying genetic influences on toxic adverse outcomes, and we evaluate the approach in the context of toxicity-mediated adverse outcome pathways involved in liver cancer (LC). Briefly, we use the HIBACHI software (Heuristic Identification of Biological Architectures for simulating Complex Hierarchical genetic Interactions) to train interpretable generative models that construct synthetic datasets resembling real-world LC AOP genotype data, and we then inspect the best models produced by HIBACHI for the most prominent AOP SNPs that influence LC outcomes. HIBACHI is a command line utility based on genetic programming (GP) that generates (synthetic) datasets with interactions between input features [24], [25]. It uses the (μ + λ) evolutionary algorithm [6] to construct trees of primitive mathematical operations that can represent interactions between independent variables. For example, when applied to genetic data, these feature interactions may represent epistasis or mechanisms underlying polygenic traits. HIBACHI can take an existing dataset (referred to in the context of GP as a model) as input, which is then used to evaluate the fitness of candidate output datasets. Our hypothesis is that HIBACHI can create synthetic datasets of SNPs involved in AOPs that behave like real data for the same AOPs. This would allow us to explore the interpretable generative models used to create the synthetic data, which in turn gives insight into interactions between specific features in the real data used to train HIBACHI. Conceptually, this process can be likened to a brute-force version of symbolic regression [18] that avoids pitfalls arising from statistical analyses on genetic data with complex interactions between features [40].
Importantly, this approach utilizes genomic and phenotypic data from real-world populations, combined with information and knowledge sourced from publicly available, open access databases describing mechanisms of toxicity. Our methods are generalizable to other diseases of interest and provide a new framework for toxicologists to explore genetic mechanisms that underlie toxic adverse outcome susceptibility.
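The (μ + λ) strategy over expression trees mentioned above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not HIBACHI's implementation or interface: the primitive set, the mutation operator, the threshold-based fitness function, and the synthetic two-SNP data are all hypothetical choices made to keep the example small.

```python
import operator
import random

# Hypothetical primitive set: binary operations over genotype codes (0, 1, 2).
PRIMITIVES = {"add": operator.add, "sub": operator.sub,
              "mul": operator.mul, "max": max, "min": min}
N_FEATURES = 2  # number of SNP columns in the toy data


def random_tree(rng, depth=2):
    """Grow a random expression tree; a leaf is the index of a SNP column."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randrange(N_FEATURES)
    op = rng.choice(sorted(PRIMITIVES))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, row):
    """Recursively evaluate a tree on one genotype row."""
    if isinstance(tree, int):
        return row[tree]
    op, left, right = tree
    return PRIMITIVES[op](evaluate(left, row), evaluate(right, row))


def mutate(tree, rng):
    """Replace a randomly chosen subtree with a fresh random subtree."""
    if isinstance(tree, int) or rng.random() < 0.3:
        return random_tree(rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng), right)
    return (op, left, mutate(right, rng))


def fitness(tree, rows, labels):
    """Misclassification count when the tree output is thresholded at > 1."""
    return sum((evaluate(tree, r) > 1) != bool(y) for r, y in zip(rows, labels))


def mu_plus_lambda(rows, labels, mu=5, lam=15, generations=20, seed=0):
    """Evolve trees; parents and offspring compete for the mu survivor slots."""
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(mu)]
    history = []
    for _ in range(generations):
        offspring = [mutate(rng.choice(population), rng) for _ in range(lam)]
        pool = population + offspring            # the "+" in (mu + lambda)
        pool.sort(key=lambda t: fitness(t, rows, labels))
        population = pool[:mu]                   # keep the mu best overall
        history.append(fitness(population[0], rows, labels))
    return population[0], history


# Synthetic toy data: phenotype depends on an interaction between two SNPs.
rows = [(a, b) for a in range(3) for b in range(3)]
labels = [int(a * b > 1) for a, b in rows]

best_tree, history = mu_plus_lambda(rows, labels)
```

Because parents survive into the selection pool, the best fitness recorded in `history` can never get worse from one generation to the next. The surviving tree is also directly readable: a tree such as `("mul", 0, 1)` states that the outcome depends on the product of the two SNP columns, which is the sense in which such models are interpretable.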

Data sources

Our analysis uses data from the US Environmental Protection Agency’s Adverse Outcome Pathway Database (AOP-DB) and the UK Biobank (UKBB). The AOP-DB provides a formal structure for AOPs and their contained key events, as well as the relationships and associations between key events, genes (and their variants), metabolic pathways, diseases, and other relationships of toxicological interest. Data in the AOP-DB are aggregated from third-party public databases, including automated data pulls from

AOPs and SNPs associated with liver cancer

Our initial query for LC AOPs finds 16 liver-related AOPs and 189 SNPs associated with these AOPs. AOPs 1, 37, 41, 46, 107, 108, and 117 are specific, describing a particular etiology of LC or hepatocellular carcinoma, while the other AOPs describe LC in a more general context. Interestingly, the AOPs describing liver fibrosis, hepatotoxicity, and liver injury contain no SNP associations, although a number of these AOPs are still under development. The AOPs that feature SNP associations often

Discussion

The four SNPs implicated by HIBACHI are members of two AOPs: Cholestatic Liver Injury induced by Inhibition of the Bile Salt Export Pump (ABCB11), and Sustained AhR Activation leading to Rodent Liver Tumors. Since these two AOPs directly implicate key roles played by the Abcb11 and Ahr genes, these genes can be thought of as the central mediators of genetic risk for toxicity-induced LC. However, although these genes may be the most important in terms of disease etiology, the HIBACHI-identified SNPs may

Conclusions

In this study, we show that genetic programming and graph data science can be leveraged to uncover patterns of genetic regulation in adverse outcome pathways using real-world observational data. Our approach provides one of the first concrete examples of using HIBACHI – an open-source software tool originally designed to create synthetic datasets with interactions between features – on a task that increases our understanding of biological phenomena. We describe a novel association between

CRediT authorship contribution statement

Joseph D. Romano: Visualization. Liang Mei: Visualization. Jonathan Senn: Visualization. Jason H. Moore: Software. Holly M. Mortensen: Conceptualized the study methods.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work is supported by the Environmental Protection Agency's National Research Program in Chemical Safety and Sustainability, Adverse Outcome Pathway Discovery and Development (FY22 CSS AOPDD 4.3.2.2). This research has been conducted using data from UK Biobank, a major biomedical database. The work was additionally funded using grant support from the US National Institutes of Health: K99-LM013646 (PI: Romano), R01-AG066833, R01-LM010098, R01-LM013463 (PI: Moore), and P30-ES013508 (PI:

EPA Disclaimer

This manuscript has been reviewed by the Center for Public Health and Environmental Assessment, United States Environmental Protection Agency and approved for publication. Approval does not signify that the contents necessarily reflect the views and policies of the Agency nor does mention of trade names or commercial products constitute endorsement or recommendation for use. The authors declare no conflict of interest.

References (41)

  • S. Fortunato. Community detection in graphs. Physics Reports (2010)
  • L.J. Palmer. UK Biobank: Bank on it. Lancet (London, England) (2007)
  • M.P. van den Heuvel et al. Network hubs in the human brain. Trends in Cognitive Sciences (2013)
  • T. Aittokallio. Graph-based methods for analysing networks in cell biology. Briefings in Bioinformatics (2006)
  • G.T. Ankley et al. Adverse outcome pathways: A conceptual framework to support ecotoxicology research and risk assessment. Environmental Toxicology and Chemistry (2010)
  • A.V. Anzalone et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature (2019)
  • A. Auton et al. A global reference for human genetic variation. Nature (2015)
  • U. Benedetto et al. Statistical primer: Propensity score matching and its alternatives. European Journal of Cardio-Thoracic Surgery: Official Journal of the European Association for Cardio-Thoracic Surgery (2018)
  • H.-G. Beyer et al. Evolution strategies—A comprehensive introduction. Natural Computing (2002)
  • B. Bollobás. Modern Graph Theory (1998)
  • D.G. Clayton et al. Population structure, differential bias and genomic control in a large-scale, case-control association study. Nature Genetics (2005)
  • P. François et al. A case study of evolutionary computation of biochemical adaptation. Physical Biology (2008)
  • A.S. Fraser. Monte Carlo analyses of genetic models. Nature (1958)
  • G. Gallo et al. Shortest path algorithms. Annals of Operations Research (1988)
  • GTEx Consortium et al. Using an atlas of gene regulation across 44 human tissues to inform complex disease- and trait-associated variation. Nature Genetics (2018)
  • O. Hankinson. The aryl hydrocarbon receptor complex. Annual Review of Pharmacology and Toxicology (1995)
  • W. Huber et al. Graphs in molecular biology. BMC Bioinformatics (2007)
  • A.G.J. MacFarlane et al. Tools for intelligent control: Fuzzy controllers, neural networks and genetic algorithms. Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences (2003)
  • R.J. Kavlock et al. Computational Toxicology—A State of the Science Mini Review. Toxicological Sciences (2008)
  • La Cava, William, Orzechowski, Patryk, Burlacu, Bogdan, de Franca, Fabricio Olivetti, Virgolin, Marco, Jin, Ying,...