
Bioinformatics: Data Integration for Biomonitoring Applications



TECHNOLOGY AREA(S): Information Systems

OBJECTIVE: The present topic seeks computational approaches that “mine” publicly available microbiome data to identify changes in natural soil-borne communities which can be uniquely and predictably associated with environmental presence of ionizing radiation, radioisotopes including those in the actinide series, heavy metals, and/or process chemicals associated with nuclear activities.

DESCRIPTION: Unilateral monitoring for nuclear activities in the post-Cold War era demands new strategies in light of what represents an unconventional threat. Current technologies are ill-suited for non-permissive environments where long periods of observation are required and telling events may be ephemeral. Further, they are reliant upon key signatures which can be lost due to meteorological events and geochemical cycling. Of greater utility for the present purpose are monitoring approaches which provide near- to mid-field access to sites of potential interest and whose informational content is retained even when the original signatures are no longer present. Biological systems could fulfill such a role.

Biological systems are strongly reactive to the presence of pollutants in the environment and exhibit characteristic changes when exposed to specific classes of pollutant. Biological sentinels are thus used routinely to track the health of “at risk” ecosystems, and associated information such as genomic and proteomic data is often archived in public repositories. Computational approaches can be used to analyze patterns (or lack thereof) in the wealth of available data in order to establish a valid starting point for evaluating impacts of contamination at a given site [1]. The increasingly sophisticated algorithms developed to support bioinformatics can be used to interrogate indigenous organisms from sites of interest and determine whether there are distinctive changes which may be definitively and predictably linked to contamination, whether or not the contaminant is still present [2].

Of particular utility are flora that inhabit routinely sampled matrices. Soil, sediment, and water host a variety of microscopic life forms (“microbiomes”) for which genomic, biochemical, and trait-based data are already accessible [3]. Microbiomes are composed of thousands of microbial species intricately linked to the health and functioning of the systems in which they reside, and community composition is a consequence of the dynamic interplay between the resident species and the local environment [4]. Environmental changes can induce selective pressures which result in notable shifts in species composition and density as well as in expression of characteristic traits (e.g., particular protein isoforms), even where the specific taxa vary from site to site [5]. End-state community structure can be somewhat predictable given the nature of the exogenous stressor, as demonstrated by interrogation of uranium mine tailings [6], industrial areas [7], and other contaminated environmental matrices [8, 9, 10]. Certain genera and, in some cases, species are characteristically present in predictable relative proportions, or communities exhibit functional similarities. Further proof-of-concept is available in the biomedical realm, where health conditions such as liver disease are associated with the presence of particular gut microbiome constituencies [11].
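As a minimal sketch of how a shift in community composition might be summarized numerically, the Python below converts per-taxon read counts to relative proportions and computes the Shannon diversity index. The taxon labels and counts are invented placeholders, not real data, and the function names are illustrative assumptions rather than part of any existing toolkit.

```python
import math

def relative_abundance(counts):
    """Convert raw per-taxon read counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}

def shannon_index(counts):
    """Shannon diversity H' = -sum(p * ln p) over taxa with nonzero counts."""
    props = relative_abundance(counts)
    return -sum(p * math.log(p) for p in props.values() if p > 0)

# Invented toy counts (assumption, not real data) with placeholder taxon labels.
reference_soil = {"taxon_A": 25, "taxon_B": 25, "taxon_C": 25, "taxon_D": 25}
stressed_soil = {"taxon_A": 85, "taxon_B": 5, "taxon_C": 5, "taxon_D": 5}

for name, sample in [("reference", reference_soil), ("stressed", stressed_soil)]:
    print(name, round(shannon_index(sample), 3))
```

In this toy case the community dominated by a single taxon yields a lower index than the evenly distributed one. A real analysis would also need to account for sequencing depth and the other sources of variation the topic describes.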

The present topic seeks development of robust computational tools to explore the phylogenetic and functional characteristics of microbial communities in natural soils contaminated by ionizing radiation, radioisotopes including those in the actinide series, heavy metals, and/or process chemicals from nuclear activities. The overarching goal is to demonstrate that soil microbiomes tend to converge upon a particular community constituency and/or functional state given the chronic or episodic presence of contamination and that the state is predictable. Ideally, algorithms developed to address the need described herein would be applicable to the evaluation of other microbiomes to similarly elucidate predictability of resident communities given a certain condition and thus could be used in biomedical, forensic, and other applications. The research is intended to produce a coarse-grained analytical method that guides more refined site assessments.
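One hedged way to probe whether communities converge under contamination is to compare pairwise dissimilarity within a contaminated group against dissimilarity between contaminated and reference samples. The sketch below uses the Bray-Curtis dissimilarity for this. The choice of metric, the helper names, and the toy profiles are all assumptions made for illustration; the topic does not prescribe any of them.

```python
from itertools import combinations

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles (dicts)."""
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    den = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return num / den if den else 0.0

def mean_within(samples):
    """Mean dissimilarity over all unordered pairs inside one group."""
    pairs = list(combinations(samples, 2))
    return sum(bray_curtis(x, y) for x, y in pairs) / len(pairs)

def mean_between(group1, group2):
    """Mean dissimilarity over all cross-group pairs."""
    dists = [bray_curtis(x, y) for x in group1 for y in group2]
    return sum(dists) / len(dists)

# Invented toy profiles (assumption, not real data).
contaminated = [
    {"taxon_A": 80, "taxon_B": 15, "taxon_C": 5},
    {"taxon_A": 75, "taxon_B": 20, "taxon_C": 5},
    {"taxon_A": 85, "taxon_B": 10, "taxon_C": 5},
]
reference = [
    {"taxon_A": 20, "taxon_B": 30, "taxon_C": 50},
    {"taxon_A": 40, "taxon_B": 40, "taxon_C": 20},
    {"taxon_A": 10, "taxon_B": 60, "taxon_C": 30},
]

within = mean_within(contaminated)
between = mean_between(contaminated, reference)
print(f"within contaminated: {within:.3f}, contaminated vs reference: {between:.3f}")
```

A within-group value well below the between-group value would be consistent with convergence, though a formal claim would require an appropriate statistical test and adequate controls.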

PHASE I: Proposed efforts should be purely computational and should make use of existing datasets available in archives such as QIIME, MG-RAST, NCBI, and EBI. Proof-of-concept will be provided by demonstrating that the bioinformatics approach(es) developed for the application described herein can be applied to a small, well-defined dataset where the environmental parameter space can be accurately circumscribed. To support proof-of-concept, use of “model system” contaminated sites (e.g., Chernobyl) is acceptable, although the Phase II end-state goal is to support analysis of soil microbiomes where exposures may be low-level chronic or episodic in nature. Sources of variation, including those associated with environmental variability, sample collection and archival, technical protocols, and analytical methods, should be taken into account. Likewise, sample sizes and controls should be adequate and appropriate to support meaningful statistical analysis and lay the foundation for future efforts conducted in the same vein. Competitive proposals will include subject matter experts who fully understand the implications and limitations of including particular data in the model and will incorporate sensitivity analysis and risk mitigation plans. Proposals should explain methods that will be used or developed to quantify uncertainties. Applicants should delineate assumptions, including those associated with hypothetical cause-and-effect relationships between proposed community indicators (whether taxonomic or functional) and presence of soil-borne contamination. Likewise, ample rationale should be provided for selection of data types. Although deriving mechanistic understanding is not the intent of this topic, building predictive capacity will require reasonably educated conjecture regarding anticipated presence of particular taxa or functional groups. Phase I deliverables include (1) a final report and (2) the formatted dataset used to test developed algorithms. The report should supply the information requested above, describe model development including parameterization, and provide preliminary results on model fidelity. The report should also include plans for development of a user interface which will address Phase II expectations. Operating system, software (where applicable), and data compatibility should be specifically addressed, as should proposed location of the interface.
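As one illustration of how uncertainty in a community indicator might be quantified, the sketch below computes a percentile bootstrap interval for the difference in mean indicator value between contaminated and reference samples. The bootstrap is only one of several possible methods and is an assumption here, as are the function name, parameters, and invented indicator values.

```python
import random

def bootstrap_ci(contaminated, reference, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(contaminated) - mean(reference)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping group sizes fixed.
        c = [rng.choice(contaminated) for _ in contaminated]
        r = [rng.choice(reference) for _ in reference]
        diffs.append(sum(c) / len(c) - sum(r) / len(r))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Invented indicator values (assumption, not real data), e.g. the relative
# abundance of a hypothetical indicator taxon in each sample.
contaminated = [0.62, 0.71, 0.58, 0.66, 0.75, 0.69]
reference = [0.21, 0.35, 0.28, 0.30, 0.25, 0.33]

lo, hi = bootstrap_ci(contaminated, reference)
print(f"95% bootstrap interval for the mean difference: [{lo:.3f}, {hi:.3f}]")
```

An interval that excludes zero suggests the groups differ on this indicator in the toy data. With sample sizes this small the interval is rough, which is one reason the topic asks for adequate sample sizes and controls.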

PHASE II: Phase II efforts will focus on iterative improvement of the approach developed during Phase I. Efforts will be expanded to include additional datasets and to evaluate the predictive power of the model in terms of establishing that community constituency (whether taxonomic or functional) is commonly, and preferably uniquely, associated with the presence of particular contaminants. Validation datasets will be included in order to assess model fidelity and performance in terms of retroactively identifying contaminated sites. Feasibility of extending the method to other microbiome types and stressors to support additional applications (e.g., biomedical applications) should be evaluated. Phase II deliverables are (a) a report detailing (1) the approach, including optimization techniques and outcomes, (2) testing and validation data, (3) advantages and disadvantages/limitations of the method, and (4) potential for application to other problem sets; (b) the source code; and (c) a user interface and any associated executables.
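The validation step could be sketched, under strong simplifying assumptions, as leave-one-out testing of a nearest-centroid classifier that labels each held-out profile as contaminated or reference. The classifier choice, the Euclidean distance, the helper names, and the toy profiles below are all hypothetical and stand in for whatever model a proposer actually develops.

```python
def centroid(profiles):
    """Per-taxon mean of a list of relative-abundance profiles (dicts)."""
    taxa = {t for p in profiles for t in p}
    return {t: sum(p.get(t, 0.0) for p in profiles) / len(profiles) for t in taxa}

def distance(a, b):
    """Euclidean distance between two relative-abundance profiles."""
    taxa = set(a) | set(b)
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in taxa) ** 0.5

def predict(profile, training):
    """Return the label whose training centroid is closest to the profile."""
    by_label = {}
    for p, label in training:
        by_label.setdefault(label, []).append(p)
    return min(by_label, key=lambda lab: distance(profile, centroid(by_label[lab])))

def leave_one_out(samples):
    """Hold out each sample in turn; return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for i, (profile, truth) in enumerate(samples):
        guess = predict(profile, samples[:i] + samples[i + 1:])
        if truth == "contaminated":
            tp += guess == truth
            fn += guess != truth
        else:
            tn += guess == truth
            fp += guess != truth
    return tp / (tp + fn), tn / (tn + fp)

# Invented, well-separated toy profiles (assumption, not real data).
samples = [
    ({"taxon_A": 0.80, "taxon_B": 0.15, "taxon_C": 0.05}, "contaminated"),
    ({"taxon_A": 0.75, "taxon_B": 0.20, "taxon_C": 0.05}, "contaminated"),
    ({"taxon_A": 0.85, "taxon_B": 0.10, "taxon_C": 0.05}, "contaminated"),
    ({"taxon_A": 0.20, "taxon_B": 0.30, "taxon_C": 0.50}, "reference"),
    ({"taxon_A": 0.40, "taxon_B": 0.40, "taxon_C": 0.20}, "reference"),
    ({"taxon_A": 0.10, "taxon_B": 0.60, "taxon_C": 0.30}, "reference"),
]

sensitivity, specificity = leave_one_out(samples)
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

Reporting sensitivity and specificity separately keeps missed contaminated sites distinct from false alarms. The toy profiles are deliberately well separated, so the scores here say nothing about performance on real data, and a truly independent validation dataset would be a stronger test than leave-one-out.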

PHASE III DUAL USE APPLICATIONS: Identify and exploit features that would be attractive for commercial or other private sector applications such as conducting “forensics” analysis to support development of diagnostics and therapeutics for illnesses whose interrelation with the human microbiome has been established. Examples include high-impact diseases such as cardiovascular disease, colorectal cancer, Alzheimer’s, ulcerative colitis, and periodontal disease.


  • Pylro VS et al. 2014. Brazilian microbiome project: revealing the unexplored microbial diversity—challenges and prospects. Microb Ecol 67:237-241.
  • Gilbert JA et al. 2010. Meeting report: the terabase metagenomics workshop and the vision of an Earth microbiome project. Standards in Genomic Sciences 3:243-248.
  • Xu Z et al. 2014. Bioinformatic approaches reveal metagenomics characterization of soil microbial community. PLOS ONE 9:1-11.
  • Goodrich JK et al. 2014. Conducting a microbiome study. Cell 158:250-262.
  • Martiny JBH et al. 2015. Microbiomes in light of traits: a phylogenetic perspective. Science 350:aa93231-aa93238.
  • Choudhary S, Pinaki S. 2010. Identification and characterization of uranium accumulation potential of a uranium mine isolated Pseudomonas strain. World J Microbiol Biotechnol 27:1795-1801.
  • Hookom M, Puchooa D. 2013. Isolation and identification of heavy metals tolerant bacteria from industrial and agricultural areas in Mauritius. Curr Res Microbiol Biotech 3:119-123.
  • Abulencia CB et al. 2006. Environmental whole-genome amplification to access microbial populations in contaminated sediments. Appl Environ Microbiol 72:3291-3301.
  • Sobolev D, Begonia MFT. 2008. Effects of heavy metal contamination upon soil microbes: lead-induced changes in general and denitrifying microbial communities as evidenced by molecular markers. Int J Environ Res Public Health 5:450-456.
  • Belozerkaya T et al. 2010. Characteristics of extremophylic fungi from Chernobyl Nuclear Power Plant. Current Research, Technology and Education Topics in Applied Microbiology and Microbial Biotechnology, Mendez-Vilas A. (ed.), 88-94, Vol. 1, Formatex Research Center: Badajoz, Spain.
  • Kuczynski J et al. 2012. Experimental and analytical tools for studying the human microbiome. Nature Rev Genet 13:47-58.

KEYWORDS: Bioinformatics, biomonitoring, microorganisms, soil microbiome, biological sentinel

