Scalable Framework for Integrating Multi-Omics Data for Biosystem Analysis

Award Information
Agency: Department of Energy
Branch: N/A
Contract: DE-SC0019686
Agency Tracking Number: 242504
Amount: $224,920.00
Phase: Phase I
Program: STTR
Solicitation Topic Code: 01a
Solicitation Number: DE-FOA-0001940
Timeline
Solicitation Year: 2019
Award Year: 2019
Award Start Date (Proposal Award Date): 2019-02-19
Award End Date (Contract End Date): 2020-02-18
Small Business Information
12655 Beaverdam Road, Beaverton, OR, 97005-2129
DUNS: 080456284
HUBZone Owned: N
Woman Owned: N
Socially and Economically Disadvantaged: N
Principal Investigator
 Michael Wrinn
Phone: (503) 475-6660
Email: mike@omicsautomation.com
Business Contact
 Kim Basney
Phone: (503) 260-0334
Email: basneyk@omicsautomation.com
Research Institution
 Oregon State University
3082-Cordley Hall
Corvallis, OR, 97331-8655
 Nonprofit college or university
Abstract
Understanding the genomic basis of growth time, crop yield, drought response, and disease resistance in economically important plants is critical to sustaining and improving food supplies for humans and livestock, as well as to ensuring sufficient raw material for industries that depend on plant materials, such as biofuel manufacture. Current computational methods are suitable at the scale of individual experiments, but analyses of larger-scale experiments, and meta-analyses that examine data across many transcriptomic experiments, are difficult to perform with existing tools. The anticipated explosion in plant-based sequencing, driven by decreasing sequencing costs and fewer barriers to plant experimentation, makes it urgent to build a general and efficient computational platform to hold and process large datasets. To address this challenge of large and heterogeneous data, we will extend our distributed analytics platform for the aggregation, representation, and analysis of multiple modalities of plant and microbiome genomic data. Already proven for medical genomic data, our approach is expected to provide this community with both more efficient storage and faster processing across a range of cloud computing environments. For Phase I, we will demonstrate a distributed framework for gene-level summarization of raw read counts, followed by exon-level summarization with feature overlaps for finer-grained differential analysis, and finally read-level storage of transcriptomics data, permitting analyses such as de novo transcript assembly. In parallel, we will add interfaces, as a proof of concept, to a select group of publicly available annotation sources and tools. Commercial applications will center on a commercial-quality software framework for food and bioenergy companies investigating potentially novel genomic or splicing events. Because storage and computation within the framework are scalable, such investigations would be able to analyze large numbers of samples.
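The first two Phase I steps (gene-level summarization of raw read counts, then exon-level summarization with feature overlaps) can be illustrated with a minimal single-machine sketch. This is not the awardee's platform or its API. The names Feature, Read, overlaps, and summarize are hypothetical, the coordinates are made-up toy data, and intervals are assumed to be half-open [start, end). A read that overlaps several exons is counted once per exon but only once per gene.

```python
from collections import Counter, namedtuple

# Hypothetical record types for illustration; intervals are half-open [start, end).
Feature = namedtuple("Feature", "gene exon chrom start end")
Read = namedtuple("Read", "chrom start end")


def overlaps(read, feat):
    """Return True if the read and the feature share at least one base."""
    return (read.chrom == feat.chrom
            and read.start < feat.end
            and feat.start < read.end)


def summarize(reads, features):
    """Count reads per gene and per (gene, exon) feature.

    A read overlapping several exons increments each exon's count,
    but increments each overlapped gene only once.
    """
    gene_counts = Counter()
    exon_counts = Counter()
    for read in reads:
        hits = [f for f in features if overlaps(read, f)]
        for f in hits:
            exon_counts[(f.gene, f.exon)] += 1
        for gene in {f.gene for f in hits}:
            gene_counts[gene] += 1
    return gene_counts, exon_counts


# Toy annotation: geneA exon 2 and geneB exon 1 overlap on chr1.
FEATURES = [
    Feature("geneA", 1, "chr1", 100, 200),
    Feature("geneA", 2, "chr1", 300, 400),
    Feature("geneB", 1, "chr1", 350, 500),
]

# Toy reads (made-up coordinates).
READS = [
    Read("chr1", 150, 180),  # inside geneA exon 1
    Read("chr1", 190, 310),  # spans geneA exon 1 and exon 2
    Read("chr1", 360, 390),  # inside geneA exon 2 and geneB exon 1
    Read("chr2", 100, 200),  # no annotated feature
]

GENE_COUNTS, EXON_COUNTS = summarize(READS, FEATURES)

if __name__ == "__main__":
    print(dict(GENE_COUNTS))
    print(dict(EXON_COUNTS))
```

The linear scan over features for every read is only for readability. A distributed framework of the kind described would presumably partition reads and features by genomic region and use an interval index, but the abstract does not specify those details.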

* Information listed above is at the time of submission. *
