Scalable Framework for Integrating Multi-Omics Data for Biosystem Analysis
Phone: (503) 475-6660
Phone: (503) 260-0334
Type: Nonprofit College or University
Understanding the genomic basis of growth time, crop yield, drought response, and disease resistance in economically important plants is critical to sustaining and improving food supplies for humans and livestock, as well as ensuring sufficient raw material for industries that depend on plant materials, such as biofuel manufacture. Current computational methods are suitable at the scale of individual experiments, but larger-scale experiments, and meta-analyses that examine data across many transcriptomic experiments, are difficult to analyze with existing tools. The anticipated explosion in plant-based sequencing, driven by decreasing sequencing costs and fewer barriers to plant experimentation, makes it urgent to build a general and efficient computational platform to hold and process large datasets. To address this challenge of large and heterogeneous data, we will extend our distributed analytics platform for the aggregation, representation, and analysis of multiple modalities of plant and microbiome genomic data. Already proven for medical genomic data, our approach is expected to provide this community with both more efficient storage and faster processing across a range of cloud computing environments. For Phase 1, we will demonstrate a distributed framework for gene-level summarization of raw read counts, followed by exon-level summarization with feature overlaps for finer-grained differential analysis, and finally read-level storage of transcriptomics data, permitting analyses such as de novo transcript assembly. In parallel, we will add proof-of-concept interfaces to a select group of publicly available annotation sources and tools. Commercial applications will center on a commercial-quality software framework for food and bioenergy companies investigating potentially novel genomic or splicing events. Because storage and computation within the framework are scalable, such investigations will be able to analyze large numbers of samples.
* Information listed above is at the time of submission. *