BioHDF - Open Binary File Standards for Bioinformatics

Award Information

Agency:
Department of Health and Human Services
Branch:
N/A
Award ID:
75670
Program Year/Program:
2009 / STTR
Agency Tracking Number:
HG003792
Solicitation Year:
N/A
Solicitation Topic Code:
N/A
Solicitation Number:
N/A
Small Business Information
GEOSPIZA, INC.
BOX 344, 2442 NW MARKET ST, SEATTLE, WA 98107
Woman-Owned: No
Minority-Owned: No
HUBZone-Owned: No
 
Phase 2
Fiscal Year: 2009
Title: BioHDF - Open Binary File Standards for Bioinformatics
Agency: HHS
Contract: 2R42HG003792-02A1
Award Amount: $1,165,367.00
 

Abstract:

DESCRIPTION (provided by applicant): The first wave of Next Generation (Next Gen) sequencing technologies combines molecular resolution with extremely high throughput to dramatically reduce sequencing costs and increase assay sensitivity and specificity. These technologies will provide large numbers of laboratories with Genome Center levels of throughput to make discoveries and develop new assays never before imagined. However, widespread adoption of Next Gen will be hindered because current bioinformatics programs do not scale; they are inefficient in data storage, processing, and memory utilization. The most popular programs typically copy and recopy data to new files many times during processing, require that all data be maintained in random access memory (RAM) when running, and cannot incrementally process data. To overcome these issues, fundamental changes in data management and processing are needed. Geospiza and The HDF Group are collaborating to develop portable, scalable bioinformatics technologies based on HDF5 (Hierarchical Data Format, http://www.hdfgroup.org). We call these extensible domain-specific data technologies BioHDF. BioHDF will implement a data model that supports primary DNA sequence information (reads, quality values, and metadata) and results from sequence assembly and variation detection algorithms. BioHDF will extend HDF5 data structures and library routines with new features (indexes, additional compression, and graph layouts) to support the high-performance data storage and computation requirements of Next Gen sequencing. BioHDF will include APIs, software tools, and a viewer based on HDFView to enable its use in the bioinformatics and research communities.
Using BioHDF, researchers will be able to perform whole genome shotgun sequencing (WGS), tag and count experiments (EST analysis, promoter mapping, DNA methylation, functional mapping), and variation analysis; they will also be able to export datasets in formats accepted by the key databases to publish their work. As a programming environment, BioHDF can be easily extended to accept data from new data collection platforms and to format data for interchange with many databases. Core BioHDF tools will be delivered to the research community as an open source technology. Geospiza will use BioHDF in its Finch line of products to deliver software systems and applications to support clinical research, diagnostics, and other relevant activities that rely on genetic data.
PUBLIC HEALTH RELEVANCE: The overall goal of the BioHDF Phase II project is to make it possible for medical research and clinical communities to take full advantage of the latest DNA sequencing platforms in their efforts to improve public health. Geospiza and The HDF Group will build on their expertise in Laboratory Information Management Systems and high-volume, high-complexity scientific data management systems to create and deliver bioinformatics software systems that can handle the massive amounts of data produced by the latest sequencing instruments. The integrated systems will keep track of collected samples, sequence data, DNA tests, and other laboratory records and biological data associated with the entire sequencing and analysis process, and make it easy for clinicians to use the technology to do their work.

Principal Investigator:

Todd M. Smith
2066334403
TODD@GEOSPIZA.COM

Business Contact:

Todd M. Smith
sandy@geospiza.com
Small Business Information at Submission:

GEOSPIZA, INC.
BOX 344, 2442 NW MARKET ST, SEATTLE, WA 98107

EIN/Tax ID: 911894564
DUNS: N/A
Number of Employees: N/A
Woman-Owned: No
Minority-Owned: No
HUBZone-Owned: No
Research Institution Information:
HDF GROUP
1901 S 1ST ST, STE C-2
CHAMPAIGN, IL 61820-7406
RI Type: Domestic nonprofit research organization