BioHDF - Open Binary File Standards for Bioinformatics

Award Information
Agency: Department of Health and Human Services
Branch: N/A
Contract: 1R41HG003792-01
Agency Tracking Number: HG003792
Amount: $142,775.00
Phase: Phase I
Program: STTR
Awards Year: 2005
Solicitation Year: N/A
Solicitation Topic Code: N/A
Solicitation Number: N/A
Small Business Information
Geospiza, Inc., Box 344, 2442 Nw Market St, Seattle, WA, 98107
HUBZone Owned: N
Woman Owned: N
Socially and Economically Disadvantaged: N
Principal Investigator
 (206) 633-4403
Business Contact
Phone: (206) 633-4403
Research Institution
 Nonprofit college or university
DESCRIPTION (provided by applicant): Geospiza Inc. and the National Center for Supercomputing Applications (NCSA) are creating a standards based software framework around NCSA's Heirarchical Data Format (HDF5). The envisioned framework will integrate algorithms important in DNA and protein sequence analysis to create scalable high throughput software systems which will be accessed using new graphical user interfaces (GUIs) to provide researchers with new views of their data to finish sequencing projects in large-scale genome sequencing, microbial genome sequencing, viral epidemiology, polymorphism detection, phylogenetic analysis, multi-locus sequence typing, confirmatory sequencing, and EST analysis. In our vision, algorithms will be either integrated into the system to directly read and write from HDF5 project files, or they will communicate with project files via filter programs that produce standardized XML formatted data. Through this model, a scalable solution will support different applications of DNA sequencing, fulfilling the many needs and requirements expressed by the medical research community now and into the future. As the first step in this process we will, define requirements for editing and versioning data in DNA sequencing, research and propose data models for the computational phases of DNA sequencing and annotating DNA sequence data using existing standards, create a prototype application for DNA sequencing based SNP discovery, and engage the bioinformatics community for BioHDF adoption. In the past ten years the cost of sequencing DNA has dropped over 1000 fold and the amount of raw sequence data, entering our national repositories is doubling every 12 months. DNA sequencing is fundamental to biological research activities such as genomics, systems biology, and clinical medicine. Proposals are being sought to decrease sequencing costs by two orders of magnitude through technology refinements with an ultimate vision of developing technology to sequence human genome equivalents for $1000 each. The amount of data that will be produced through these endeavors is unimaginable. However, the $1,000 genome will not advance medical research unless we integrate all phases of the DNA sequencing process and treat the creation, management, finishing, analysis, and sharing of the data as common goals.

* Information listed above is at the time of submission. *

Agency Micro-sites

SBA logo
Department of Agriculture logo
Department of Commerce logo
Department of Defense logo
Department of Education logo
Department of Energy logo
Department of Health and Human Services logo
Department of Homeland Security logo
Department of Transportation logo
Environmental Protection Agency logo
National Aeronautics and Space Administration logo
National Science Foundation logo
US Flag An Official Website of the United States Government