BioHDF - Open Binary File Standards for Bioinformatics

Award Information
Agency:
Department of Health and Human Services
Amount:
$142,775.00
Program:
STTR
Contract:
1R41HG003792-01
Solicitation Year:
N/A
Solicitation Number:
N/A
Branch:
N/A
Award Year:
2005
Phase:
Phase I
Agency Tracking Number:
HG003792
Solicitation Topic Code:
N/A
Small Business Information
Geospiza, Inc.
Geospiza, Inc., Box 344, 2442 NW Market St, Seattle, WA, 98107
Hubzone Owned:
N
Woman Owned:
N
Socially and Economically Disadvantaged:
N
Duns:
N/A
Principal Investigator
 TODD SMITH
 (206) 633-4403
 TODD@GEOSPIZA.COM
Business Contact
 TODD SMITH
Phone: (206) 633-4403
Email: TODD@GEOSPIZA.COM
Research Institution
 UNIVERSITY OF ILLINOIS
OFFICE OF SPONSORED PROGRAMS & RESEARCH ADMIN
CHAMPAIGN, IL, 61820
 Nonprofit college or university
Abstract
DESCRIPTION (provided by applicant): Geospiza Inc. and the National Center for Supercomputing Applications (NCSA) are creating a standards-based software framework around NCSA's Hierarchical Data Format (HDF5). The envisioned framework will integrate algorithms important in DNA and protein sequence analysis to create scalable, high-throughput software systems. These systems will be accessed through new graphical user interfaces (GUIs) that give researchers new views of their data, helping them finish sequencing projects in large-scale genome sequencing, microbial genome sequencing, viral epidemiology, polymorphism detection, phylogenetic analysis, multi-locus sequence typing, confirmatory sequencing, and EST analysis. In our vision, algorithms will either be integrated into the system to read and write HDF5 project files directly, or they will communicate with project files via filter programs that produce standardized XML-formatted data. Through this model, a scalable solution will support different applications of DNA sequencing, fulfilling the many needs and requirements expressed by the medical research community now and into the future. As the first step in this process, we will define requirements for editing and versioning data in DNA sequencing; research and propose data models for the computational phases of DNA sequencing and annotating DNA sequence data using existing standards; create a prototype application for DNA sequencing-based SNP discovery; and engage the bioinformatics community for BioHDF adoption. In the past ten years, the cost of sequencing DNA has dropped more than 1,000-fold, and the amount of raw sequence data entering our national repositories is doubling every 12 months. DNA sequencing is fundamental to biological research activities such as genomics, systems biology, and clinical medicine.
Proposals are being sought to decrease sequencing costs by two orders of magnitude through technology refinements, with the ultimate vision of developing technology to sequence human genome equivalents for $1,000 each. The amount of data that will be produced through these endeavors is unimaginable. However, the $1,000 genome will not advance medical research unless we integrate all phases of the DNA sequencing process and treat the creation, management, finishing, analysis, and sharing of the data as common goals.

* Information listed above is as of the time of submission.
