Development of DNA Sequence Data-Quality Metrics for Personal Genomics

Award Information
Agency: Department of Health and Human Services
Branch: National Institutes of Health
Contract: 1R43HG006976-01
Agency Tracking Number: R43HG006976
Amount: $197,398.00
Phase: Phase I
Program: SBIR
Solicitation Topic Code: NHGRI
Solicitation Number: PA11-096
Solicitation Year: 2012
Award Year: 2012
Award Start Date (Proposal Award Date): N/A
Award End Date (Contract End Date): N/A
Small Business Information
1390 Shorebird Way
United States
DUNS: 780119710
HUBZone Owned: No
Woman Owned: Yes
Socially and Economically Disadvantaged: No
Principal Investigator
Phone: (650) 963-8927
Business Contact
Phone: (650) 963-8927
Research Institution

DESCRIPTION (provided by applicant): In June 2011, the FDA hosted a public meeting: Ultra High Throughput Sequencing for Clinical Diagnostic Applications - Approaches to Assess Analytical Validity (FDA Public Meeting, 2011). The background documentation for this meeting noted that "In order to effectively utilize new sequencing technologies for clinical applications, appropriate evaluation tools (e.g., standards, well established criteria) are needed to determine the accuracy of the results." Achieving excellent data quality from next-generation sequencing technologies, and understanding when the results may be in error, is of clear importance whether the results are being viewed by a clinician or a consumer. For this application, 23andMe will focus on analyzing the accuracy of next-generation sequencing technologies using approximately 150 exomes (including 50 new exomes sequenced for this project) and 100 whole genomes, specifically with reference to false positive and false negative rates for variants located in known disease genes. 23andMe has genotyped over 125,000 individuals and reported data back to them on hundreds of disease-associated variants. This experience has shown us that many important disease genes are difficult to assay with a genotyping chip, whether due to pseudogenes (e.g., GBA), paralogs (e.g., SMN1, CYP2D6), or for unknown reasons (e.g., APOE). We have also noted differences in genotyping accuracy between blood and saliva. For these reasons, we expend significant resources validating the results of our genotyping chip using positive controls derived from the 23andMe customer database. The 50 exomes we will sequence for this project will be chosen to carry Sanger sequencing-validated disease-associated variants in the disease genes listed above.
This project is a crucial first step in our goal of creating a pipeline for next-generation sequence annotation that combines (a) stringent QC based on genotyping array and Sanger sequencing data; (b) manually curated data from the human genetics literature; and (c) computationally derived variant assessment for variants of unknown significance, in order to produce a report that will be returned to a consumer for a personalized health assessment.
PUBLIC HEALTH RELEVANCE: Before we can achieve broad adoption of novel sequencing technologies in the clinic, we must understand when their results are accurate. This project will investigate error rates from next-generation sequencing technologies in clinically relevant disease genes. This will help us define data quality metrics and technical specifications for a sequencing-based Personal Genome Service®.

* Information listed above is at the time of submission. *
