Commercial Software Using High-Throughput Computational Techniques to Improve Genome Analysis

Award Information
Agency: Department of Health and Human Services
Branch: National Institutes of Health
Contract: 4R44HG009474-02
Agency Tracking Number: R44HG009474
Amount: $972,083.00
Phase: Phase I
Program: SBIR
Awards Year: 2018
Solicitation Year: 2015
Solicitation Topic Code: 172
Solicitation Number: PA15-269
Small Business Information
3151 VILLAGE CIR S, Ann Arbor, MI, 48108-2243
DUNS: 080055927
HUBZone Owned: N
Woman Owned: N
Socially and Economically Disadvantaged: N
Principal Investigator
Name: MARK KIEL
Phone: (734) 223-2519
Email: kiel@genomenon.com
Business Contact
Name: MARK KIEL
Phone: (734) 223-2519
Email: kiel@genomenon.com
Research Institution
N/A
Abstract
Recent advances in DNA sequencing technology have not been matched by improved analytic techniques to quickly and accurately interpret patient genome data to inform diagnosis, prognosis, and therapeutic decision-making in the clinic and to identify candidate biomarkers of disease in research laboratories. Development of automated techniques to facilitate interpretation of this data will benefit patient care and improve public health by promoting widespread use of cost-efficient sequencing clinically and by making it feasible to sequence a broader range of patients, including those with complex disease, or to identify patients who have an elevated risk of developing future disease. Our long-term goal is to commoditize sequence interpretation using high-throughput computational techniques in the same way that next-generation DNA sequencing technology has commoditized genome data production. The present project will result in commercial software that automates genome sequence interpretation. Specifically, we will develop software that automatically collects and organizes a comprehensive set of genetic information by systematically reading millions of scientific articles and scanning dozens of genetic variant databases; software that uses this information to prioritize patient data into clinical categories based on the likelihood of disease; and software that automatically identifies candidate biomarkers of disease from multi-sample cohort data. To do this, we will use a variety of innovative data processing techniques. First, we will systematically mutate the reference genome in silico to produce a comprehensive database of every possible mutation at every position of every gene, and use this data to query every word of every article ever published or any publicly available database to identify disease-gene-variant associations. We will compare the results from this automated process to results obtained using more expensive and time-consuming manual methods, and hypothesize that we can achieve concordance and identify more variants and fold more references for each. These results will be organized into clinically meaningful categories and presented in an interactive graphical interface that displays the evidence for each of these associations. We will then use this information to drive prioritization of patient data based on similarities to known disease-causing variants and the strength of evidence for their pathogenicity, in order to increase analytic sensitivity and specificity, thereby improving the speed and reliability of sequencing in the clinic. Our automated results will then be compared to conventional methods of data annotation and filtration for > patient samples from diseases. Finally, we will use the same prioritization strategy to comprehensively compare variant data between all patients within a disease cohort to automatically identify the variants most likely to lead to disease, and compare our automated results to conventional methods for > samples from diseases. The growth in the $ B genome sequencing market is driven by improvements in informatics techniques, and automated solutions such as those proposed here have significant commercial potential. The successful completion of the proposed project will contribute to the public health mission of the NIH by promoting more widespread adoption of genome sequencing by making the interpretation of this data more accurate and cost-effective in clinical and research laboratories. The community of users that can benefit from this research includes geneticists, oncologists, pathologists, researchers, and patients.
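
The in silico mutation and literature query step described in the abstract can be illustrated with a toy sketch. This is not the project's software: the function names `enumerate_protein_changes` and `find_mentions`, the nine-base sequence, the gene name `GENE1`, and the sample text are all hypothetical. The sketch assumes a short coding sequence, considers only single-base substitutions (no insertions or deletions), and treats a gene name and a variant name appearing in the same sentence as a candidate association.

```python
import re
from itertools import product

BASES = "ACGT"

# Standard genetic code, listed in the conventional T, C, A, G codon order.
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), _AMINO)
}


def enumerate_protein_changes(cds):
    """Return every protein-level change (e.g. 'V2E') reachable by one
    single-base substitution in the coding sequence `cds`.

    Synonymous substitutions are skipped; '*' denotes a stop codon.
    """
    changes = set()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for start in range(0, usable, 3):
        ref_codon = cds[start:start + 3]
        ref_aa = CODON_TABLE[ref_codon]
        for offset in range(3):
            for alt_base in BASES:
                if alt_base == ref_codon[offset]:
                    continue
                alt_codon = (
                    ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
                )
                alt_aa = CODON_TABLE[alt_codon]
                if alt_aa != ref_aa:
                    changes.add(f"{ref_aa}{start // 3 + 1}{alt_aa}")
    return changes


def _token_pattern(token):
    # Match the token only when it is not embedded in a longer
    # alphanumeric string (so 'GENE1' does not match inside 'GENE10').
    return re.compile(
        r"(?<![A-Za-z0-9])" + re.escape(token) + r"(?![A-Za-z0-9])"
    )


def find_mentions(gene, variants, text):
    """Return the sorted variants that share a sentence with `gene`."""
    gene_re = _token_pattern(gene)
    found = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if not gene_re.search(sentence):
            continue
        for variant in variants:
            if _token_pattern(variant).search(sentence):
                found.add(variant)
    return sorted(found)


# Hypothetical toy input: a three-codon sequence encoding Met-Val-Glu.
toy_cds = "ATGGTGGAG"
toy_text = (
    "We report a GENE1 V2E substitution in two patients. "
    "An unrelated E3K change was seen in GENE2."
)
toy_variants = enumerate_protein_changes(toy_cds)
toy_hits = find_mentions("GENE1", toy_variants, toy_text)
```

Here `toy_hits` contains only the variant mentioned alongside `GENE1`. A production system would also need to handle alternative variant spellings, nucleotide-level nomenclature, and gene synonyms, none of which this sketch attempts.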

* Information listed above is at the time of submission. *
