
Tools for Combined Analysis of Optical Mapping and Sequencing Data

Description:

Fast-Track proposals will not be accepted.
Number of anticipated awards: 1
Budget (total costs):
Phase I: up to $150,000 for up to 12 months
PROPOSALS THAT EXCEED THE BUDGET OR PROJECT DURATION LISTED ABOVE MAY NOT BE FUNDED.
Background
Next-generation sequencing (NGS) is used to determine the genetic sequence of pathogens. For public health laboratory surveillance activities, a high-quality genome sequence is required to serve as a comparator or “reference sequence.” Generating the highest-quality reference genome sequence requires optical mapping (OM) to resolve sequence inversions and identify the ends of chromosomes. An optical map is like a restriction enzyme map for the entire genome. Currently, OM data and NGS data are assembled using separate software systems, and no tool exists that can fully integrate all types of NGS data and OM data for graphical display. The few tools that do exist are limited in their functionality and visualization capabilities. This is especially problematic when working with large genomes with tens of thousands of data points, which can take multiple days to analyze.
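The comparison to a restriction enzyme map can be illustrated with a minimal sketch of an in silico digest, which turns a sequence into the ordered list of fragment lengths that an optical map would be compared against. The function name `in_silico_digest` and the example sequence are hypothetical and not part of this solicitation. As a simplifying assumption, the cut is placed at the start of each recognition site, whereas real enzymes cut at a fixed offset within the site.

```python
def in_silico_digest(sequence, site):
    """Return ordered fragment lengths from cutting `sequence` at each `site`.

    Assumption for illustration: the cut falls at the first base of the
    recognition site. Real enzymes cut at an enzyme-specific offset.
    """
    sequence = sequence.upper()
    site = site.upper()

    # Collect the start position of every (possibly overlapping) site match.
    cuts = []
    start = 0
    while True:
        pos = sequence.find(site, start)
        if pos == -1:
            break
        cuts.append(pos)
        start = pos + 1

    # Fragment lengths are the gaps between consecutive boundaries.
    boundaries = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(boundaries, boundaries[1:]) if b - a > 0]


# Hypothetical 19-base sequence with two GGATCC sites.
fragments = in_silico_digest("AAGGATCCTTTTGGATCCA", "GGATCC")
print(fragments)  # [2, 10, 7]
```

The fragment lengths always sum to the sequence length, which gives a quick sanity check when the same digest is applied to a full assembly.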
Project Goals
Although OM is currently used as a quality control tool for NGS assemblies, an efficient tool for combining both datasets would allow optical mapping data to be used to accelerate or automate genome assemblies. Such a tool would allow users to integrate optical mapping and sequencing data from any platform, thereby reducing investigation response time and increasing sequence data quality.
Phase I Activities and Expected Deliverables

The project goal is to create a user-friendly graphical interface that can assemble, combine, and compare OM and NGS data generated from any platform. This tool will automatically scale optical maps based on NGS assemblies and should scale well with larger multi-chromosome genomes. Algorithms will be developed to match NGS assemblies to optical maps, scaffold sequencing reads using optical maps, and perform quality filtering for both sequencing reads and optical mapping reads. The tool will also have standard report generation and data export capabilities. All methods should be callable via a RESTful API. The tool will have access/group control, and users in the same group will be able to share data.
Month   Deliverable
1       Import OM and NGS data from any platform
2.5     Develop algorithms to scaffold sequence data using optical mapping data
4       Develop algorithms to compare optical maps with NGS assemblies
6       Develop graphical interface and reporting
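One possible reading of the requirement to automatically scale optical maps based on NGS assemblies is sketched below. It assumes that optical map fragments have already been paired one-to-one with fragments predicted from the assembly, which in practice is itself a hard alignment problem. It then fits a single scale factor by least squares through the origin. The function names and the example sizes are hypothetical and only illustrate the idea.

```python
def estimate_scale_factor(optical_sizes, assembly_sizes):
    """Least-squares scale s that minimizes sum((s * optical - assembly)^2).

    Assumption: the two lists are already paired fragment by fragment.
    """
    if len(optical_sizes) != len(assembly_sizes):
        raise ValueError("fragment lists must be the same length")
    denominator = sum(o * o for o in optical_sizes)
    if denominator == 0:
        raise ValueError("optical fragment sizes must not all be zero")
    numerator = sum(o * a for o, a in zip(optical_sizes, assembly_sizes))
    return numerator / denominator


def rescale_map(optical_sizes, scale):
    """Apply the fitted scale factor to every optical map fragment."""
    return [o * scale for o in optical_sizes]


# Hypothetical fragment sizes in kilobases.
optical = [10.0, 20.0, 30.0]
assembly = [11.0, 22.0, 33.0]
scale = estimate_scale_factor(optical, assembly)
print(round(scale, 3))  # 1.1
```

A single global factor is the simplest choice. A production tool might instead need per-molecule or size-dependent corrections, which this sketch does not attempt.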
For Successful Phase I Awardees ONLY (Expected Phase II Deliverables)
Updates and added features to the tool and algorithms will be driven by advancements in OM and NGS technologies. Possible updates to the tool in Phase II include integration of long read NGS data to be used for scaffolding, automated misassembly prediction algorithms, collaboration capabilities, and improved graphics and usability.
Month   Deliverable
9       Integrate long read NGS data for scaffolding
12      Develop misassembly prediction algorithms
15      Increase collaboration capabilities
18-24   Optimize commercialization potential
Impact
A software tool that can visualize all types of optical mapping and NGS data would allow bioinformaticians to analyze sequencing data more effectively for various customers. Algorithms developed for this project could also be applied to future analytical tools. Further, this tool could be distributed to customer laboratories so that researchers can fully interrogate or reanalyze their own sequence assemblies, which is technically difficult at this time.
Commercialization Potential
The genome sequencing market is expected to grow to $20 billion by 2020. As this market grows and the complexity of sequencing analysis increases, there will be broad demand for data analysis and visualization tools. The product market will only be as large as the overlap of the sequencing and optical mapping markets (maximum $1B), but the technology developed for analyzing and visualizing sequencing data can be applied to new analytical tools for the larger market. In the future, we envision suites of tools for performing multivariable analysis of genomic sequencing data.
