Comprehensive Sequence Assembly Algorithm
We will develop a general-purpose sequence assembly algorithm and corresponding software for use by large-scale sequencing projects. The software will incorporate many features of existing assembly programs, including sensitive overlap detection, robust detection of sequencing errors, and multiple sequence alignment. In addition, we plan to complete several enhancements. 1) We will employ a conservative assembly protocol that makes errors only in extreme cases and with low probability. The protocol has been designed to handle complex repetitive DNA, sequencing errors, and chimeric fragments. 2) We will allow the user to include further biological information to aid the assembly, such as the constraint that the two sequenced ends of a plasmid insert be separated by the length of the insert. 3) Although we anticipate that the discriminatory power of our algorithm will yield the unique, correct assembly in most cases, we will enumerate all possibilities for the assembled sequence when faced with ambiguity caused by repeats. 4) The algorithm will design simple restriction digests by which the correct assembly can be verified or, if there are multiple candidate sequences, the correct one can be determined. The algorithm will be tested on the large-scale sequencing projects at Collaborative Research as well as on simulated data.
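As an illustration of enhancement 4, the following minimal Python sketch shows one way a restriction digest could be chosen to tell candidate assemblies apart. It is not the proposed algorithm. The function names digest_lengths and discriminating_sites are hypothetical, and the sketch assumes linear sequences, exact matches to the recognition site, a cut placed at the start of each site (real enzymes cut at an enzyme-specific position within or near the site), and fragment lengths that can be measured exactly (real gels have limited resolution).

```python
def digest_lengths(sequence, site):
    """Return sorted fragment lengths after cutting a linear sequence at each site occurrence."""
    cuts = []
    start = 0
    while True:
        pos = sequence.find(site, start)
        if pos == -1:
            break
        cuts.append(pos)  # simplification: cut at the first base of the recognition site
        start = pos + 1
    bounds = [0] + cuts + [len(sequence)]
    # Drop zero-length fragments, which arise when a site sits at position 0.
    return sorted(right - left for left, right in zip(bounds, bounds[1:]) if right > left)


def discriminating_sites(candidates, sites):
    """Return the sites whose digest pattern is different for every candidate assembly."""
    useful = []
    for site in sites:
        patterns = {tuple(digest_lengths(c, site)) for c in candidates}
        if len(patterns) == len(candidates):  # every candidate gives a distinct pattern
            useful.append(site)
    return useful


# Two hypothetical candidate assemblies that differ only in where the second site falls.
ECORI, BAMHI = "GAATTC", "GGATCC"  # recognition sequences of EcoRI and BamHI
candidate_a = "AA" + ECORI + "AAAA" + BAMHI + "AA"
candidate_b = "AA" + ECORI + "AA" + BAMHI + "AAAA"
print(discriminating_sites([candidate_a, candidate_b], [ECORI, BAMHI]))
```

In this toy case the EcoRI digest produces the same fragment lengths for both candidates, so only the BamHI digest would be informative.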
Small Business Information at Submission:
Principal Investigator: Ronald Lundstrom
Collaborative Research, Inc.
1365 Main Street Waltham, MA 02154
Number of Employees: