๐”– Bobbio Scriptorium
โœฆ   LIBER   โœฆ

Finding Similar Regions in Many Sequences

โœ Scribed by Ming Li; Bin Ma; Lusheng Wang


Publisher
Elsevier Science
Year
2002
Tongue
English
Weight
234 KB
Volume
65
Category
Article
ISSN
0022-0000

No coin nor oath required. For personal study only.

โœฆ Synopsis


Algorithms for finding similar, or highly conserved, regions in a group of sequences are at the core of many molecular biology problems. Assume that we are given n DNA sequences s 1 , ..., s n . The Consensus Patterns problem, which has been widely studied in bioinformatics research, in its simplest form, asks for a region of length L in each s i , and a median string s of length L so that the total Hamming distance from s to these regions is minimized. We show that the problem is NP-hard and give a polynomial time approximation scheme (PTAS) for it. We then present an efficient approximation algorithm for the consensus pattern problem under the original relative entropy measure. As an interesting application of our analysis, we further obtain a PTAS for a restricted (but still NP-hard) version of the important consensus alignment problem allowing at most constant number of gaps, each of arbitrary length, in each sequence.


๐Ÿ“œ SIMILAR VOLUMES