Automatic Discovery of Sub-molecular Sequence Domains in Multi-aligned Sequences: A Dynamic Programming Algorithm for Multiple Alignment Segmentation
✍ Scribed by ERIC POE XING; DENISE M. WOLF; INNA DUBCHAK; SYLVIA SPENGLER; MANFRED ZORN; ILYA MUCHNIK; CASIMIR KULIKOWSKI
- Publisher: Elsevier Science
- Year: 2001
- Language: English
- File size: 446 KB
- Volume: 212
- Category: Article
- ISSN: 0022-5193
✦ Synopsis
Automatic identification of sub-structures in multi-aligned sequences is of great importance for effective and objective structural/functional domain annotation, phylogenetic treeing and other molecular analyses. We present a segmentation algorithm that optimally partitions a given multi-alignment into a set of potentially biologically significant blocks, or segments. This algorithm applies dynamic programming and progressive optimization to the statistical profile of a multi-alignment in order to optimally demarcate relatively homogenous subregions. Using this algorithm, a large multi-alignment of eukaryotic 16S rRNA was analyzed. Three types of sequence patterns were identified automatically and efficiently: shared conserved domain; shared variable motif; and rare signature sequence. Results were consistent with the patterns identified through independent phylogenetic and structural approaches. This algorithm facilitates the automation of sequence-based molecular structural and evolutionary analyses through statistical modeling and high performance computation.
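The synopsis does not give the paper's objective function or its progressive optimization step, so the following is only a minimal sketch of the general idea of segmenting an alignment profile by dynamic programming. It assumes, purely for illustration, that the profile is the per-column Shannon entropy and that a "homogeneous" segment is one with small squared deviation from its own mean; the names `column_entropy` and `segment_profile`, the fixed segment count `k`, and the toy alignment are all hypothetical and not taken from the article.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def segment_profile(scores, k):
    """Split `scores` into k contiguous segments that minimize the total
    within-segment squared deviation from each segment's mean.

    Returns a list of half-open (start, end) column index pairs.
    Assumption: this cost is a stand-in, not the paper's actual criterion.
    """
    n = len(scores)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(scores)")

    # Prefix sums let us evaluate the cost of any segment in O(1).
    s1, s2 = [0.0], [0.0]
    for x in scores:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(i, j):
        # Sum of squared deviations from the mean over scores[i:j].
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j]: minimal cost of splitting the first j columns into m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                candidate = best[m - 1][i] + cost(i, j)
                if candidate < best[m][j]:
                    best[m][j] = candidate
                    back[m][j] = i

    # Walk the back-pointers from the last column to recover the boundaries.
    segments = []
    j = n
    for m in range(k, 0, -1):
        i = back[m][j]
        segments.append((i, j))
        j = i
    return segments[::-1]


# Toy alignment (hypothetical data): rows are sequences, columns are positions.
alignment = ["ACGTAAAA", "ACGTACGA", "ACGTCAGT", "ACGTGTCA"]
profile = [column_entropy(col) for col in zip(*alignment)]
blocks = segment_profile(profile, k=2)
```

This sketch runs in O(k·n²) time for n columns. In practice the number of segments would have to be chosen by some model-selection rule rather than fixed in advance, which the sketch does not attempt.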