๐”– Bobbio Scriptorium
โœฆ   LIBER   โœฆ

Fast, Optimal Alignment of Three Sequences Using Linear Gap Costs

โœ Scribed by DAVID R. POWELL; LLOYD ALLISON; TREVOR I. DIX


Book ID
102612693
Publisher
Elsevier Science
Year
2000
Tongue
English
Weight
160 KB
Volume
207
Category
Article
ISSN
0022-5193

No coin nor oath required. For personal study only.

โœฆ Synopsis


Alignment algorithms can be used to infer a relationship between sequences when the true relationship is unknown. Simple alignment algorithms use a cost function that gives a "xed cost to each possible point mutation*mismatch, deletion, insertion. These algorithms tend to "nd optimal alignments that have many small gaps. It is more biologically plausible to have fewer longer gaps rather than many small gaps in an alignment. To address this issue, linear gap cost algorithms are in common use for aligning biological sequence data. More reliable inferences are obtained by aligning more than two sequences at a time. The obvious dynamic programming algorithm for optimally aligning k sequences of length n runs in O(nI) time. This is impractical if k*3 and n is of any reasonable length. Thus, for this problem there are many heuristics for aligning k sequences, however, they are not guaranteed to "nd an optimal alignment. In this paper, we present a new algorithm guaranteed to "nd the optimal alignment for three sequences using linear gap costs. This gives the same results as the dynamic programming algorithm for three sequences, but typically does so much more quickly. It is particularly fast when the (three-way) edit distance is small. Our algorithm uses a speed-up technique based on Ukkonen's greedy algorithm (Ukkonen, 1983) which he presented for two sequences and simple costs.


๐Ÿ“œ SIMILAR VOLUMES


Normalization of Affine Gap Costs Used i
โœ Lloyd Allison ๐Ÿ“‚ Article ๐Ÿ“… 1993 ๐Ÿ› Elsevier Science ๐ŸŒ English โš– 226 KB

It is shown how to normalize the costs of an alignment algorithm that employs affine or linear gap costs. The normalized costs are interpreted as the -log probabilities of the instructions of a finite-state edit-machine. This gives an explicit model relating sequences that can be linked to processes