
An accurate approximation to the distribution of the length of the longest matching word between two random DNA sequences

Authors: R.F. Mott; T.B.L. Kirkwood; R.N. Curnow


Publisher: Springer
Year: 1990
Language: English
File size: 514 KB
Volume: 52
Category: Article
ISSN: 1522-9602


Abstract


An accurate approximation is derived to the distribution of the length of the longest matching word present between two random DNA sequences of finite length, using only elementary probability arguments. The distribution is shown to be consistent with previous asymptotic results for the mean and variance of longest common words. The application of the distribution to assessing the statistical significance of sequence similarities is considered. It is shown how the distribution can be modified to take account of non-independence of neighbouring bases in real sequences.
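The quantity studied here can be explored with a short script. The sketch below is not the authors' derivation. It assumes independent, identically distributed bases and compares a Monte Carlo estimate of the tail P(L ≥ k) of the longest matching word length L with a standard Poisson "clumping" approximation. In that approximation, the expected number of maximal matches of length at least k is taken as (m−k+1)(n−k+1)(1−p)p^k, where p is the probability that two random bases agree (1/4 for uniform base composition). All function names are hypothetical, and only one strand of each sequence is compared.

```python
import math
import random

BASES = "ACGT"


def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest word (contiguous run of bases) shared by a and b.

    O(len(a) * len(b)) dynamic programming: cur[j] is the length of the
    longest common suffix of a[:i] and b[:j].
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def match_probability(freqs) -> float:
    """Probability p that two independent bases drawn from freqs are equal."""
    total = sum(freqs)
    return sum((f / total) ** 2 for f in freqs)


def poisson_tail(k: int, m: int, n: int, p: float) -> float:
    """Approximate P(L >= k) for i.i.d. sequences of lengths m and n.

    Assumption (not from the paper): the number of maximal matches of length
    >= k (a mismatch followed by k matches) is Poisson with mean
    (m-k+1)(n-k+1)(1-p)p^k. Boundary effects are ignored.
    """
    if k <= 0:
        return 1.0
    clumps = max(m - k + 1, 0) * max(n - k + 1, 0) * (1 - p) * p ** k
    return 1.0 - math.exp(-clumps)


def simulate_longest_match(m, n, trials, freqs=(0.25, 0.25, 0.25, 0.25), seed=0):
    """Monte Carlo counts {L: frequency} of the longest match length L."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(trials):
        a = "".join(rng.choices(BASES, weights=freqs, k=m))
        b = "".join(rng.choices(BASES, weights=freqs, k=n))
        length = longest_common_substring_length(a, b)
        counts[length] = counts.get(length, 0) + 1
    return counts


def empirical_tail(counts, k) -> float:
    """Fraction of simulated pairs whose longest match has length >= k."""
    trials = sum(counts.values())
    return sum(c for length, c in counts.items() if length >= k) / trials


if __name__ == "__main__":
    m = n = 100
    p = match_probability([0.25] * 4)
    counts = simulate_longest_match(m, n, trials=100)
    for k in range(4, 12):
        print(f"k={k:2d}  simulated={empirical_tail(counts, k):.3f}  "
              f"approx={poisson_tail(k, m, n, p):.3f}")
```

Because the Poisson mean is close to 1 when mnp^k ≈ 1, this approximation concentrates L near k ≈ log_{1/p}(mn). For a significance test, an observed longest match of length k would be judged unusual at level α when `poisson_tail(k, m, n, p)` falls below α. This sketch keeps the independence assumption. As the abstract notes, real sequences with correlated neighbouring bases require a modified distribution, which is not attempted here.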