
The exact distribution of the k-tuple statistic for sequence homology

✍ Scribed by W.Y. Wendy Lou


Publisher
Elsevier Science
Year
2003
Tongue
English
Weight
125 KB
Volume
61
Category
Article
ISSN
0167-7152

✦ Synopsis


The distribution theory of runs and patterns has become increasingly useful in the field of biological sequence homology. One important application in detecting tandem duplications among DNA sequence segments is the k-tuple statistic S_{n,k}, the sum of matches in matching-runs of length k or longer in a sequence of n i.i.d. Bernoulli trials with success/matching probability p. Current approaches to this distribution problem are based on various approximations, due mainly to the numerical complexity of computing the exact distribution using a straightforward combinatorial approach. In this paper, we obtain a simple and efficient expression for the exact distribution of S_{n,k} using the principle of finite Markov chain imbedding. Our numerical results illustrate most importantly that for pattern lengths in the range n = 10 to 100, a range commonly used in detecting DNA tandem repeats, the distribution, in general, is highly skewed and far from normal.
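
To make the definition of S_{n,k} concrete, here is a minimal Python sketch. It is not the expression derived in the paper; it is a plain dynamic program in the same spirit as a finite Markov chain imbedding. The state (trailing run length capped at k, matches counted so far) and the function names `k_tuple_statistic` and `k_tuple_pmf` are assumptions chosen for illustration.

```python
def k_tuple_statistic(bits, k):
    """Sum of the lengths of all maximal success runs of length >= k."""
    total, run = 0, 0
    for b in list(bits) + [0]:  # trailing 0 closes a run that ends the sequence
        if b:
            run += 1
        else:
            if run >= k:
                total += run
            run = 0
    return total


def k_tuple_pmf(n, k, p):
    """Exact pmf of S_{n,k}, returned as a list indexed by s = 0..n.

    Assumed state: (r, s), where r is the trailing run length capped at k
    and s is the number of matches already counted toward the statistic.
    """
    q = 1.0 - p
    state = [[0.0] * (n + 1) for _ in range(k + 1)]
    state[0][0] = 1.0
    for _ in range(n):
        nxt = [[0.0] * (n + 1) for _ in range(k + 1)]
        for r in range(k + 1):
            for s in range(n + 1):
                mass = state[r][s]
                if mass == 0.0:
                    continue
                nxt[0][s] += mass * q  # mismatch: the run is reset
                if r < k - 1:
                    nxt[r + 1][s] += mass * p  # run still shorter than k
                elif r == k - 1:
                    nxt[k][s + k] += mass * p  # run reaches k: count all k matches
                else:
                    nxt[k][s + 1] += mass * p  # run already counted: one more match
        state = nxt
    return [sum(state[r][s] for r in range(k + 1)) for s in range(n + 1)]


if __name__ == "__main__":
    print(k_tuple_statistic([1, 1, 0, 1, 1, 1], 3))  # only the run of 3 counts
    print(k_tuple_pmf(3, 2, 0.5))
```

For n = 3, k = 2, p = 0.5 the eight equally likely sequences give S = 0 five times, S = 2 twice (011 and 110) and S = 3 once (111), so the sketch returns [0.625, 0.0, 0.25, 0.125]. The cost grows as O(n^2 k), which is manageable for the range n = 10 to 100 mentioned in the synopsis.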


📜 SIMILAR VOLUMES


The Exact Distribution of the Anderson-K
✍ H. Küchenhoff; Dr. W. Lehmacher 📂 Article 📅 1985 🏛 John Wiley and Sons 🌐 English ⚖ 488 KB

The Anderson-Kannemann test is a rank test for treatment effects in a randomized block design with K treatments and N blocks. In this paper, an algorithm for computing the exact distribution of the Anderson-Kannemann test statistic under the null hypothesis is deduced. Then, the exact distribution is

New approximations for the distribution
✍ Xiaoping Su; Sylvan Wallenstein 📂 Article 📅 2000 🏛 Elsevier Science 🌐 English ⚖ 91 KB

New approximations are given for the distribution of the length of the ith (i ≥ 1) smallest intervals containing r points, when N observations are randomly distributed on the unit interval, with primary emphasis on the case where N is a random variable.