✦ LIBER ✦

Estimating the Entropy of DNA Sequences

✍ Scribed by Armin O. Schmitt; Hanspeter Herzel

Publisher: Elsevier Science
Year: 1997
Tongue: English
Weight: 231 KB
Volume: 188
Category: Article
ISSN: 0022-5193
DOI: 10.1006/jtbi.1997.0493

✦ Synopsis

The Shannon entropy is a standard measure of the degree of order in symbol sequences such as DNA sequences. To account for correlations between symbols, one must determine the entropy of n-mers (runs of n consecutive symbols). Here, a method is presented for estimating such higher-order entropies (block entropies) of DNA sequences when the number of observations is small compared with the number of possible outcomes. The n-mer probability distribution underlying the dynamical process is reconstructed from two elementary statistical principles: the asymptotic equipartition property and the Maximum Entropy Principle. Constraints force the reconstructed distributions to share features that are characteristic of the true probability distribution. Of the many solutions compatible with these constraints, the one with the highest entropy is, according to the Maximum Entropy Principle, the most likely. An algorithm implementing this procedure is described and tested on various DNA model sequences whose exact entropies are known. Finally, results for a real DNA sequence, the complete genome of the Epstein–Barr virus, are presented and compared with those for other information carriers (texts, computer source code, music). The comparison suggests that DNA sequences have much more freedom in combining the symbols of their alphabet than written language or computer source code.
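As a point of reference for the small-sample problem described above, here is a minimal sketch of the naive plug-in estimator that such methods aim to improve on. It is not the authors' Maximum Entropy reconstruction. It estimates the block entropy H_n = -sum over n-mers w of p(w) log2 p(w) from overlapping n-mer frequencies and forms the conditional entropies h_n = H_{n+1} - H_n. It then checks them against a DNA model sequence with an exactly known entropy rate. The model (a symmetric first-order Markov chain), the function names, and the parameter `stay` are illustrative assumptions, not taken from the paper.

```python
import math
import random
from collections import Counter

ALPHABET = "acgt"


def block_entropy(seq: str, n: int) -> float:
    """Naive plug-in estimate of the block entropy H_n, in bits.

    Counts all overlapping n-mers and replaces p(w) in
    H_n = -sum_w p(w) log2 p(w) by relative frequencies. This is NOT the
    Maximum Entropy reconstruction described in the synopsis.
    """
    if not 1 <= n <= len(seq):
        raise ValueError("need 1 <= n <= len(seq)")
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def conditional_entropies(seq: str, n_max: int) -> list[float]:
    """Return h_n = H_{n+1} - H_n for n = 0 .. n_max - 1 (with H_0 = 0)."""
    H = [0.0] + [block_entropy(seq, n) for n in range(1, n_max + 1)]
    return [H[n + 1] - H[n] for n in range(n_max)]


def markov_sequence(length: int, stay: float, rng: random.Random) -> str:
    """Hypothetical first-order Markov model sequence: repeat the previous
    base with probability `stay`, else pick one of the other three uniformly."""
    seq = [rng.choice(ALPHABET)]
    for _ in range(length - 1):
        prev = seq[-1]
        if rng.random() < stay:
            seq.append(prev)
        else:
            seq.append(rng.choice([b for b in ALPHABET if b != prev]))
    return "".join(seq)


def markov_entropy_rate(stay: float) -> float:
    """Exact entropy rate (bits/symbol) of the model above. Its transition
    matrix is symmetric, so the stationary distribution is uniform and the
    rate equals the entropy of any single row."""
    row = [stay] + [(1 - stay) / 3] * 3
    return -sum(p * math.log2(p) for p in row if p > 0)


if __name__ == "__main__":
    rng = random.Random(0)
    exact = markov_entropy_rate(0.7)
    for length in (1_000, 100_000):
        h = conditional_entropies(markov_sequence(length, 0.7, rng), 8)
        print(f"L={length}: exact h={exact:.3f}, estimates",
              [round(x, 3) for x in h])
```

For a first-order chain, every h_n with n >= 1 equals the exact rate in theory. With 100,000 bases the plug-in values stay close to it. With only 1,000 bases the estimates for the larger n typically fall well below it, because most of the 4^(n+1) possible (n+1)-mers are never observed. That undersampling regime is the one the synopsis addresses.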

📜 SIMILAR VOLUMES

High Statistics Block Entropy Measures of DNA Sequences

✍ Pietro Liò; Antonio Politi; Marcello Buiatti; Stefano Ruffo 📂 Article 📅 1996 🏛 Elsevier Science 🌐 English ⚖ 487 KB

We have used an improved block-entropy measure in order to gain some further insights into the short-range correlations present in whole chromosomes of S. cerevisiae, viruses and organelles and very large genomic regions of E. coli. Although DNA sequences are largely inhomogeneous and word frequenci

Similarity analysis of DNA sequences bas

✍ Chun Li; Hong Ma; Yang Zhou; Xiaolei Wang; Xiaoqi Zheng 📂 Article 📅 2010 🏛 John Wiley and Sons 🌐 English ⚖ 91 KB

A DNA primary sequence is a string consisting of letters on an alphabet X = {a, c, g, t}. Based on all of the 2-combinations of the set X, here the repetition is allowed, we transform a DNA primary sequence into a special sequence over a set with cardinality 10. With the 10-letter sequence, we assoc

Maximum entropy image reconstruction of DNA sequencing gel autoradiographs

✍ Dr. John K. Elder 📂 Article 📅 1990 🏛 John Wiley and Sons 🌐 English ⚖ 593 KB

DNA sequencing and helix–coil transition. II. Loop entropy and DNA melting

✍ M. Ya. Azbel 📂 Article 📅 1980 🏛 Wiley (John Wiley & Sons) 🌐 English ⚖ 664 KB

An explicit analytical theory of DNA melting is constructed. It accounts for the loop entropy and the elasticity of DNA strands. Explicit analytical formulas are presented for the melting curves of natural DNA and periodic polymers. The nature of the DNA helix–coil transition is investi

Robustness of the Estimator of the Index of Dispersion for DNA Sequences

✍ Rasmus Nielsen 📂 Article 📅 1997 🏛 Elsevier Science 🌐 English ⚖ 118 KB

If substitutions in DNA sequences follow a Poisson process, the ratio of the variance in the number of substitutions to the mean number of substitutions (the index of dispersion) should equal 1. In this paper, the robustness of the commonly applied estimator of the index of dispersion in replacement

An estimate for the replacement entropy of microcrystallites

✍ Farid F. Abraham 📂 Article 📅 1972 🏛 Elsevier Science 🌐 English ⚖ 212 KB