The Shannon entropy is a standard measure for the order state of symbol sequences, such as, for example, DNA sequences. In order to incorporate correlations between symbols, the entropy of n-mers (consecutive strands of n symbols) has to be determined. Here, an assay is presented to estimate such hi
High Statistics Block Entropy Measures of DNA Sequences
✍ Scribed by Pietro Liò; Antonio Politi; Marcello Buiatti; Stefano Ruffo
- Publisher
- Elsevier Science
- Year
- 1996
- Tongue
- English
- Weight
- 487 KB
- Volume
- 180
- Category
- Article
- ISSN
- 0022-5193
✦ Synopsis
We used an improved block-entropy measure to gain further insight into the short-range correlations present in whole chromosomes of S. cerevisiae, viruses and organelles, and in very large genomic regions of E. coli. Although DNA sequences are largely inhomogeneous and word frequencies are unevenly distributed, the comparison of entire chromosomes and large genomic regions shows a "bulk" compositional homogeneity. This property suggests that biases in selection, directional mutational pressure and recombination processes act to homogenize the base composition of the DNA molecules within a genome, but that their mode of action, relative impact and direction may vary among organisms. The most interesting results appear to be the differences between the SW (C,G/A,T) and RY (A,G/C,T) two-letter alphabet entropies. Deviations from randomness in E. coli and S. cerevisiae sequences particularly concern SW dinucleotide frequencies and RY tetranucleotide frequencies.
📜 SIMILAR VOLUMES
A DNA molecule can be viewed as a text written in four letters: A, T, G, and C. As we know, this text contains the genetic message of a living organism. The sequence of letters in the text cannot be very regular (otherwise it would carry very little information). To an observer who does not know the
A DNA primary sequence is a string of letters over the alphabet X = {a, c, g, t}. Based on all of the 2-combinations of the set X, with repetition allowed, we transform a DNA primary sequence into a special sequence over a set of cardinality 10. With the 10-letter sequence, we assoc