Statistical analysis of DNA sequences. II
Scribed by Alexander Vilenkin; Lev Verkh
- Publisher
- Wiley (John Wiley & Sons)
- Year
- 1982
- Language
- English
- File size
- 163 KB
- Volume
- 21
- Category
- Article
- ISSN
- 0006-3525
Synopsis
A DNA molecule can be viewed as a text written in four letters: A, T, G, and C. As we know, this text contains the genetic message of a living organism. The sequence of letters in the text cannot be very regular (otherwise it would carry very little information). To an observer who does not know the "language," an efficiently coded text appears to be an almost random sequence of letters. On the other hand, one can expect important differences between such texts and random sequences. In particular, DNA segments that code for proteins are "written" in three-letter "words" (codons), and their presence can result in some correlations in the sequence. In the present communication, we confirm this expectation by a quantitative analysis of some recently determined DNA sequences (refs. 1-3). It will be shown that statistical analysis can determine the presence and the length of the "words."
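The synopsis does not say which statistic the authors used, so the following is only a minimal sketch of how a "word" length could show up in letter statistics, assuming a simple letter-match test. For each lag k it compares how often the letters at positions i and i + k coincide with the rate expected for independent letters of the same overall composition. The function names, the synthetic sequence, and the 0.7 / 0.1 biases are all hypothetical illustration choices, not data or methods from the article.

```python
import random
from collections import Counter


def match_excess(seq, max_lag):
    """For each lag k in 1..max_lag, return the observed fraction of
    positions i with seq[i] == seq[i + k], minus the fraction expected
    if letters were independent with the sequence's overall composition."""
    n = len(seq)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    counts = Counter(seq)
    expected = sum((c / n) ** 2 for c in counts.values())
    excess = {}
    for k in range(1, max_lag + 1):
        matches = sum(1 for i in range(n - k) if seq[i] == seq[i + k])
        excess[k] = matches / (n - k) - expected
    return excess


def first_strong_lag(excess, fraction=0.5):
    """Smallest lag whose excess reaches `fraction` of the largest excess.
    Only meaningful when the largest excess is clearly above noise;
    returns None if no lag qualifies (e.g. all excesses are negative)."""
    top = max(excess.values())
    for k in sorted(excess):
        if excess[k] >= fraction * top:
            return k
    return None


def toy_coding_sequence(n_codons, seed=0):
    """Synthetic 'coding' text in which each of the three codon positions
    favours a different letter. Arbitrary biases, not real sequence data."""
    rng = random.Random(seed)
    favoured = "GAT"   # hypothetical preferred letter per codon position
    letters = "ATGC"
    out = []
    for _ in range(n_codons):
        for pos in range(3):
            weights = [0.7 if b == favoured[pos] else 0.1 for b in letters]
            out.append(rng.choices(letters, weights=weights)[0])
    return "".join(out)


if __name__ == "__main__":
    seq = toy_coding_sequence(3000)
    excess = match_excess(seq, 9)
    for lag, value in excess.items():
        print(lag, round(value, 3))
    print("estimated word length:", first_strong_lag(excess))
```

On such a toy sequence the excess should be clearly positive at lags that are multiples of three and negative elsewhere, which is the kind of signal that would reveal both the presence and the length of the "words." Real sequences have much weaker positional biases, so any threshold such as `fraction` would need to be judged against the noise level for the sequence length at hand.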
SIMILAR VOLUMES
We have used an improved block-entropy measure in order to gain some further insights into the short-range correlations present in whole chromosomes of S. cerevisiae, viruses and organelles and very large genomic regions of E. coli. Although DNA sequences are largely inhomogeneous and word frequenci
The paper describes a package of APL-programs suited for the management and the analysis of DNA sequence data. Most of the application programs are related to experimental work in a DNA sequencing laboratory: Search for overlapping DNA fragments to construct complete DNA sequences; search for restri
After reviewing approaches to the nucleotide correlation of DNA sequences the preferential mode analysis method is emphasized and discussed in detail. The preferred modes and poor modes in coding regions, as well as in introns, 5'-caps and 3'-tails are found through the statistical analysis of seque