Codon usage trajectories and 7-cluster structure of 143 complete bacterial genomic sequences
✍ Scribed by Alexander Gorban; Tatyana Popova; Andrey Zinovyev
- Book ID
- 103879926
- Publisher
- Elsevier Science
- Year
- 2005
- Language
- English
- File size
- 665 KB
- Volume
- 353
- Category
- Article
- ISSN
- 0378-4371
✦ Synopsis
Three results are presented. First, we prove the existence of a universal 7-cluster structure in all 143 completely sequenced bacterial genomes available in GenBank in August 2004, and explain its properties. The 7-cluster structure accounts for the main part of sequence heterogeneity in bacterial genomes; in this sense, our 7 clusters constitute the basic model of a bacterial genome sequence. We demonstrate that four basic "pure" types of this model are observed in nature: "parallel triangles", "perpendicular triangles", the degenerate case and the flower-like type.
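The synopsis does not say how genome fragments are represented before clustering. This sketch assumes a common setup that is not taken from the article. The genome is cut into windows, and each window is described by the frequencies of its 64 non-overlapping triplets, counted from the window start. Cluster structure would then appear when those 64-dimensional vectors are clustered. The window width and step are hypothetical parameters, and the clustering step itself is not shown.

```python
from collections import Counter
from itertools import product

# All 64 triplets in a fixed order, so every fragment maps to the same coordinates.
TRIPLETS = ["".join(t) for t in product("ACGT", repeat=3)]


def triplet_frequencies(fragment: str) -> list[float]:
    """Frequencies of non-overlapping triplets read from the first position of the fragment."""
    seq = fragment.upper()
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    # Only triplets made of A, C, G, T are counted; anything containing e.g. N is ignored.
    total = sum(counts[t] for t in TRIPLETS)
    if total == 0:
        return [0.0] * 64
    return [counts[t] / total for t in TRIPLETS]


def fragment_vectors(genome: str, width: int, step: int) -> list[list[float]]:
    """Slide a window of `width` nucleotides (hypothetical parameter) along the genome by `step`."""
    return [triplet_frequencies(genome[s:s + width])
            for s in range(0, len(genome) - width + 1, step)]
```

Under this assumption, the same coding sequence yields a different vector when its reading frame is shifted relative to the window start. Three phases on each of two strands, plus non-coding windows, would be one natural reading of seven clusters. That interpretation is a hypothesis to check against the article, not a statement of the synopsis.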
Second, we answer the question: how large are the position-specific information and the contribution of correlations between nucleotides? The accuracy of the mean-field (context-free) approximation is estimated for bacterial genomes.
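This sketch uses one standard way to make these two quantities concrete. It is an assumption, not confirmed as the article's exact definition. Let f(xyz) be the codon frequencies, p_1, p_2, p_3 the nucleotide frequencies at the three codon positions, and p̄ their average. The mean-field (context-free) approximation replaces f(xyz) by p_1(x)p_2(y)p_3(z). The Kullback-Leibler divergence then splits exactly:

D(f || p̄⊗p̄⊗p̄) = Σ_k D(p_k || p̄) + D(f || p_1⊗p_2⊗p_3)

The first sum measures the position-specific information. The second term measures the contribution of correlations, which is also the error of the mean-field approximation.

```python
import math
from itertools import product

NUC = "ACGT"
TRIPLETS = ["".join(t) for t in product(NUC, repeat=3)]


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in nats; terms with p = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def position_frequencies(f):
    """Nucleotide frequencies at codon positions 0, 1, 2 (p_1, p_2, p_3 in the text).

    `f` maps A/C/G/T triplets to frequencies that sum to 1.
    """
    p = [[0.0] * 4 for _ in range(3)]
    for t, ft in f.items():
        for k in range(3):
            p[k][NUC.index(t[k])] += ft
    return p


def mean_field(p):
    """Context-free approximation f(xyz) ~ p_1(x) p_2(y) p_3(z)."""
    return {t: p[0][NUC.index(t[0])] * p[1][NUC.index(t[1])] * p[2][NUC.index(t[2])]
            for t in TRIPLETS}


def information_split(f):
    """Return (position-specific information, correlation contribution) in nats."""
    p = position_frequencies(f)
    pbar = [sum(p[k][i] for k in range(3)) / 3 for i in range(4)]
    mf = mean_field(p)
    position_specific = sum(kl(p[k], pbar) for k in range(3))
    correlations = kl([f.get(t, 0.0) for t in TRIPLETS], [mf[t] for t in TRIPLETS])
    return position_specific, correlations
```

Where f(xyz) > 0, every marginal and the mean-field product are also positive, so the divergences stay finite. Here f would come from in-frame codon counts, for example over annotated coding sequences. That data source is also an assumption.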
We show that the codon usage of bacterial genomes is, with high accuracy, a multi-linear function of their genomic G+C content (more precisely, it is given by two similar functions, one for eubacterial genomes and the other for archaea). The description of these two codon-usage trajectories is the third result.
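This sketch makes the two axes of a codon-usage trajectory concrete. It computes a genome's G+C content and its pooled in-frame codon usage over a list of coding sequences. Two choices are assumptions: codon usage is taken from annotated coding sequences, and only A, C, G, T are counted. The article's multi-linear fits for eubacteria and archaea are not reproduced here.

```python
from collections import Counter
from itertools import product

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def gc_content(seq: str) -> float:
    """Fraction of G and C among the A, C, G, T characters of a genomic sequence."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    return (s.count("G") + s.count("C")) / acgt if acgt else 0.0


def codon_usage(cds_list: list[str]) -> dict[str, float]:
    """Pooled in-frame codon frequencies over coding sequences (assumed to start at codon 1)."""
    counts = Counter()
    for cds in cds_list:
        s = cds.upper()
        counts.update(s[i:i + 3] for i in range(0, len(s) - 2, 3))
    # Codons containing characters other than A, C, G, T are left out of the total.
    total = sum(counts[c] for c in CODONS)
    return {c: counts[c] / total if total else 0.0 for c in CODONS}
```

Computing both quantities for each genome and plotting each codon's frequency against G+C content gives the kind of curve the synopsis calls a codon-usage trajectory.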
All 143 animated 3D cluster scatter plots are collected in a database and made available on our website: