𝔖 Bobbio Scriptorium
✦   LIBER   ✦

Codon usage trajectories and 7-cluster structure of 143 complete bacterial genomic sequences

✍ Scribed by Alexander Gorban; Tatyana Popova; Andrey Zinovyev


Book ID: 103879926
Publisher: Elsevier Science
Year: 2005
Tongue: English
Weight: 665 KB
Volume: 353
Category: Article
ISSN: 0378-4371


✦ Synopsis


Three results are presented. First, we prove the existence of a universal 7-cluster structure in all 143 completely sequenced bacterial genomes available in GenBank in August 2004, and explain its properties. The 7-cluster structure is responsible for the main part of sequence heterogeneity in bacterial genomes. In this sense, our 7 clusters constitute the basic model of a bacterial genome sequence. We demonstrate that four basic "pure" types of this model are observed in nature: "parallel triangles", "perpendicular triangles", the degenerate case, and the flower-like type.
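
The synopsis does not say how the clusters are built. A minimal sketch, assuming (this is not stated here) that each data point is the vector of 64 non-overlapping triplet frequencies in a sliding window of the genome. Under that assumption, one plausible source of the number seven is that a coding window read in each of its three phases, on either strand, lands in a different region of the 64-dimensional space, with non-coding windows forming the seventh group. The window length and step (`window=300`, `step=12`) and all function names are illustrative assumptions.

```python
from itertools import product

# All 64 triplets in lexicographic order over the alphabet A, C, G, T.
TRIPLETS = ["".join(t) for t in product("ACGT", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def triplet_frequencies(fragment: str) -> list[float]:
    """Frequencies of non-overlapping triplets read from the first base of `fragment`.

    Triplets containing any character other than A, C, G, T are skipped.
    """
    counts = [0] * 64
    total = 0
    for start in range(0, len(fragment) - 2, 3):
        idx = INDEX.get(fragment[start:start + 3])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * 64
    return [c / total for c in counts]


def window_vectors(sequence: str, window: int = 300, step: int = 12) -> list[list[float]]:
    """One 64-dimensional triplet-frequency vector per sliding window (hypothetical parameters)."""
    seq = sequence.upper()
    return [
        triplet_frequencies(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


# Reading the same repeated stretch in a shifted phase gives a different vector:
in_frame = triplet_frequencies("ATGATGATG")   # only ATG
shifted = triplet_frequencies("TGATGATGA")    # only TGA
print(in_frame[INDEX["ATG"]], shifted[INDEX["TGA"]])  # prints 1.0 1.0
```

Clustering these vectors (for example after dimensionality reduction) would be a further, separate step that this sketch leaves out.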

Second, we answer the question of how large the position-specific information is, and how large the contribution connected with correlations between nucleotides is. The accuracy of the mean-field (context-free) approximation is estimated for bacterial genomes.
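
One way to make these two quantities concrete, offered as an assumption rather than as the authors' exact definitions: let p_k(n) be the frequency of nucleotide n at codon position k = 1, 2, 3. The mean-field approximation replaces the triplet frequency f(ijk) by p_1(i) p_2(j) p_3(k); averaging the three positions into one distribution q gives a position-free model q(i) q(j) q(k). The Kullback-Leibler divergence (in bits) of the observed frequencies from the mean-field model then measures the contribution of correlations between nucleotides, and the divergence of the mean-field model from the position-free model measures the position-specific information. Function names are hypothetical.

```python
import math
from itertools import product

Freq = dict[str, float]  # triplet over A, C, G, T -> frequency; values sum to 1


def position_frequencies(f: Freq) -> list[dict[str, float]]:
    """p_k(n): nucleotide frequencies at codon positions k = 0, 1, 2."""
    p = [dict.fromkeys("ACGT", 0.0) for _ in range(3)]
    for triplet, value in f.items():
        for k, nucleotide in enumerate(triplet):
            p[k][nucleotide] += value
    return p


def mean_field(f: Freq) -> Freq:
    """Mean-field model: product of position-specific nucleotide frequencies."""
    p = position_frequencies(f)
    return {"".join(t): p[0][t[0]] * p[1][t[1]] * p[2][t[2]]
            for t in product("ACGT", repeat=3)}


def position_free(f: Freq) -> Freq:
    """Position-free model: one nucleotide distribution shared by all three positions."""
    p = position_frequencies(f)
    q = {n: sum(p[k][n] for k in range(3)) / 3 for n in "ACGT"}
    return {"".join(t): q[t[0]] * q[t[1]] * q[t[2]]
            for t in product("ACGT", repeat=3)}


def kl_bits(f: Freq, g: Freq) -> float:
    """Kullback-Leibler divergence D(f || g) in bits; terms with f = 0 contribute 0."""
    return sum(v * math.log2(v / g[t]) for t, v in f.items() if v > 0)


def information_split(f: Freq) -> tuple[float, float]:
    """(correlation information, position-specific information), both in bits."""
    mf = mean_field(f)
    return kl_bits(f, mf), kl_bits(mf, position_free(f))


corr, pos = information_split({"ATG": 0.5, "TAA": 0.5})
print(corr)  # prints 2.0
```

In this toy input the two triplets share no nucleotide at any position, so the mean-field model spreads the mass over eight triplets and misses 2 bits of correlation; a frequency table that already factorizes by position would give a correlation term of zero.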

We show that, with high accuracy, the codon usage of bacterial genomes is a multi-linear function of their genomic G+C content (more precisely, it is described by two similar functions, one for eubacterial genomes and the other for archaea). The description of these two codon-usage trajectories is the third result.
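
A minimal sketch of how such a trajectory could be fitted, assuming each genome is summarized by its G+C content and a dictionary of codon frequencies, and reading "multi-linear" in its simplest form as one straight line per codon fitted by ordinary least squares. Fitting eubacteria and archaea separately would give two trajectories. Function names are hypothetical, and no real genome data is included.

```python
def gc_content(sequence: str) -> float:
    """Fraction of G and C among the unambiguous bases A, C, G, T."""
    seq = sequence.upper()
    acgt = sum(seq.count(n) for n in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct G+C values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_trajectory(gc_values: list[float],
                   usages: list[dict[str, float]]) -> dict[str, tuple[float, float]]:
    """Fit each codon's frequency as a linear function of genomic G+C content.

    `usages[i]` maps codons to frequencies for the genome whose G+C content is `gc_values[i]`.
    """
    codons = sorted(usages[0])
    return {c: fit_line(gc_values, [u[c] for u in usages]) for c in codons}


def predict(trajectory: dict[str, tuple[float, float]], gc: float) -> dict[str, float]:
    """Codon usage predicted by a fitted trajectory at a given G+C content."""
    return {c: slope * gc + intercept for c, (slope, intercept) in trajectory.items()}


print(gc_content("GGCCAATT"))           # prints 0.5
print(fit_line([0, 1, 2], [1, 3, 5]))   # prints (2.0, 1.0)
```

A piecewise-linear variant would apply the same `fit_line` to sub-ranges of G+C content; which form matches the authors' trajectories is not stated here.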

All 143 animated 3D cluster scatter plots are collected in a database and made available on our website: