A Complementary Circular Code in the Protein Coding Genes

✍ Scribed by Didier G. Arquès; Christian J. Michel


Publisher: Elsevier Science
Year: 1996
Tongue: English
Weight: 256 KB
Volume: 182
Category: Article
ISSN: 0022-5193


✦ Synopsis


Recently, shifted periodicities 1 modulo 3 and 2 modulo 3 have been identified in protein (coding) genes of both prokaryotes and eukaryotes with autocorrelation functions analysing eight of the 64 trinucleotides (Arquès et al., 1995). This observation suggests that the trinucleotides are associated with frames in protein genes. To test this hypothesis, the distribution of the 64 trinucleotides AAA, ..., TTT is studied in both gene populations using a simple method based on the trinucleotide frequencies per frame. In protein genes, the trinucleotides can be read in three frames: the reading frame 0 established by the ATG start trinucleotide, and frame 1 (resp. 2), which is frame 0 shifted by 1 (resp. 2) nucleotides in the 5'-3' direction. The occurrence frequencies of the 64 trinucleotides are computed in the three frames. Each trinucleotide is then classified in its preferential occurrence frame, i.e. the frame associated with its highest frequency, which yields three subsets of trinucleotides, one per frame.

Unexpectedly, the same three subsets of trinucleotides are identified in the two gene populations: T0 = X0 ∪ {AAA, TTT} with X0 = {AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC} in frame 0, T1 = X1 ∪ {CCC} in frame 1 and T2 = X2 ∪ {GGG} in frame 2, each subset X0, X1 and X2 having 20 trinucleotides.

Surprisingly, these three subsets have five important properties:
(i) the property of maximal circular code for X0 (resp. X1, X2), allowing the automatic retrieval of frame 0 (resp. 1, 2) in any region of a protein gene model (formed by a series of trinucleotides of X0) without using a start codon;
(ii) the DNA complementarity property C (e.g. C(AAC) = GTT): C(T0) = T0, C(T1) = T2 and C(T2) = T1, allowing the two paired reading frames of a DNA double helix to code simultaneously for amino acids;
(iii) the circular permutation property P (e.g. P(AAC) = ACA): P(X0) = X1 and P(X1) = X2, implying that the two subsets X1 and X2 can be deduced from X0;
(iv) the rarity property, with an occurrence probability of X0 equal to 6 × 10⁻⁸;
(v) the concatenation property, with a high frequency (27.5%) of misplaced trinucleotides in the shifted frames, a maximum length (13 nucleotides) of the minimal window needed to retrieve the frame automatically, and an occurrence of the four types of nucleotides in the three trinucleotide sites, in favour of an evolutionary code.

In the Discussion, the identified subsets T0, T1 and T2, rewritten in the three two-letter genetic alphabets purine/pyrimidine, amino/keto and strong/weak interaction, allow us to deduce that the RNY model (R = purine = A or G, Y = pyrimidine = C or T, N = R or Y) (Eigen & Schuster, 1978) is the closest two-letter codon model to the trinucleotides of T0. These three subsets are then related to the genetic code. The trinucleotides of T0 code for 13 amino acids: Ala, Asn, Asp, Gln, Glu, Gly, Ile, Leu, Lys, Phe, Thr, Tyr and Val. Finally, a strong correlation is observed between the usage of the trinucleotides of T0 in protein genes and the amino acid frequencies in proteins: six of the seven amino acids not coded by T0 have, as expected, the lowest frequencies in proteins of both prokaryotes and eukaryotes.
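The per-frame classification described in the synopsis can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `frame_frequencies` and `preferential_frame` are hypothetical, the toy gene is invented, each frame is normalised by its own trinucleotide total, and ties are resolved in favour of the lowest frame number (the synopsis does not say how the paper handles either point).

```python
from collections import Counter
from itertools import product

# The 64 trinucleotides AAA, ..., TTT.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def frame_frequencies(genes):
    """Return three dicts (frames 0, 1, 2) of trinucleotide frequencies.

    Each gene is assumed to start at its ATG, so frame 0 is the reading
    frame and frame 1 (resp. 2) is frame 0 shifted by 1 (resp. 2) nucleotides.
    """
    counts = [Counter(), Counter(), Counter()]
    for gene in genes:
        for frame in range(3):
            for i in range(frame, len(gene) - 2, 3):
                counts[frame][gene[i:i + 3]] += 1
    freqs = []
    for c in counts:
        total = sum(c.values())
        freqs.append({t: (c[t] / total if total else 0.0) for t in TRINUCLEOTIDES})
    return freqs


def preferential_frame(freqs):
    """Map each observed trinucleotide to the frame of its highest frequency.

    Assumption: ties go to the lowest frame number; unobserved
    trinucleotides are left unclassified.
    """
    result = {}
    for t in TRINUCLEOTIDES:
        values = [freqs[frame][t] for frame in range(3)]
        best = max(values)
        if best > 0:
            result[t] = values.index(best)
    return result


# Invented toy gene, for illustration only (not data from the paper).
toy_genes = ["ATGGCCGATTAA"]
pref = preferential_frame(frame_frequencies(toy_genes))
```

On real data, `genes` would be the coding sequences of one gene population, and the subsets T0, T1 and T2 would be the trinucleotides mapped to frames 0, 1 and 2 respectively.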
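Properties (ii) and (iii), and the count of 13 amino acids, can be checked directly from the set X0 listed in the synopsis. The sketch below is a small verification script, not code from the paper; the helper names `complement` and `permute` are hypothetical, and the standard genetic code table is assumed (written in the usual TCAG order).

```python
from itertools import product

# X0 exactly as listed in the synopsis (20 trinucleotides).
X0 = set(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)

_PAIRING = str.maketrans("ACGT", "TGCA")


def complement(t):
    """DNA complementarity map C (reverse complement), e.g. C(AAC) = GTT."""
    return t.translate(_PAIRING)[::-1]


def permute(t):
    """Circular permutation map P, e.g. P(AAC) = ACA."""
    return t[1:] + t[0]


# Property (iii): X1 and X2 are deduced from X0 by permutation.
X1 = {permute(t) for t in X0}
X2 = {permute(t) for t in X1}

T0 = X0 | {"AAA", "TTT"}
T1 = X1 | {"CCC"}
T2 = X2 | {"GGG"}

# Property (ii): C(T0) = T0, C(T1) = T2 and C(T2) = T1.
assert {complement(t) for t in T0} == T0
assert {complement(t) for t in T1} == T2
assert {complement(t) for t in T2} == T1

# The three subsets partition the 64 trinucleotides.
assert len(T0) + len(T1) + len(T2) == 64
assert len(T0 | T1 | T2) == 64

# Standard genetic code (assumed), codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product("TCAG", repeat=3), AMINO)}

coded_by_T0 = {CODE[t] for t in T0}
not_coded = set(AMINO) - coded_by_T0 - {"*"}
print(sorted(coded_by_T0))  # one-letter codes of the 13 amino acids
print(sorted(not_coded))    # the seven amino acids not coded by T0
```

The script checks set equalities only; it does not test the circular code property (i) or the rarity estimate (iv), which need separate arguments.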


📜 SIMILAR VOLUMES


A Circular Code in the Protein Coding Ge
✍ Didier G Arquès; Christian J Michel 📂 Article 📅 1997 🏛 Elsevier Science 🌐 English ⚖ 298 KB

A new maximal circular code X0(MIT) with two permutated maximal circular codes X1(MIT) and X2(MIT) is identified in the protein coding genes of mitochondria. The three subsets of 20 trinucleotides X0(MIT)={ACA, ACC, ATA, ATC, CTA, CTC, GAA, GAC, GAT, GCA, GCC, GCT, GGA, GGC, GGT, GTA, GTC, GTT, TTA,

An Evolutionary Analytical Model of a Co
✍ D.G. Arquès; J.-P. Fallot; C.J. Michel 📂 Article 📅 1998 🏛 Springer 🌐 English ⚖ 456 KB

Several frequency asymmetries unexpectedly observed (e.g. the frequency difference between T1 and T2 in the frame 0) are related to a new property of the subset T0 involving substitutions. An evolutionary analytical model at three parameters (p, q, t) based on an independent mixing of the 22 co

An Evolutionary Model of a Complementary
✍ Didier G. Arquès; Jean-Paul Fallot; Christian J. Michel 📂 Article 📅 1997 🏛 Elsevier Science 🌐 English ⚖ 444 KB

The subset X0 = [sequence: see text] of 20 trinucleotides has a preferential occurrence in frame 0 (a reading frame established by the ATG start trinucleotide) of protein (coding) genes of both prokaryotes and eukaryotes. This subset X0++ has the rarity property (6 × 10⁻⁸) to be a complementary ma

Pressures in archaeal protein coding gen
✍ Sujay Chattopadhyay; Satyabrata Sahoo; William A. Kanner; Jayprokas Chakrabarti 📂 Article 📅 2003 🏛 Hindawi Publishing Corporation 🌐 English ⚖ 145 KB

Our studies on the bases of codons from 11 completely sequenced archaeal genomes show that, as we move from GC-rich to AT-rich protein-coding gene-containing species, the differences between G and C and between A and T, the purine load (AG content), and also the overall persistence (i.e. the tendenc

Pathogenic mitochondrial DNA mutations i
✍ Lee-Jun C. Wong 📂 Article 📅 2007 🏛 John Wiley and Sons 🌐 English ⚖ 153 KB

More than 200 disease‐related mitochondrial DNA (mtDNA) point mutations have been reported in the Mitomap (http://www.mitomap.org) database. These mutations can be divided into two groups: mutations affecting mitochondrial protein synthesis, including mutations in tRNA and rRNA genes; a

The Analysis of Protein Coding Genes Sug
✍ Fernando Alvarez; Maria Noel Cortinas; Hector Musto 📂 Article 📅 1996 🏛 Elsevier Science 🌐 English ⚖ 129 KB

We analyze evolutionary relationships among members of the family Trypanosomatidae, with particular emphasis on whether protein coding genes support ... exons in all nuclear protein coding genes (Donelson and Zeng, 1990). Two forms of host-protozoan relationships are known within the Trypanosoma