𝔖 Bobbio Scriptorium
✦   LIBER   ✦

Information Content of Protein Sequences

✍ Scribed by OLAF WEISS; MIGUEL A JIMÉNEZ-MONTAÑO; HANSPETER HERZEL


Publisher: Elsevier Science
Year: 2000
Tongue: English
Weight: 153 KB
Volume: 206
Category: Article
ISSN: 0022-5193


✦ Synopsis


The complexity of large sets of non-redundant protein sequences is measured. This is done by estimating the Shannon entropy as well as applying compression algorithms to estimate the algorithmic complexity. The estimators are also applied to randomly generated surrogates of the protein data. Our results show that proteins are fairly close to random sequences. The entropy reduction due to correlations is only about 1%. However, precise estimations of the entropy of the source are not possible due to finite sample effects. Compression algorithms also indicate that the redundancy is in the order of 1%. These results confirm the idea that protein sequences can be regarded as slightly edited random strings. We discuss secondary structure and low-complexity regions as causes of the redundancy observed. The findings are related to numerical and biochemical experiments with random polypeptides.
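
The three measurements named in the synopsis (block entropy, compression, and comparison against shuffled surrogates) can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names are hypothetical, `zlib` stands in for whichever compression algorithms the article uses, and the plug-in entropy estimator shown here is biased downward for longer blocks, which is the finite sample effect the synopsis refers to.

```python
import math
import random
import zlib
from collections import Counter


def block_entropy(seq: str, n: int = 1) -> float:
    """Plug-in Shannon entropy (bits) of overlapping length-n blocks.

    Assumption: simple frequency counting; biased low when the number
    of possible blocks is large relative to the sequence length.
    """
    blocks = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    total = len(blocks)
    counts = Counter(blocks)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def compressed_bits_per_symbol(seq: str) -> float:
    """Upper-bound style complexity estimate: compressed size per residue.

    Assumption: zlib (DEFLATE) is an illustrative stand-in compressor.
    """
    data = seq.encode("ascii")
    return 8 * len(zlib.compress(data, 9)) / len(data)


def shuffled_surrogate(seq: str, seed: int = 0) -> str:
    """Random permutation of seq: same composition, correlations destroyed."""
    rng = random.Random(seed)
    symbols = list(seq)
    rng.shuffle(symbols)
    return "".join(symbols)


if __name__ == "__main__":
    # Toy input, not real protein data: a strongly repetitive string.
    toy = "ACDEFGHIKL" * 200
    surrogate = shuffled_surrogate(toy)
    for name, s in (("original", toy), ("surrogate", surrogate)):
        h1 = block_entropy(s, 1)
        # Conditional entropy H2 - H1: entropy of a residue given its predecessor.
        h_cond = block_entropy(s, 2) - h1
        print(name, round(h1, 3), round(h_cond, 3),
              round(compressed_bits_per_symbol(s), 3))
```

The surrogate keeps the single-residue entropy unchanged, so any gap between the original and the surrogate in the conditional entropy or in the compressed size reflects correlations between residues rather than composition.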


📜 SIMILAR VOLUMES


Information Content of Individual Geneti
✍ T.D. Schneider 📂 Article 📅 1997 🏛 Elsevier Science 🌐 English ⚖ 272 KB

Related genetic sequences having a common function can be described by Shannon's information measure and depicted graphically by a sequence logo. Though useful for many purposes, sequence logos only show the average sequence conservation, and inferring the conservation for individual sequences is di

Predicting protein structure using only
✍ Kevin Karplus; Christian Barrett; Melissa Cline; Mark Diekhans; Leslie Grate; Ri 📂 Article 📅 1999 🏛 John Wiley and Sons 🌐 English ⚖ 111 KB 👁 2 views

This paper presents results of blind predictions submitted to the CASP3 protein structure prediction experiment. We made predictions using the SAM-T98 method, an iterative hidden Markov model-based method for constructing protein family profiles. The method is purely sequence-based, using no structur

Sequence information required for bacter
✍ Spencer A. Benson 📂 Article 📅 1985 🏛 John Wiley and Sons 🌐 English ⚖ 699 KB

Isolation of a human anti-haemophilic factor IX cDNA clone using a unique 52-base synthetic oligonucleotide probe deduced from the amino acid sequence of bovine factor IX. Nucl. Acids Res. 11, 2325-2335. 6 MATTEI, J. F., MATTEI, M. G., AUMERAS, C., AUGER, M. & GIRAUD, F. (1981). X-linked mental retardat

Deriving the phylogenetic information fr
✍ Shih-Hau Chiu; Chien-Chi Chen; Gwo-Fang Yuan; Thy-Hou Lin 📂 Article 📅 2010 🏛 John Wiley and Sons 🌐 English ⚖ 185 KB

The evolutionary relationships of organisms are traditionally delineated by the alignment‐based methods using some DNA or protein sequences. In the post‐genome era, the phylogenetics of life could be inferred from many sources such as genomic features, not just from comparison of one or

Information-Theoretic Analysis of Protei
✍ YUDONG CAI; C.T.J. DODSON; ANDREW J. DOIG; OLAF WOLKENHAUER 📂 Article 📅 2002 🏛 Elsevier Science 🌐 English ⚖ 408 KB

We analyse for each of 20 amino acids X the statistics of spacings between consecutive occurrences of X within the well-characterized Saccharomyces cerevisiae genome. The occurrences of amino acids may exhibit near random, clustered or smoothed out behaviour, like one-dimensional stochastic processe