✦ LIBER ✦

Information Content of Protein Sequences

✍ Scribed by OLAF WEISS; MIGUEL A JIMÉNEZ-MONTAÑO; HANSPETER HERZEL

Publisher: Elsevier Science
Year: 2000
Tongue: English
Weight: 153 KB
Volume: 206
Category: Article
ISSN: 0022-5193
DOI: 10.1006/jtbi.2000.2138

✦ Synopsis

The complexity of large sets of non-redundant protein sequences is measured. This is done by estimating the Shannon entropy as well as applying compression algorithms to estimate the algorithmic complexity. The estimators are also applied to randomly generated surrogates of the protein data. Our results show that proteins are fairly close to random sequences. The entropy reduction due to correlations is only about 1%. However, precise estimations of the entropy of the source are not possible due to finite sample effects. Compression algorithms also indicate that the redundancy is in the order of 1%. These results confirm the idea that protein sequences can be regarded as slightly edited random strings. We discuss secondary structure and low-complexity regions as causes of the redundancy observed. The findings are related to numerical and biochemical experiments with random polypeptides.
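The comparison described in the synopsis (entropy estimates and compressed size for real sequences versus shuffled surrogates) can be illustrated with a minimal Python sketch. This is not the authors' method: the synopsis does not name their estimators or compressors, so the plug-in block-entropy estimator, the use of `zlib` as a stand-in compressor, and all function names below are assumptions made for illustration only.

```python
import math
import random
import zlib
from collections import Counter


def block_entropy(seq: str, n: int = 1) -> float:
    """Plug-in Shannon entropy (bits) of overlapping length-n blocks.

    Assumption: a naive maximum-likelihood estimator. It is biased
    downward when the sample is small relative to the number of
    possible blocks (20**n for amino acids), which is the kind of
    finite sample effect the synopsis mentions.
    """
    blocks = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    total = len(blocks)
    counts = Counter(blocks)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_entropy(seq: str, n: int) -> float:
    """Estimate of H_n - H_(n-1): entropy of a symbol given the n-1 before it."""
    if n == 1:
        return block_entropy(seq, 1)
    return block_entropy(seq, n) - block_entropy(seq, n - 1)


def shuffled_surrogate(seq: str, rng: random.Random) -> str:
    """Random permutation of seq: same composition, correlations destroyed."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def compressed_size(seq: str) -> int:
    """Bytes after zlib compression (hypothetical stand-in compressor)."""
    return len(zlib.compress(seq.encode("ascii"), 9))
```

Comparing `conditional_entropy(seq, 2)` or `compressed_size(seq)` for a sequence against the same quantities for several `shuffled_surrogate` copies gives a rough sense of how much of the redundancy comes from correlations rather than composition. General-purpose compressors such as `zlib` carry fixed overhead and are not tuned to a 20-letter alphabet, so absolute sizes from this sketch should not be read as entropy estimates.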

📜 SIMILAR VOLUMES

Information Content of Individual Genetic Sequences

✍ T.D. Schneider 📂 Article 📅 1997 🏛 Elsevier Science 🌐 English ⚖ 272 KB

Related genetic sequences having a common function can be described by Shannon's information measure and depicted graphically by a sequence logo. Though useful for many purposes, sequence logos only show the average sequence conservation, and inferring the conservation for individual sequences is di

Predicting protein structure using only sequence information

✍ Kevin Karplus; Christian Barrett; Melissa Cline; Mark Diekhans; Leslie Grate; Ri 📂 Article 📅 1999 🏛 John Wiley and Sons 🌐 English ⚖ 111 KB

This paper presents results of blind predictions submitted to the CASP3 protein structure prediction experiment. We made predictions using the SAM-T98 method, an iterative hidden Markov model-based method for constructing protein family profiles. The method is purely sequence-based, using no structur

Sequence information required for bacterial protein export

✍ Spencer A. Benson 📂 Article 📅 1985 🏛 John Wiley and Sons 🌐 English ⚖ 699 KB

Isolation of a human anti-haemophilic factor IX cDNA clone using a unique 52-base synthetic oligonucleotide probe deduced from the amino acid sequence of bovine factor IX. Nucl. Acids Res. 11, 2325-2335. 6 MATTEI, J. F., MATTEI, M. G., AUMERAS, C., AUGER, M. & GIRAUD, F. (1981). X-linked mental retardat

Information Content and Free Energy in DNA–Protein Interactions

✍ Gary D. Stormo 📂 Article 📅 1998 🏛 Elsevier Science 🌐 English ⚖ 135 KB

Deriving the phylogenetic information from some physicochemical properties of protein sequences computed

✍ Shih-Hau Chiu; Chien-Chi Chen; Gwo-Fang Yuan; Thy-Hou Lin 📂 Article 📅 2010 🏛 John Wiley and Sons 🌐 English ⚖ 185 KB

The evolutionary relationships of organisms are traditionally delineated by the alignment‐based methods using some DNA or protein sequences. In the post‐genome era, the phylogenetics of life could be inferred from many sources such as genomic features, not just from comparison of one or

Information-Theoretic Analysis of Protein Sequences Shows that Amino Acids Self-cluster

✍ YUDONG CAI; C.T.J. DODSON; ANDREW J. DOIG; OLAF WOLKENHAUER 📂 Article 📅 2002 🏛 Elsevier Science 🌐 English ⚖ 408 KB

We analyse for each of 20 amino acids X the statistics of spacings between consecutive occurrences of X within the well-characterized Saccharomyces cerevisiae genome. The occurrences of amino acids may exhibit near random, clustered or smoothed out behaviour, like one-dimensional stochastic processe