Related genetic sequences having a common function can be described by Shannon's information measure and depicted graphically by a sequence logo. Though useful for many purposes, sequence logos only show the average sequence conservation, and inferring the conservation for individual sequences is di
Information Content of Protein Sequences
✍ Scribed by OLAF WEISS; MIGUEL A JIMÉNEZ-MONTAÑO; HANSPETER HERZEL
- Publisher: Elsevier Science
- Year: 2000
- Language: English
- File size: 153 KB
- Volume: 206
- Category: Article
- ISSN: 0022-5193
✦ Synopsis
The complexity of large sets of non-redundant protein sequences is measured. This is done by estimating the Shannon entropy as well as applying compression algorithms to estimate the algorithmic complexity. The estimators are also applied to randomly generated surrogates of the protein data. Our results show that proteins are fairly close to random sequences. The entropy reduction due to correlations is only about 1%. However, precise estimations of the entropy of the source are not possible due to finite sample effects. Compression algorithms also indicate that the redundancy is in the order of 1%. These results confirm the idea that protein sequences can be regarded as slightly edited random strings. We discuss secondary structure and low-complexity regions as causes of the redundancy observed. The findings are related to numerical and biochemical experiments with random polypeptides.
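The two kinds of estimate named in the synopsis can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names are hypothetical, the entropy is a naive plug-in estimate from block frequencies, `zlib` stands in for whichever compression algorithms the paper used, and the surrogate is a simple shuffle that preserves amino acid composition. The plug-in estimate is biased downward for longer blocks, which is the finite sample effect the synopsis mentions.

```python
import math
import random
import zlib
from collections import Counter


def block_entropy(seq, n=1):
    """Plug-in Shannon entropy (in bits) of overlapping length-n blocks."""
    blocks = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    total = len(blocks)
    counts = Counter(blocks)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_entropy(seq, n=1):
    """H_n - H_(n-1): estimated entropy of a symbol given its n-1 predecessors."""
    if n == 1:
        return block_entropy(seq, 1)
    return block_entropy(seq, n) - block_entropy(seq, n - 1)


def compressed_bits_per_symbol(seq):
    """Bits per symbol after zlib compression (a rough upper bound only)."""
    data = seq.encode("ascii")
    return 8 * len(zlib.compress(data, 9)) / len(data)


def shuffled_surrogate(seq, seed=0):
    """Random surrogate with the same composition but no sequential order."""
    rng = random.Random(seed)  # assumption: fixed seed for reproducibility
    symbols = list(seq)
    rng.shuffle(symbols)
    return "".join(symbols)
```

Comparing `conditional_entropy` and `compressed_bits_per_symbol` on a real sequence set against the same quantities on its `shuffled_surrogate` gives a rough sense of how much redundancy is due to correlations rather than composition. A general-purpose compressor such as `zlib` carries overhead of its own, so its output should be read as a loose bound, not as an entropy estimate.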
📜 SIMILAR VOLUMES
This paper presents results of blind predictions submitted to the CASP3 protein structure prediction experiment. We made predictions using the SAM-T98 method, an iterative hidden Markov model-based method for constructing protein family profiles. The method is purely sequence-based, using no structur
Isolation of a human anti-haemophilic factor IX cDNA clone using a unique 52-base synthetic oligonucleotide probe deduced from the amino acid sequence of bovine factor IX. Nucl. Acids Res. 11, 2325-2335. 6 MATTEI, J. F., MATTEI, M. G., AUMERAS, C., AUGER, M. & GIRAUD, F. (1981). X-linked mental retardat
The evolutionary relationships of organisms are traditionally delineated by the alignment‐based methods using some DNA or protein sequences. In the post‐genome era, the phylogenetics of life could be inferred from many sources such as genomic features, not just from comparison of one or
We analyse for each of 20 amino acids X the statistics of spacings between consecutive occurrences of X within the well-characterized Saccharomyces cerevisiae genome. The occurrences of amino acids may exhibit near random, clustered or smoothed out behaviour, like one-dimensional stochastic processe