𝔖 Bobbio Scriptorium
✦   LIBER   ✦

Correlations in Protein Sequences and Property Codes

✍ Scribed by Olaf Weiss; Hanspeter Herzel


Publisher: Elsevier Science
Year: 1998
Tongue: English
Weight: 280 KB
Volume: 190
Category: Article
ISSN: 0022-5193


✦ Synopsis


Correlation functions in large sets of non-homologous protein sequences are analysed. Finite size corrections are applied and fluctuations are estimated. Because symbol sequences must be mapped to sequences of numbers before correlation functions can be calculated, several property codes are tested as such mappings. We found hydrophobicity autocorrelation functions to be strongly oscillating. Another strong signal is the monotonically decaying alpha-helix propensity autocorrelation function. Furthermore, we detected signals corresponding to an alternation of positively and negatively charged residues at a distance of 3-4 amino acids. To look beyond the property codes gained by the methods of physical chemistry, mappings yielding a strong correlation signal are sought using a Monte Carlo simulation. The mappings leading to strong signals are found to be related to hydrophobicity or alpha-helix propensity. A cluster analysis of the top scoring mappings leads to two novel property codes, which are derived from sequence data only. They turn out to be similar to known property codes for hydrophobicity or polarity.
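The mapping step described in the synopsis can be illustrated with a minimal sketch. It assumes a simple charge code (K and R as +1, D and E as -1, all other residues 0), chosen here only for illustration and not taken from the article. The names `CHARGE_CODE`, `encode` and `autocorrelation` are hypothetical, and the finite size corrections and fluctuation estimates mentioned in the synopsis are not reproduced.

```python
from typing import Dict, Iterable, List

# Hypothetical property code (an assumption for illustration, not the
# article's table): positive residues +1, negative residues -1, others 0.
CHARGE_CODE: Dict[str, float] = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def encode(seq: str, code: Dict[str, float], default: float = 0.0) -> List[float]:
    """Map an amino acid string to a list of numbers using a property code."""
    return [code.get(aa, default) for aa in seq.upper()]


def autocorrelation(seqs: Iterable[str], code: Dict[str, float],
                    max_lag: int) -> List[float]:
    """Pooled autocovariance C(k) = <x_i x_{i+k}> - <x_i><x_{i+k}>.

    Averages run over all residue pairs at distance k within each sequence,
    pooled across the whole set. Returns values for k = 1..max_lag.
    No finite size correction is applied in this sketch.
    """
    encoded = [encode(s, code) for s in seqs]
    result: List[float] = []
    for k in range(1, max_lag + 1):
        n = 0
        sum_x = sum_y = sum_xy = 0.0
        for x in encoded:
            for i in range(len(x) - k):
                a, b = x[i], x[i + k]
                sum_x += a
                sum_y += b
                sum_xy += a * b
                n += 1
        if n == 0:
            # No sequence is long enough to contain a pair at this distance.
            result.append(float("nan"))
        else:
            result.append(sum_xy / n - (sum_x / n) * (sum_y / n))
    return result


if __name__ == "__main__":
    # Toy input: strictly alternating charges give a negative value at k = 1
    # and a positive value at k = 2.
    print(autocorrelation(["KDKDKDKD"], CHARGE_CODE, max_lag=4))
```

Swapping `CHARGE_CODE` for another dictionary, such as a hydrophobicity scale, changes the property under study without touching the correlation routine. That separation is what makes a search over many candidate mappings practical.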


📜 SIMILAR VOLUMES


Correlation of sequence and tertiary str
✍ Gordon M. Crippen 📂 Article 📅 1977 🏛 Wiley (John Wiley & Sons) 🌐 English ⚖ 711 KB

The x‐ray crystal structures of 19 selected proteins are examined empirically for correlations between the amino acid sequence and long‐range, tertiary conformation. There is clear evidence for preferential associations of certain types of amino acids, particularly among the hydroph

Analysis on the Distribution of Bases in
✍ Chun-Ting Zhang; Yong Zhan 📂 Article 📅 1994 🏛 Elsevier Science 🌐 English ⚖ 254 KB

The occurrence frequencies of bases A, C, G and T, denoted by \(a, c, g\) and \(t\), respectively, in 1487 human protein-coding sequences have been calculated and analyzed. The analysis has been performed by a diagrammatic method presented recently, in which each coding sequence is represented by a

Distributions of Dimeric Tandem Repeats
✍ Nikolay V. Dokholyan; Sergey V. Buldyrev; Shlomo Havlin; H. Eugene Stanley 📂 Article 📅 2000 🏛 Elsevier Science 🌐 English ⚖ 221 KB

We study the length distribution functions for the 16 possible distinct dimeric tandem repeats in DNA sequences of diverse taxonomic partitions of GenBank (known human and mouse genomes, and complete genomes of Caenorhabditis elegans and yeast). For coding DNA, we find that all 16 distribution funct