## Abstract

The x-ray crystal structures of 19 selected proteins are examined empirically for correlations between the amino acid sequence and long-range, __tertiary__ conformation. There is clear evidence for preferential associations of certain types of amino acids, particularly among the hydroph
Correlations in Protein Sequences and Property Codes
Scribed by Olaf Weiss; Hanspeter Herzel
- Publisher: Elsevier Science
- Year: 1998
- Language: English
- File size: 280 KB
- Volume: 190
- Category: Article
- ISSN: 0022-5193
Synopsis
Correlation functions in large sets of non-homologous protein sequences are analysed. Finite size corrections are applied and fluctuations are estimated. Because symbol sequences have to be mapped to sequences of numbers before correlation functions can be calculated, several property codes are tested as such mappings. We found hydrophobicity autocorrelation functions to be strongly oscillating. Another strong signal is the monotonically decaying alpha-helix propensity autocorrelation function. Furthermore, we detected signals corresponding to an alternation of positively and negatively charged residues at a distance of 3-4 amino acids. To look beyond the property codes obtained by the methods of physical chemistry, mappings yielding a strong correlation signal are sought using a Monte Carlo simulation. The mappings leading to strong signals are found to be related to hydrophobicity or alpha-helix propensity. A cluster analysis of the top-scoring mappings leads to two novel property codes. These two property codes are derived from sequence data only. They turn out to be similar to known property codes for hydrophobicity or polarity.
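
As a rough illustration of the mapping step described in the synopsis, the sketch below converts amino acid letters to numbers through a property code and computes a pooled, normalized autocorrelation at each lag. It is a minimal sketch under stated assumptions, not the authors' procedure: the Kyte-Doolittle hydropathy scale is used only as an example property code (the synopsis does not say which scales were tested), the function name `autocorrelation` is hypothetical, and the finite size corrections and fluctuation estimates mentioned in the synopsis are left out.

```python
from statistics import fmean

# Example property code: Kyte-Doolittle hydropathy values.
# Assumption: the synopsis does not name the scales used; this one is illustrative.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def autocorrelation(sequences, code, max_lag):
    """Return normalized autocorrelation values for lags 1..max_lag.

    Each sequence is mapped to numbers with `code`; all pairs of positions
    (i, i + k) within the same sequence are pooled across sequences.
    No finite size correction is applied (a simplification).
    Sequences must contain only letters present in `code`.
    """
    mapped = [[code[residue] for residue in seq] for seq in sequences]
    pooled = [value for seq in mapped for value in seq]
    mean = fmean(pooled)
    variance = fmean((value - mean) ** 2 for value in pooled)
    if variance == 0:
        raise ValueError("property values are constant; correlation undefined")

    result = []
    for lag in range(1, max_lag + 1):
        products = [
            (seq[i] - mean) * (seq[i + lag] - mean)
            for seq in mapped
            for i in range(len(seq) - lag)
        ]
        # NaN marks lags longer than every sequence in the set.
        result.append(fmean(products) / variance if products else float("nan"))
    return result
```

Calling `autocorrelation(["ID" * 10], KYTE_DOOLITTLE, 2)` on a strictly alternating hydrophobic/hydrophilic toy sequence gives a value of -1 at lag 1 and +1 at lag 2, which is the kind of oscillating signal the synopsis reports for hydrophobicity, in an idealized form.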
SIMILAR VOLUMES
The occurrence frequencies of bases A, C, G and T, denoted by \(a, c, g\) and \(t\), respectively, in 1487 human protein coding sequences have been calculated and analyzed. The analysis has been performed by a diagrammatic method presented recently, in which each coding sequence is represented by a
We study the length distribution functions for the 16 possible distinct dimeric tandem repeats in DNA sequences of diverse taxonomic partitions of GenBank (known human and mouse genomes, and complete genomes of Caenorhabditis elegans and yeast). For coding DNA, we find that all 16 distribution funct