𝔖 Bobbio Scriptorium
✦   LIBER   ✦

Gauss-integral based representation of protein structure for predicting the fold class from the sequence

✍ Scribed by Bjørn G. Nielsen; Peter Røgen; Henrik G. Bohr


Publisher: Elsevier Science
Year: 2006
Tongue: English
Weight: 558 KB
Volume: 43
Category: Article
ISSN: 0895-7177


✦ Synopsis


A representative subset of protein chains was selected from the CATH 2.4 database [C.A. Orengo, A.D. Michie, S. Jones, D.T. Jones, M.B. Swindells, J.M. Thornton, CATH - a hierarchic classification of protein domain structures, Structure 5 (8) (1997) 1093-1108] and used to train a feed-forward neural network to predict protein fold classes. The network takes the dipeptide frequency matrix of a sequence as input and produces as output a novel representation of the protein chain in R^30 space, based on knot invariant values [P. Røgen, B. Fain, Automatic classification of protein structure by using Gauss integrals, Proceedings of the National Academy of Sciences of the United States of America 100 (1) (2003) 119-124; P. Røgen, H.G. Bohr, A new family of global protein shape descriptors, Mathematical Biosciences 182 (2) (2003) 167-181].
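
To make the network input concrete, here is a minimal sketch of how a dipeptide frequency matrix could be computed from a sequence. The synopsis does not give the exact definition, so several choices are assumptions: counting only adjacent residue pairs, normalising by the total number of counted pairs, skipping pairs that contain a non-standard residue, and the hypothetical function names `dipeptide_frequencies` and `flatten`.

```python
# Sketch only: the paper's exact normalisation and residue handling are not
# stated in the synopsis; the choices below are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def dipeptide_frequencies(sequence):
    """Return a 20x20 matrix of adjacent-residue pair frequencies.

    Entry [i][j] is the fraction of counted adjacent pairs in which
    residue i is immediately followed by residue j.
    """
    counts = [[0] * 20 for _ in range(20)]
    total = 0
    seq = sequence.upper()
    for first, second in zip(seq, seq[1:]):
        # Assumption: pairs containing a non-standard symbol are skipped.
        if first in INDEX and second in INDEX:
            counts[INDEX[first]][INDEX[second]] += 1
            total += 1
    if total == 0:
        return [[0.0] * 20 for _ in range(20)]
    return [[c / total for c in row] for row in counts]


def flatten(matrix):
    """Flatten the 20x20 matrix into a 400-element input vector."""
    return [value for row in matrix for value in row]
```

Under these assumptions, every sequence, whatever its length, maps to a fixed-size vector of 400 values, which is what a feed-forward network with a fixed input layer requires.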

In the general case when excluding singletons (proteins representing a topology or a sequence homology as unique members of these sets), the success rates for the predictions were 77% for class level, 60% for architecture, and 48% for topology. The total number of fold classes that are included in the present data set (∼500) is ten times that which has been reported in earlier attempts, so this result represents an improvement on previous work (reporting on a few handpicked folds). Furthermore, distance analysis of the network outputs resulting from singletons shows that it is possible to detect novel topologies with very high confidence (∼85%), and the network can in these cases be used as a sorting mechanism that identifies sequences which might need special attention. Also, a direct measure of prediction confidence may be obtained from such distance analysis.
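
The distance analysis mentioned above can be illustrated with a small sketch. The synopsis does not specify the metric, the reference points, or the decision rule, so everything here is an assumption: Euclidean distance, one reference vector per known fold class, a fixed threshold, and the hypothetical names `nearest_reference` and `classify_or_flag`.

```python
import math

# Sketch only: metric, reference vectors and threshold are assumptions,
# not the procedure reported in the paper.


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_reference(output, references):
    """Return (label, distance) of the reference vector closest to `output`.

    `references` maps a fold-class label to its representative vector
    (in the paper's setting these would be points in R^30).
    """
    label = min(references, key=lambda k: euclidean(output, references[k]))
    return label, euclidean(output, references[label])


def classify_or_flag(output, references, threshold):
    """Assign the nearest fold class, or flag the sequence as possibly novel.

    Returns (label, distance); label is None when the nearest reference
    is farther away than `threshold`.
    """
    label, distance = nearest_reference(output, references)
    if distance > threshold:
        return None, distance
    return label, distance
```

In such a scheme the distance itself doubles as a rough confidence measure: a small distance to the nearest reference suggests a reliable assignment, while a large one marks a sequence that might need special attention.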


📜 SIMILAR VOLUMES


Prediction of protein structural class f
✍ Petr Klein; Charles Delisi 📂 Article 📅 1986 🏛 Wiley (John Wiley & Sons) 🌐 English ⚖ 781 KB

The multidimensional statistical technique of discriminant analysis is used to allocate amino acid sequences to one of four secondary structural classes: high α content, high β content, mixed α and β, low content of ordered structure. Discrimination is based on four attributes: estimates of percen

Multiple classifier integration for the
✍ Lei Chen; Lin Lu; Kairui Feng; Wenjin Li; Jie Song; Lulu Zheng; Youlang Yuan; Zh 📂 Article 📅 2009 🏛 John Wiley and Sons 🌐 English ⚖ 112 KB

Supervised classifiers, such as artificial neural networks, partition trees, and support vector machines, are often used for the prediction and analysis of biological data. However, choosing an appropriate classifier is not straightforward because each classifier has its own strengths an