The importance of larger data sets for protein secondary structure prediction with neural networks

Scribed by John-Marc Chandonia; Martin Karplus


Publisher: Cold Spring Harbor Laboratory Press
Year: 2008
Tongue: English
Weight: 737 KB
Volume: 5
Category: Article
ISSN: 0961-8368


✦ Synopsis


Abstract

A neural network algorithm is applied to secondary structure and structural class prediction for a database of 318 nonhomologous protein chains. Significant improvement in accuracy is obtained as compared with performance on smaller databases. A systematic study of the effects of network topology shows that, for the larger database, better results are obtained with more units in the hidden layer. In a 32-fold cross validated test, secondary structure prediction accuracy is 67.0%, relative to 62.6% obtained previously, without any evolutionary information on the sequence. Introduction of sequence profiles increases this value to 72.9%, suggesting that the two types of information are essentially independent. Tertiary structural class is predicted with 80.2% accuracy, relative to 73.9% obtained previously. The use of a larger database is facilitated by the introduction of a scaled conjugate gradient algorithm for optimizing the neural network. This algorithm is about 10–20 times as fast as the standard steepest descent algorithm.
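To make the 32-fold cross-validated test concrete, the following is a minimal sketch of how 318 chains could be partitioned into 32 held-out folds. The abstract does not say how the authors assigned chains to folds, so the contiguous, near-equal split used here is an assumption, and the function name `k_fold_splits` is hypothetical.

```python
def k_fold_splits(n_items, k):
    """Yield (train_indices, test_indices) pairs for k near-equal folds.

    Assumption: items are assigned to folds contiguously by index.
    The abstract does not describe the partitioning actually used.
    """
    # Fold sizes differ by at most one: the first (n_items % k) folds
    # each receive one extra item.
    base, extra = divmod(n_items, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        # Everything outside the held-out fold is used for training.
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size


# 318 chains and 32 folds, as stated in the abstract.
splits = list(k_fold_splits(318, 32))
fold_sizes = [len(test) for _, test in splits]
```

Under this assumption each held-out fold contains 9 or 10 chains, so every network is trained on 308 or 309 chains and evaluated on the remainder, and each chain is held out exactly once.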

