The importance of larger data sets for protein secondary structure prediction with neural networks
By John-Marc Chandonia; Martin Karplus
- Publisher
- Cold Spring Harbor Laboratory Press
- Year
- 2008
- Language
- English
- File size
- 737 KB
- Volume
- 5
- Category
- Article
- ISSN
- 0961-8368
Synopsis
Abstract
A neural network algorithm is applied to secondary structure and structural class prediction for a database of 318 nonhomologous protein chains. Significant improvement in accuracy is obtained as compared with performance on smaller databases. A systematic study of the effects of network topology shows that, for the larger database, better results are obtained with more units in the hidden layer. In a 32-fold cross-validated test, secondary structure prediction accuracy is 67.0%, relative to 62.6% obtained previously, without any evolutionary information on the sequence. Introduction of sequence profiles increases this value to 72.9%, suggesting that the two types of information are essentially independent. Tertiary structural class is predicted with 80.2% accuracy, relative to 73.9% obtained previously. The use of a larger database is facilitated by the introduction of a scaled conjugate gradient algorithm for optimizing the neural network. This algorithm is about 10-20 times as fast as the standard steepest descent algorithm.
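To make the evaluation protocol concrete, the following is a minimal Python sketch of an n-fold cross-validated accuracy measurement over whole protein chains. It is an illustration under stated assumptions, not the authors' procedure: the round-robin fold assignment, the reading of "accuracy" as the fraction of held-out residues whose predicted state matches the observed one, and every name in the code (`make_folds`, `residue_accuracy`, `cross_validate`, `train_majority`) are hypothetical. The neural network itself is replaced by a trivial majority-state placeholder so that the example stays self-contained.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# A chain is (amino-acid sequence, per-residue state string), e.g. ("ACDE", "HHCC").
Chain = Tuple[str, str]
Predictor = Callable[[str], str]


def make_folds(n_items: int, n_folds: int) -> List[List[int]]:
    """Assign item indices to folds round-robin (an assumed scheme)."""
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for i in range(n_items):
        folds[i % n_folds].append(i)
    return folds


def residue_accuracy(predicted: str, observed: str) -> Tuple[int, int]:
    """Return (number of matching residues, total residues) for one chain."""
    if len(predicted) != len(observed):
        raise ValueError("prediction and observation must have equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct, len(observed)


def cross_validate(
    chains: Sequence[Chain],
    n_folds: int,
    train: Callable[[List[Chain]], Predictor],
) -> float:
    """Train on all folds but one, test on the held-out fold, pool over folds."""
    correct = total = 0
    for held_out in make_folds(len(chains), n_folds):
        held = set(held_out)
        training = [c for i, c in enumerate(chains) if i not in held]
        predict = train(training)
        for i in held_out:
            sequence, states = chains[i]
            c, t = residue_accuracy(predict(sequence), states)
            correct += c
            total += t
    return correct / total


def train_majority(training: List[Chain]) -> Predictor:
    """Placeholder model: always predict the most common training state."""
    counts = Counter(s for _, states in training for s in states)
    majority = counts.most_common(1)[0][0]
    return lambda sequence: majority * len(sequence)


if __name__ == "__main__":
    # Fold sizes for 318 chains split 32 ways under the round-robin assumption.
    sizes = [len(f) for f in make_folds(318, 32)]
    print(min(sizes), max(sizes))

    # Toy data only; these are not chains from the paper's database.
    toy = [("AAAA", "HHHH"), ("GGGG", "HHCC"), ("AAAA", "HHHC")]
    print(cross_validate(toy, 3, train_majority))
```

Holding out whole chains, rather than individual residues, keeps residues from the same protein out of both the training and test sets at once. A real predictor would be substituted for `train_majority` through the same `train` argument.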