Optimal QSAR analysis of the carcinogenic activity of drugs by correlation ranking and genetic algorithm-based PCR
✍ Scribed by Bahram Hemmateenejad
- Publisher
- John Wiley and Sons
- Year
- 2004
- Language
- English
- File size
- 414 KB
- Volume
- 18
- Category
- Article
- ISSN
- 0886-9383
- DOI
- 10.1002/cem.891
✦ Synopsis
Abstract
A major limitation of principal component regression (PCR), especially in QSAR studies, is that it extracts the eigenvectors solely from the matrix of predictor variables, so the resulting principal components (PCs) are not guaranteed to correlate well with the predicted variable. This paper describes the application of PCR to modelling the structure–carcinogenic activity relationship of drugs. To obtain the optimal model, correlation ranking and a genetic algorithm were employed to select the best set of PCs. A large data set containing 735 carcinogenic activities and 1355 descriptors was used. Two cross‐validation procedures (leave‐many‐out and ν‐fold cross‐validation) and the hold‐out‐a‐test‐sample (HOTS) method were used to validate the models. Introducing PCs by the conventional eigenvalue ranking procedure did not produce the best model. Instead, factor selection by correlation ranking and by the genetic algorithm produced good models of similar quality. The models could explain more than 80% of the variance in carcinogenic activity. Copyright © 2005 John Wiley & Sons, Ltd.
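The difference between eigenvalue ranking and correlation ranking of PCs can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the synthetic data, the helper names (`pc_scores`, `rank_by_correlation`, `fit_r2`) and the choice of three retained PCs are all assumptions made for illustration, and the genetic algorithm and cross-validation steps are not shown.

```python
import numpy as np

def pc_scores(X, n_components):
    """PC scores of column-centered X, ordered by decreasing eigenvalue."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

def rank_by_correlation(T, y):
    """PC indices ordered by decreasing |Pearson r| between each score column and y."""
    Tc = T - T.mean(axis=0)
    yc = y - y.mean()
    r = (Tc.T @ yc) / (np.linalg.norm(Tc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(r)), r

def fit_r2(T, y, selected):
    """Least-squares fit of y on the selected score columns; returns training R^2."""
    A = np.column_stack([np.ones(len(y)), T[:, selected]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

# Synthetic stand-in data (assumption): 60 samples, 10 descriptors whose
# variances decrease from left to right.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10)) * np.linspace(5.0, 0.5, 10)
# The response depends on a low-variance descriptor, so the largest-eigenvalue
# PCs carry little information about it.
y = 2.0 * X[:, 8] + rng.normal(scale=0.1, size=60)

T = pc_scores(X, 10)
eig_order = np.arange(10)                 # conventional eigenvalue ranking
corr_order, r = rank_by_correlation(T, y)  # correlation ranking

k = 3  # number of PCs retained (arbitrary choice for this sketch)
r2_eig = fit_r2(T, y, eig_order[:k])
r2_corr = fit_r2(T, y, corr_order[:k])
print(f"R^2, first {k} PCs by eigenvalue:   {r2_eig:.3f}")
print(f"R^2, top {k} PCs by |correlation|: {r2_corr:.3f}")
```

Because the score columns are mutually orthogonal and centered, the training R² of such a model equals the sum of the squared correlations of the selected PCs with `y`, so for a fixed number of PCs correlation ranking can never give a lower training R² than eigenvalue ranking. Predictive quality still has to be checked by cross-validation or a held-out test set, as the abstract describes.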