Deriving the phylogenetic information from some physicochemical properties of protein sequences computed
Authors: Shih-Hau Chiu; Chien-Chi Chen; Gwo-Fang Yuan; Thy-Hou Lin
- Publisher: John Wiley and Sons
- Year: 2010
- Language: English
- File size: 185 KB
- Volume: 32
- Category: Article
- ISSN: 0192-8651
Abstract
The evolutionary relationships of organisms have traditionally been delineated by alignment‐based methods applied to DNA or protein sequences. In the post‐genome era, the phylogeny of life can be inferred from many sources, such as genomic features, rather than only from comparisons of one or a few genes. To investigate whether the physicochemical properties of protein sequences reflect their phylogenetic relationships, we implement an alignment‐free method that uses a support vector machine (SVM) classifier to establish the phylogenetic relationships among protein sequences. Two types of datasets are used to train the SVM classifiers: "Enzymatic" (sequences assigned an EC accession) and "Proteins". Using the F‐score for feature selection, we find that the classification accuracies of the trained SVM classifiers improve significantly, to 84% for the Enzymatic dataset and 80% for the Proteins dataset, when the protein sequences are represented by the top 255 selected features. These results show that a selected set of physicochemical features of amino acid sequences is sufficient for inferring the phylogenetic properties of the protein sequences. Moreover, the selected physicochemical features appear to correlate with the physiological characteristics of the taxonomic classes being classified.
© 2010 Wiley Periodicals, Inc. J Comput Chem, 2010
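The abstract does not spell out the F-score it uses, so the sketch below assumes the common two-class F-score of Chen and Lin: for feature i, the spread of the class means around the overall mean divided by the sum of the within-class sample variances. Higher scores mark features that separate the two classes better; keeping the top k (k = 255 in the abstract) gives the reduced representation fed to the SVM. The function names `f_score` and `top_k_features` are hypothetical, the labels are assumed to be +1/-1, and the multi-class case (several taxonomic classes) would need an extension such as one-vs-rest that this sketch does not cover.

```python
from statistics import mean


def f_score(pos, neg):
    """Two-class F-score of one feature (Chen-Lin definition, assumed here).

    pos, neg: values of this feature in the positive and negative class;
    each class needs at least two samples for the sample variances.
    """
    m_all = mean(pos + neg)
    m_pos, m_neg = mean(pos), mean(neg)
    # Between-class spread: how far each class mean lies from the overall mean.
    numerator = (m_pos - m_all) ** 2 + (m_neg - m_all) ** 2
    # Within-class spread: sample variance of each class (n - 1 denominator).
    var_pos = sum((x - m_pos) ** 2 for x in pos) / (len(pos) - 1)
    var_neg = sum((x - m_neg) ** 2 for x in neg) / (len(neg) - 1)
    denominator = var_pos + var_neg
    if denominator == 0:
        # Constant within each class: perfectly separating if the means differ.
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator


def top_k_features(X, y, k):
    """Return indices of the k feature columns of X with the highest F-score.

    X: list of feature vectors (one row per protein sequence);
    y: list of labels, +1 or -1 (hypothetical binary labelling).
    Ties are broken by the lower column index.
    """
    scored = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == -1]
        scored.append((f_score(pos, neg), j))
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in scored[:k]]


# Toy data: three made-up physicochemical features for four sequences.
X = [
    [1.0, 5.0, 0.3],
    [1.2, 5.1, 0.1],
    [3.0, 5.0, 0.2],
    [3.1, 4.9, 0.4],
]
y = [1, 1, -1, -1]
print(top_k_features(X, y, 2))  # [0, 1]
```

Feature 0 separates the two classes cleanly (F ≈ 76), while features 1 and 2 barely do (F = 0.5 and 0.125), so the first two indices in the ranking are 0 and 1. The selected columns would then be used to train the classifier; the SVM itself is not shown here.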