Reannotation of protein-coding genes based on an improved graphical representation of DNA sequence

✍ Scribed by Jia-Feng Yu; Xiao Sun


Publisher: John Wiley and Sons
Year: 2010
Tongue: English
Weight: 816 KB
Volume: 31
Category: Article
ISSN: 0192-8651

✦ Synopsis


Abstract

Over-annotation of protein-coding genes is a common phenomenon in microbial genomes. The genome of Amsacta moorei entomopoxvirus (AmEPV) is a typical case: more than 63% of its annotated ORFs are hypothetical. In this article, we propose an improved graphical representation, the I-TN (improved curve based on trinucleotides) curve, which allows direct inspection of the composition and distribution of codons and of asymmetric gene structure. This representation also provides convenient tools for genome analysis. From it, 18 variables are derived as numerical descriptors that quantitatively represent the specific features of protein-coding genes, and we use them to reannotate the protein-coding genes in several viral genomes. Using parameters trained on the experimentally validated genes, all 30 experimentally validated genes and 63 putative genes in the AmEPV genome are correctly recognized as protein coding; the accuracy of the method is 100% in both the self-test and cross-validation. Twenty-eight annotated hypothetical genes are predicted to be noncoding, so the number of reannotated protein-coding genes in AmEPV should be 266 rather than the 294 reported in the original annotation. When the method trained on AmEPV is applied directly to other entomopoxvirus genomes, such as Melanoplus sanguinipes entomopoxvirus (MsEPV), all 123 annotated function-known and putative genes are correctly recognized as protein coding, and 17 hypothetical genes are recognized as noncoding. The method could also be extended to other genomes with high accuracy, with or without adapting the training set. © 2010 Wiley Periodicals, Inc. J Comput Chem 2010
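
The abstract does not define the I-TN curve or its 18 descriptors, so the following is not the authors' method. It is only a minimal sketch of the raw quantity a trinucleotide-based representation starts from: counts of non-overlapping trinucleotides (codons) in a chosen reading frame. The function names `codon_counts` and `codon_frequencies` and the example sequence are hypothetical, introduced here for illustration.

```python
from collections import Counter


def codon_counts(seq: str, frame: int = 0) -> Counter:
    """Count non-overlapping trinucleotides in seq, starting at offset frame (0, 1, or 2).

    Illustrative assumption: triplets containing anything other than
    A, C, G, or T (for example N) are skipped rather than counted.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1, or 2")
    seq = seq.upper()
    counts = Counter()
    # Step through the sequence three bases at a time; a trailing
    # partial triplet is ignored.
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if set(codon) <= set("ACGT"):
            counts[codon] += 1
    return counts


def codon_frequencies(seq: str, frame: int = 0) -> dict:
    """Return relative trinucleotide frequencies for one reading frame."""
    counts = codon_counts(seq, frame)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy ORF, not taken from AmEPV or MsEPV.
    orf = "ATGAAAGGGAAATAG"
    for frame in range(3):
        print(frame, dict(codon_counts(orf, frame)))
```

Comparing the three frames of the same sequence is one simple way to see how codon composition depends on the reading frame; any coding/noncoding decision would need the descriptors and trained parameters described in the article itself.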


📜 SIMILAR VOLUMES


Coronavirus phylogeny based on 2D graphi
✍ Bo Liao; Xuyu Xiang; Wen Zhu 📂 Article 📅 2006 🏛 John Wiley and Sons 🌐 English ⚖ 590 KB

## Abstract A novel coronavirus has been identified as the cause of the outbreak of severe acute respiratory syndrome (SARS). Previous phylogenetic analyses based on sequence alignments show that SARS‐CoVs form a new group distantly related to the other three groups of previously characterized coro

Similarity/dissimilarity studies of prot
✍ Yu-Hua Yao; Qi Dai; Ling Li; Xu-Ying Nan; Ping-An He; Yao-Zhou Zhang 📂 Article 📅 2009 🏛 John Wiley and Sons 🌐 English ⚖ 412 KB

## Abstract A (two‐dimensional) 2D graphical representation of protein sequences based on six physicochemical properties of amino acids is outlined. The numerical characterization of protein graphs is given as descriptors of protein sequences. It is not only useful for comparative study of proteins

Analysis of similarity/dissimilarity of
✍ Yu-hua Yao; Qi Dai; Xu-Ying Nan; Ping-An He; Zuo-Ming Nie; Song-Ping Zhou; Yao-Z 📂 Article 📅 2008 🏛 John Wiley and Sons 🌐 English ⚖ 113 KB

## Abstract On the basis of a class of 2D graphical representations of DNA sequences, sensitivity analysis has been performed, showing the high‐capability of the proposed representations to take into account small modifications of the DNA sequences. And sensitivity analysis also indicates that the