✦ LIBER ✦

Information Content of Individual Genetic Sequences

✍ Scribed by T.D. Schneider

Publisher: Elsevier Science
Year: 1997
Tongue: English
Weight: 272 KB
Volume: 189
Category: Article
ISSN: 0022-5193
DOI: 10.1006/jtbi.1997.0540


✦ Synopsis

Related genetic sequences having a common function can be described by Shannon's information measure and depicted graphically by a sequence logo. Though useful for many purposes, sequence logos show only the average sequence conservation, and inferring the conservation of individual sequences is difficult. This limitation is overcome by the individual information (R_i) technique described here. The method begins by generating a weight matrix from the frequencies of each nucleotide or amino acid at each position of the aligned sequences. This matrix is then applied to the sequences themselves to determine the sequence conservation of each individual sequence. The matrix is unique because the average of these assignments is the total sequence conservation, and there is only one way to construct such a matrix. For binding sites on polynucleotides, the weight matrix has a natural cutoff that distinguishes functional sequences from other sequences. R_i values are on an absolute scale measured in bits of information, so the conservation of different biological functions can be compared with one another. The matrix can be used to rank-order the sequences, to search for new sequences, to compare sequences to other quantitative data such as binding energy or distance between binding sites, to distinguish mutations from polymorphisms, to design sequences of a given strength, and to detect errors in databases. The R_i method has been used to identify previously undescribed but experimentally verified DNA binding sites. Individual information distributions were determined for E. coli ribosome binding sites, bacterial Fis binding sites, and human donor and acceptor splice junctions, among others. The distributions demonstrate clearly that the consensus sequence is highly unusual, and hence is a poor way to describe naturally occurring binding sites.
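A minimal sketch of the R_i idea summarized above, assuming ungapped, equal-length DNA sites over the alphabet ACGT. Each weight-matrix entry is taken here as R_iw(b,l) = 2 + log2 f(b,l), where f(b,l) is the observed frequency of base b at position l. Any small-sample correction or zero-count handling the paper may apply is not reproduced; in this sketch a base never seen at a position simply scores minus infinity. The function names weight_matrix, individual_information and r_sequence are hypothetical. The last function lets one check the property stated in the synopsis: the average of the individual R_i values equals the total sequence conservation, the sum over positions of (2 - H(l)).

```python
import math
from collections import Counter

BASES = "ACGT"


def weight_matrix(sites):
    """Build R_iw(b, l) = 2 + log2 f(b, l) from aligned, equal-length DNA sites.

    Assumption: no small-sample correction; a base absent at a position gets -inf.
    """
    if not sites:
        raise ValueError("need at least one aligned site")
    n = len(sites)
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("sites must be aligned to the same length")
    matrix = []
    for l in range(width):
        counts = Counter(s[l] for s in sites)  # n(b, l) for this column
        column = {}
        for b in BASES:
            f = counts[b] / n  # f(b, l)
            column[b] = 2 + math.log2(f) if f > 0 else float("-inf")
        matrix.append(column)
    return matrix


def individual_information(site, matrix):
    """R_i of one sequence: sum of the matrix entries for its base at each position."""
    return sum(matrix[l][b] for l, b in enumerate(site))


def r_sequence(sites):
    """Total sequence conservation: sum over positions of 2 - H(l), in bits."""
    n = len(sites)
    total = 0.0
    for l in range(len(sites[0])):
        counts = Counter(s[l] for s in sites)
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += 2 - h
    return total


if __name__ == "__main__":
    # Toy sites for illustration only, not data from the paper.
    sites = ["ACGT", "ACGA", "ACTT", "TCGT"]
    m = weight_matrix(sites)
    ri = [individual_information(s, m) for s in sites]
    for s, r in sorted(zip(sites, ri), key=lambda p: -p[1]):  # rank-order by R_i
        print(s, round(r, 3))
    print("mean R_i:", round(sum(ri) / len(ri), 3))
    print("R_sequence:", round(r_sequence(sites), 3))
```

The two printed totals agree because averaging 2 + log2 f(b,l) over the same sequences that produced f gives 2 + sum_b f(b,l) log2 f(b,l) = 2 - H(l) at each position. Sorting on R_i is one simple way to rank-order the sites, one of the uses listed in the synopsis.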

📜 SIMILAR VOLUMES

Information Content of Protein Sequences

✍ Olaf Weiss; Miguel A. Jiménez-Montaño; Hanspeter Herzel 📂 Article 📅 2000 🏛 Elsevier Science 🌐 English ⚖ 153 KB

The complexity of large sets of non-redundant protein sequences is measured. This is done by estimating the Shannon entropy as well as applying compression algorithms to estimate the algorithmic complexity. The estimators are also applied to randomly generated surrogates of the protein data. Our res

On the information content of the genetic code

✍ T. Alvager; G. Graham; R. Hilleke; D. Hutchison; J. Westgard 📂 Article 📅 1989 🏛 Elsevier Science 🌐 English ⚖ 555 KB

Maximum likelihood genetic sequence reconstruction from oligo content

✍ Jane N. Hagstrom; Ray Hagstrom; Ross Overbeek; Morgan Price; Linus Schrage 📂 Article 📅 1994 🏛 John Wiley and Sons 🌐 English ⚖ 549 KB

The information content of phase-known matings for ordering genetic loci

✍ D. Timothy Bishop; D. C. Rao 📂 Article 📅 1985 🏛 John Wiley and Sons 🌐 English ⚖ 726 KB

This analysis evaluates the amount of linkage information that can be obtained for ordering three loci. The information contents of data collected on three independent pairwise samples of two loci, and data obtained jointly from three loci in a single set of individuals are compared. The level of su

Evolution of Genetic Information Flow From the Viewpoint of Protein Sequence Similarity

✍ Satoshi Fukuchi; Tetsuya Okayama; Jinya Otsuka 📂 Article 📅 1994 🏛 Elsevier Science 🌐 English ⚖ 964 KB

As a course of inquiry into the evolution of genetic information flow, similarity relations of amino acid sequences between the proteins involved in translation, transcription and replication are investigated. The sequence data of these proteins are mostly accumulated from Escherichia coli, and the

A Statistical Feature of Genetic Sequences

✍ J. Andrés Christen; José-Leonel Torres; Julio Barrera 📂 Article 📅 1998 🏛 John Wiley and Sons 🌐 English ⚖ 153 KB