𝔖 Bobbio Scriptorium
✦   LIBER   ✦

Algorithmic complexity of protein identification: combinatorics of weighted strings

✍ Scribed by Mark Cieliebak; Thomas Erlebach; Zsuzsanna Lipták; Jens Stoye; Emo Welzl


Publisher
Elsevier Science
Year
2004
Tongue
English
Weight
319 KB
Volume
137
Category
Article
ISSN
0166-218X

No coin nor oath required. For personal study only.

✦ Synopsis


We investigate a problem which arises in computational biology: Given a constant-size alphabet A with a weight function : A → N, ÿnd an e cient data structure and query algorithm solving the following problem: For a string over A and a weight M ∈ N, decide whether contains a substring with weight M , where the weight of a string is the sum of the weights of its letters (ONE-STRING MASS FINDING PROBLEM). If the answer is yes, then we may in addition require a witness, i.e., indices i 6 j such that the substring beginning at position i and ending at position j has weight M . We allow preprocessing of the string and measure e ciency in two parameters: storage space required for the preprocessed data and running time of the query algorithm for given M . We are interested in data structures and algorithms requiring subquadratic storage space and sublinear query time, where we measure the input size as the length n of the input string . Among others, we present two non-trivial e cient algorithms: LOOKUP solves the problem with O(n) storage space and O(n=log n) time; INTERVAL solves the problem for binary alphabets with O(n) storage space in O(log n) query time. We introduce other variants of the problem and sketch how our algorithms may be extended for these variants. Finally, we discuss combinatorial properties of weighted strings.


📜 SIMILAR VOLUMES


Identification of low molecular weight p
✍ Kan Zhu; Fred R. Miller; Timothy J. Barder; David M. Lubman 📂 Article 📅 2004 🏛 John Wiley and Sons 🌐 English ⚖ 580 KB

## Abstract Proteins with molecular mass (__M__~r~) <20 kDa are often poorly separated in 2‐D sodium dodecyl sulfate polyacrylamide gel electrophoresis. In addition, low‐__M__~r~ proteins may not be readily identified using peptide mass fingerprinting (PMF) owing to the small number of peptides gen

Identification of novel homologues of th
✍ Hans-Peter Braun 📂 Article 📅 1996 🏛 Springer 🌐 English ⚖ 471 KB

Large-scale random cDNA sequencing projects have been started for several organisms and are a valuable tool for the analysis of quantitative and qualitative aspects of gene expression. However, the reliability of the obtained data is limited as most of the clones are only partially analysed on one s