On spaced seeds for similarity search
β Scribed by Uri Keich; Ming Li; Bin Ma; John Tromp
- Book ID
- 104294282
- Publisher
- Elsevier Science
- Year
- 2004
- Tongue
- English
- Weight
- 235 KB
- Volume
- 138
- Category
- Article
- ISSN
- 0166-218X
No coin nor oath required. For personal study only.
β¦ Synopsis
Genomics studies routinely depend on similarity searches based on the strategy of ΓΏnding short seed matches (contiguous k bases) which are then extended. The particular choice of the seed length, k, is determined by the tradeo between search speed (larger k reduces chance hits) and sensitivity (smaller k ΓΏnds weaker similarities). A novel idea of using a single deterministic optimized spaced seed was introduced in Ma et al. (Bioinformatics (2002) 18) to the above similarity search process and it was empirically demonstrated that the optimal spaced seed quadruples the search speed, without sacriΓΏcing sensitivity. Multiple, randomly spaced patterns, spaced q-grams, and spaced probes were also studied in Califano and Rigoutsos (Technical Report, IBM T.J. Watson Research Center (1995), Burkhardt, K arkk ainen, CPM (2001), and Buhler, Bioinformatics 17 (2001) 419) and in other applications [(RECOMB (1999) 295, RE-COMB (2000) 245)]. They were all found to be better than their contiguous counterparts. In this paper we study some of the theoretical and practical aspects of optimal seeds. In particular we demonstrate that the commonly used contiguous seed is in some sense the worst one, and we o er an algorithmic solution to the problem of ΓΏnding the optimal seed.
π SIMILAR VOLUMES