
The method of N-grams in large-scale clustering of DNA texts

✍ Scribed by Z. Volkovich; V. Kirzhner; A. Bolshoy; E. Nevo; A. Korol


Publisher: Elsevier Science
Year: 2005
Tongue: English
Weight: 251 KB
Volume: 38
Category: Article
ISSN: 0031-3203


✦ Synopsis


This paper is devoted to techniques for clustering texts based on comparing their vocabularies of N-grams. In contrast to the regular N-gram approach, the proposed method counts imperfect occurrences of N-grams in a text, allowing up to a given number of mismatches. We demonstrate that this approach substantially improves the resolving capacity of the N-gram method for DNA texts. Additionally, we discuss a scheme for jointly using different types of clustering techniques to verify partition quality.
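As a rough illustration of the idea in the synopsis, the sketch below counts, for each N-gram over the DNA alphabet, the windows of a text that match it with at most a given number of mismatched positions. This is only one plausible reading: the paper's exact definition of an imperfect occurrence is not stated here, so the use of Hamming distance and the names `imperfect_ngram_counts` and `max_mismatches` are assumptions made for this example.

```python
from collections import Counter
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def exact_ngram_counts(text: str, n: int) -> Counter:
    """Count every length-n window of the text (the regular N-gram vocabulary)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def imperfect_ngram_counts(text: str, n: int, max_mismatches: int,
                           alphabet: str = "ACGT") -> dict:
    """Assumed variant: a window counts as an occurrence of an N-gram if it
    differs from that N-gram in at most max_mismatches positions."""
    exact = exact_ngram_counts(text, n)
    counts = {}
    # Enumerating all |alphabet|**n words is only practical for small n.
    for word in map("".join, product(alphabet, repeat=n)):
        total = sum(c for window, c in exact.items()
                    if hamming(word, window) <= max_mismatches)
        if total:
            counts[word] = total
    return counts


if __name__ == "__main__":
    text = "ACGTAC"
    print(exact_ngram_counts(text, 2))
    print(imperfect_ngram_counts(text, 2, max_mismatches=1))
```

With `max_mismatches=0` the result reduces to the regular N-gram counts, so the mismatch tolerance is the only thing that changes between the two vocabularies. Comparing such vocabularies between texts, and clustering on the result, is left out of this sketch.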


📜 SIMILAR VOLUMES


An elastic model of the large-scale stru
✍ Craig J. Benham 📂 Article 📅 1979 🏛 Wiley (John Wiley & Sons) 🌐 English ⚖ 749 KB

A general model for the large-scale, time-independent structure of duplex DNA is developed based on elastic considerations. The general conditions of elastic equilibrium are given. These equations are solved for the equilibrium shape of stressed duplex DNA, based on the assumption that

An n-gram hash and skip algorithm for fi
✍ Jonathan D. Cohen 📂 Article 📅 1998 🏛 John Wiley and Sons 🌐 English ⚖ 407 KB 👁 1 views

A method of full-text scanning for matches in a large dictionary of keywords is described, suitable for Selective Dissemination of Information (SDI). The method is applicable to large dictionaries (say 10^4 to 10^5 entries), and to arbitrary byte streams for both patterns and data samples. The appro