The method of N-grams in large-scale clustering of DNA texts
Authors: Z. Volkovich; V. Kirzhner; A. Bolshoy; E. Nevo; A. Korol
- Publisher: Elsevier Science
- Year: 2005
- Language: English
- File size: 251 KB
- Volume: 38
- Category: Article
- ISSN: 0031-3203
Synopsis
This paper is devoted to techniques for clustering texts by comparing their vocabularies of N-grams. Unlike the regular N-grams approach, which counts only exact occurrences, the proposed N-grams method counts imperfect occurrences of each N-gram in a text, that is, occurrences that differ from the N-gram by up to a given number of mismatches. We demonstrate that this approach substantially improves the resolving power of the N-grams method for DNA texts. Additionally, we discuss a scheme that combines different types of clustering techniques to verify the quality of the resulting partition.
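To make the idea of "imperfect occurrences" concrete, here is a minimal sketch, not the authors' implementation. It assumes a DNA alphabet of A, C, G, T, treats a mismatch as a differing letter at the same position (Hamming distance), and compares two texts with a cosine-based distance between their N-gram profiles. The function names `approx_ngram_profile` and `profile_distance`, and the choice of cosine distance, are illustrative assumptions; the synopsis does not specify the paper's exact mismatch model or dissimilarity measure.

```python
from collections import Counter
from itertools import product
from math import sqrt

# Assumption: DNA texts over the four-letter alphabet.
ALPHABET = "ACGT"


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def approx_ngram_profile(text: str, n: int, k: int) -> dict[str, int]:
    """For every N-gram over ALPHABET, count the windows of `text`
    that match it with at most k mismatches (hypothetical helper).
    k = 0 gives the regular, exact-occurrence N-gram counts."""
    text = text.upper()
    # Count each distinct length-n window once, then reuse the counts.
    windows = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    profile = {}
    for letters in product(ALPHABET, repeat=n):
        gram = "".join(letters)
        profile[gram] = sum(c for w, c in windows.items() if hamming(gram, w) <= k)
    return profile


def profile_distance(p: dict[str, int], q: dict[str, int]) -> float:
    """Assumed dissimilarity: 1 - cosine similarity of two profiles."""
    keys = set(p) | set(q)
    dot = sum(p.get(g, 0) * q.get(g, 0) for g in keys)
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0
    return 1.0 - dot / (norm_p * norm_q)


if __name__ == "__main__":
    s1, s2 = "AAAA", "ATAT"
    # Exact 2-gram vocabularies are disjoint, so the distance is maximal.
    print(profile_distance(approx_ngram_profile(s1, 2, 0), approx_ngram_profile(s2, 2, 0)))
    # Allowing one mismatch makes the profiles overlap.
    print(profile_distance(approx_ngram_profile(s1, 2, 1), approx_ngram_profile(s2, 2, 1)))
```

The toy pair `AAAA` and `ATAT` shares no exact 2-grams, so the k = 0 distance is 1.0, while with one mismatch allowed the profiles overlap and the distance drops below 1.0. This illustrates how tolerating mismatches can reveal similarity that exact counting misses. Because every one of the 4^N possible N-grams is enumerated, this sketch is only practical for small N. The resulting pairwise distances could then be fed to any standard clustering routine.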