Algorithms of nonlinear document clustering based on fuzzy multiset model

✍ Scribed by Kiyotaka Mizutani; Ryo Inokuchi; Sadaaki Miyamoto


Publisher: John Wiley and Sons
Year: 2008
Tongue: English
Weight: 553 KB
Volume: 23
Category: Article
ISSN: 0884-8173


✦ Synopsis


A fuzzy multiset is applicable as a model for information retrieval because its mathematical structure expresses the number of occurrences of an element and its degrees of attribution simultaneously. Fuzzy multisets can therefore also serve as a suitable model for document clustering. This paper develops clustering algorithms for documents based on a fuzzy multiset model. The standard proximity measure, the cosine correlation, is generalized to the multiset model, and two nonlinear clustering techniques are applied to existing clustering methods. The first introduces a variable for controlling cluster volume sizes; the second is the kernel trick used in support vector machines. Clustering by competitive learning is also studied. When the kernel trick is used, the classification configuration of the data in the high-dimensional feature space is visualized by self-organizing maps. Two numerical examples, one using artificial data and one using real document data, are shown, and the effects of the proposed methods are discussed.
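
The synopsis does not state the formulas, so the following is only an illustrative sketch of how a cosine correlation might be generalized to fuzzy multisets, not the authors' definition. It assumes that a document is stored as a mapping from each term to the list of its membership degrees (one per occurrence), that the degrees are sorted in decreasing order before comparison, and that missing positions count as zero. The function names, the sample documents, and the Gaussian kernel with parameter `c` are all hypothetical choices made for this example.

```python
from math import exp, sqrt


def normalize(fm):
    """Sort each term's membership degrees in decreasing order (assumed convention)."""
    return {term: sorted(degrees, reverse=True) for term, degrees in fm.items()}


def inner(a, b):
    """Inner product of two normalized fuzzy multisets.

    zip() stops at the shorter list, which is equivalent to padding
    the shorter membership sequence with zeros.
    """
    total = 0.0
    for term in a.keys() & b.keys():
        total += sum(p * q for p, q in zip(a[term], b[term]))
    return total


def cosine(a, b):
    """Cosine correlation between two fuzzy multisets (0.0 if either is empty)."""
    a, b = normalize(a), normalize(b)
    denom = sqrt(inner(a, a) * inner(b, b))
    return inner(a, b) / denom if denom else 0.0


def gaussian_kernel(a, b, c=1.0):
    """Gaussian kernel built from the same inner product (hypothetical parameter c)."""
    a, b = normalize(a), normalize(b)
    # Squared distance expressed through inner products only.
    d2 = inner(a, a) - 2.0 * inner(a, b) + inner(b, b)
    return exp(-c * max(d2, 0.0))


# Hypothetical documents: "fuzzy" occurs twice in each, with different degrees.
doc1 = {"fuzzy": [0.9, 0.4], "cluster": [0.7]}
doc2 = {"fuzzy": [0.4, 0.9], "kernel": [0.5]}
doc3 = {"retrieval": [0.8]}

similarity = cosine(doc1, doc2)
```

Under these assumptions, documents that share no terms (such as `doc1` and `doc3`) have similarity 0, and a document compared with itself has similarity 1. A kernel-based clustering method would use values such as `gaussian_kernel(doc1, doc2)` in place of explicit coordinates in the feature space.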