𝔖 Bobbio Scriptorium
✦   LIBER   ✦

A clustering procedure based on the comparison between the k nearest neighbors graph and the minimal spanning tree

✍ Scribed by José Marı́a González-Barrios; Adolfo J. Quiroz


Publisher
Elsevier Science
Year
2003
Tongue
English
Weight
250 KB
Volume
62
Category
Article
ISSN
0167-7152

No coin nor oath required. For personal study only.

✦ Synopsis


We present a procedure for the identiÿcation of clusters in multivariate data sets, based on the comparison between the k nearest neighbors graph, G k , and the minimal spanning tree, MST. Our key statistic is the random quantity k := the smallest k such that G k contains the MST. Under regularity assumptions, we show that for i.i.d. data from a density on R d with compact support having one connected component, k =O Pr (log n), where n denotes sample size, a bound that seems to be sharp, according to simulations. This leads to a consistent test for the identiÿcation of crisp clusters. We illustrate the use of our procedure with an example.