𝔖 Bobbio Scriptorium
✦   LIBER   ✦

A strategy for finding relevant clusters; with an application to microarray data

✍ Scribed by Ingunn Berget; Bjørn-Helge Mevik; Heidi Vebø; Tormod Næs


Publisher: John Wiley and Sons
Year: 2005
Tongue: English
Weight: 341 KB
Volume: 19
Category: Article
ISSN: 0886-9383

✦ Synopsis


Abstract

Cluster analysis is a helpful tool for the exploratory analysis of large and complex data. Most clustering methods will, however, find clusters even in random data. An important aspect of cluster analysis is therefore to distinguish real from artificial clusters, as this makes the clusters easier to interpret. In some cases, certain types of clusters are more interesting than others. In gene expression data, examples of such clusters are gene clusters with high between‐sample variability or clusters with a certain expression profile. Here we present a strategy that can search specifically for such clusters. The clustering is done sequentially. At each step, the data are separated into an ‘interesting’ cluster and the ‘rest’ using the fuzzy c‐means algorithm with noise clustering. The interesting cluster is defined by adding a penalty function to the usual clustering criterion; the penalty is constructed so that clusters without the interesting properties receive a high penalty. The strategy is presented in a general framework and can be adapted by defining different criteria for each type of cluster of interest. The methodology is presented and demonstrated in the context of microarray gene expression analysis, using real and simulated data, but it can be used for any type of data where cluster analysis is helpful. Copyright © 2006 John Wiley & Sons, Ltd.
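The abstract describes the procedure only in outline. The following is a minimal sketch under stated assumptions, not the authors' implementation. At each step a single prototype is fitted against a noise cluster placed at a fixed distance `delta` (the common noise-clustering formulation of fuzzy c-means, here with one regular cluster). Points with membership above 0.5 are peeled off as the 'interesting' cluster, and the rest go to the next step. The penalty `variability_penalty` is a hypothetical stand-in that penalises flat profiles (low between-sample variance); the abstract does not give the penalty's form. In this sketch it only ranks restarts and does not enter the prototype updates, which is a simplification. The names `delta`, `lam`, `min_size` and the 0.5 cut are likewise illustrative assumptions.

```python
import random
from statistics import pvariance
from typing import List, Sequence

Profile = Sequence[float]


def sq_dist(a: Profile, b: Profile) -> float:
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(data: List[Profile], center: Profile, delta: float, m: float) -> List[float]:
    """Membership in the single regular cluster; the noise cluster gets 1 - u.

    With one prototype and a noise cluster at fixed distance delta:
    u_k = 1 / (1 + (d_k^2 / delta^2) ** (1 / (m - 1))).
    """
    exp = 1.0 / (m - 1.0)
    return [1.0 / (1.0 + (sq_dist(x, center) / delta ** 2) ** exp) for x in data]


def noise_fcm(data, delta, m, start, n_iter=200, tol=1e-9):
    """Alternate membership and prototype updates from a given starting profile."""
    center = list(start)
    for _ in range(n_iter):
        w = [u ** m for u in memberships(data, center, delta, m)]
        total = sum(w)
        # Prototype = membership-weighted mean of the profiles.
        new_center = [sum(wk * x[j] for wk, x in zip(w, data)) / total
                      for j in range(len(center))]
        moved = sq_dist(new_center, center)
        center = new_center
        if moved < tol:
            break
    return center, memberships(data, center, delta, m)


def criterion(data, center, u, delta, m):
    """Noise-clustering objective: sum u^m d^2 + sum (1 - u)^m delta^2."""
    return sum(uk ** m * sq_dist(x, center) + (1.0 - uk) ** m * delta ** 2
               for x, uk in zip(data, u))


def variability_penalty(center):
    """HYPOTHETICAL penalty: flat prototypes (low between-sample variance) score high."""
    return 1.0 / (1.0 + pvariance(center))


def find_interesting_cluster(data, delta, lam=1.0, m=2.0, n_starts=10, seed=0):
    """Start from several data points; keep the run minimising criterion + lam * penalty.

    Simplification: the penalty only ranks restarts; it does not enter the updates.
    """
    rng = random.Random(seed)
    best = None
    for start in rng.sample(data, min(n_starts, len(data))):
        center, u = noise_fcm(data, delta, m, start)
        score = criterion(data, center, u, delta, m) + lam * variability_penalty(center)
        if best is None or score < best[0]:
            best = (score, center, u)
    return best[1], best[2]


def sequential_clustering(data, delta, max_clusters=5, min_size=2, **kw):
    """Peel off one 'interesting' cluster per step; return clusters and the unassigned rest."""
    remaining = list(data)
    clusters = []
    for _ in range(max_clusters):
        if len(remaining) < min_size:
            break
        _, u = find_interesting_cluster(remaining, delta, **kw)
        # HYPOTHETICAL crisp cut: membership above 0.5 counts as 'interesting'.
        members = [x for x, uk in zip(remaining, u) if uk > 0.5]
        if len(members) < min_size:
            break
        clusters.append(members)
        remaining = [x for x, uk in zip(remaining, u) if uk <= 0.5]
    return clusters, remaining
```

With `lam > 0`, when two groups are equally compact, the group whose prototype varies more across samples is extracted first. Points that never reach membership 0.5 stay in `remaining`, playing the role of the noise ('rest') set.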


📜 SIMILAR VOLUMES


Dating and forecasting turning points by
✍ Sylvia Kaufmann 📂 Article 📅 2009 🏛 John Wiley and Sons 🌐 English ⚖ 360 KB

Abstract

The information contained in a large panel dataset is used to date historical turning points and to forecast future ones. We estimate groups of series with similar time series dynamics and link the groups with a dynamic structure. The dynamic structure identifies a group of leading and