
Semisupervised learning using feature selection based on maximum density subgraphs

✍ Scribed by Yoshiyuki Nakatani; Kuangyi Zhu; Kuniaki Uehara

Book ID: 104591302
Publisher: John Wiley and Sons
Year: 2007
Tongue: English
Weight: 511 KB
Volume: 38
Category: Article
ISSN: 0882-1666
DOI: 10.1002/scj.20757


✦ Synopsis

Abstract

In machine learning tasks on large‐scale datasets, the labeled data essential to classification are not always sufficient, which degrades the learning accuracy. Meanwhile, unlabeled data are always abundant. Hence, semisupervised learning, which uses both unlabeled and labeled data to improve the learning accuracy, is currently of great interest. In this paper, we use a graph to represent the underlying distribution of both labeled and unlabeled data and split it using a multiway cut to classify the unlabeled data. Additionally, we propose a graph‐based feature selection algorithm to improve the learning accuracy of our graph‐based semisupervised learning algorithm. In our algorithm, we first propose an evaluation criterion for attribute relevance based on the graph density. Then, we extract the relevant attribute subset by finding the clique on the graph in which each vertex stands for an attribute and each edge stands for the relevance of a feature pair. © 2007 Wiley Periodicals, Inc. Syst Comp Jpn, 38(9): 32–43, 2007; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.20757
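The abstract does not spell out the relevance criterion or the extraction procedure, so the following is only a minimal sketch of the general idea of picking a dense attribute subset from a feature graph. It assumes a hypothetical dictionary of pairwise relevance scores, defines density as total edge weight divided by the number of vertices, and swaps in a simple greedy peeling heuristic rather than the paper's own clique-based method. The function name, the scores, and the density definition are all illustrative assumptions.

```python
def densest_feature_subset(relevance):
    """Greedy peeling sketch (a stand-in, not the paper's algorithm).

    relevance: dict mapping a pair (u, v) of distinct feature names to a
    nonnegative relevance score (hypothetical input format).
    Returns (best_subset, best_density), where density is assumed to be
    total edge weight inside the subset divided by the subset size.
    """
    # Collect vertices and their weighted degrees.
    degree = {}
    for (u, v), w in relevance.items():
        degree[u] = degree.get(u, 0.0) + w
        degree[v] = degree.get(v, 0.0) + w
    remaining = set(degree)
    if not remaining:
        return set(), 0.0

    total = sum(relevance.values())  # edge weight inside `remaining`
    best_subset = set(remaining)
    best_density = total / len(remaining)

    while len(remaining) > 1:
        # Drop the vertex with the smallest weighted degree
        # (ties broken by name so the result is deterministic).
        victim = min(sorted(remaining), key=lambda x: degree[x])
        remaining.remove(victim)
        for (u, v), w in relevance.items():
            if victim in (u, v):
                other = v if u == victim else u
                if other in remaining:
                    degree[other] -= w
                    total -= w
        density = total / len(remaining)
        if density > best_density:
            best_density = density
            best_subset = set(remaining)
    return best_subset, best_density


# Hypothetical relevance scores for four attributes.
example = {
    ("a", "b"): 0.9,
    ("a", "c"): 0.8,
    ("b", "c"): 0.85,
    ("c", "d"): 0.1,
}
subset, density = densest_feature_subset(example)
```

In this made-up example the weakly connected attribute `d` is peeled away first, and the three mutually relevant attributes are kept as the densest subset.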