
Outlier detection by active learning

Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '06), Philadelphia, PA, USA, August 20-23, 2006. ACM Press.

โœ Scribed by Abe, Naoki; Zadrozny, Bianca; Langford, John


Book ID: 124094445
Publisher: ACM Press
Year: 2006
Weight: 134 KB
Category: Article
ISBN-13: 9781595933393


Synopsis


Most existing approaches to outlier detection are based on density estimation methods. There are two notable issues with these methods: one is the lack of explanation for outlier flagging decisions, and the other is the relatively high computational requirement. In this paper, we present a novel approach to outlier detection based on classification, in an attempt to address both of these issues. Our approach is based on two key ideas. First, we present a simple reduction of outlier detection to classification, via a procedure that involves applying classification to a labeled data set containing artificially generated examples that play the role of potential outliers. Second, once the task has been reduced to classification, we apply a selective sampling mechanism based on active learning to the reduced classification problem. We empirically evaluate the proposed approach on a number of data sets and find that our method is superior to other methods that use the same reduction to classification but rely on standard classification methods. We also show that it is competitive with state-of-the-art outlier detection methods in the literature based on density estimation, while significantly improving computational complexity and explanatory power.
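The two ideas in the synopsis can be illustrated with a minimal sketch. The synopsis does not say how the artificial examples are generated, which classifier is used, or how the selective sampling works, so everything specific here is an assumption made for illustration: synthetic points are drawn uniformly from the bounding box of the data, the base learner is a plain k-nearest-neighbour vote, and each new committee member is trained on a sample that keeps low-margin (uncertain) examples more often. The function names `make_reduced_dataset`, `knn_score`, and `active_outlier_scores` are hypothetical and are not taken from the paper.

```python
import random

def make_reduced_dataset(real, n_synth, rng):
    """Reduction step: real points get label 0, synthetic points label 1.
    Assumption: synthetic points are uniform over the data's bounding box."""
    dims = range(len(real[0]))
    lo = [min(p[d] for p in real) for d in dims]
    hi = [max(p[d] for p in real) for d in dims]
    synth = [tuple(rng.uniform(lo[d], hi[d]) for d in dims) for _ in range(n_synth)]
    return [(p, 0) for p in real] + [(p, 1) for p in synth]

def knn_score(x, train, k=5):
    """Fraction of the k nearest training points labelled synthetic.
    The query point itself (same object) is skipped if it is in `train`."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, p)), y)
        for p, y in train if p is not x
    )[:k]
    return sum(y for _, y in nearest) / len(nearest) if nearest else 0.5

def active_outlier_scores(real, n_synth=None, rounds=5, k=5, seed=0):
    """Committee of k-NN learners, each trained on a selectively sampled subset.
    Returns one score per real point; higher means more outlier-like."""
    rng = random.Random(seed)
    data = make_reduced_dataset(real, n_synth or len(real), rng)
    # First member: a uniform random half of the reduced data set.
    members = [rng.sample(data, max(1, len(data) // 2))]
    for _ in range(rounds - 1):
        sample = []
        for x, y in data:
            vote = sum(knn_score(x, m, k) for m in members) / len(members)
            margin = abs(vote - 0.5) * 2  # 0 = committee undecided, 1 = unanimous
            # Assumed rule: keep uncertain examples with higher probability.
            if rng.random() < 1.0 - 0.5 * margin:
                sample.append((x, y))
        members.append(sample or data)
    return [sum(knn_score(x, m, k) for m in members) / len(members) for x in real]

if __name__ == "__main__":
    rng = random.Random(1)
    cluster = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(40)]
    points = cluster + [(8.0, 8.0)]  # last point lies far from the cluster
    scores = active_outlier_scores(points)
    print("score of the isolated point:", scores[-1])
    print("mean score of cluster points:", sum(scores[:-1]) / len(cluster))
```

A real point surrounded mostly by synthetic neighbours receives a high score, which is the sense in which the artificial examples "play the role of potential outliers". The pure-Python k-NN is quadratic in the data size and is used only to keep the sketch self-contained; it says nothing about the computational cost of the method evaluated in the paper.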


SIMILAR VOLUMES


[ACM Press the 12th ACM SIGKDD internati
โœ Carvalho, Vitor R.; Cohen, William W. ๐Ÿ“‚ Article ๐Ÿ“… 2006 ๐Ÿ› ACM Press โš– 725 KB

To learn concepts over massive data streams, it is essential to design inference and learning methods that operate in real time with limited memory. Online learning methods such as perceptron or Winnow are naturally suited to stream processing; however, in practice multiple passes over the same trai