๐”– Bobbio Scriptorium
โœฆ   LIBER   โœฆ

[ACM Press the ninth ACM SIGKDD international conference - Washington, D.C. (2003.08.24-2003.08.27)] Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '03 - Mining distance-based outliers in near linear time with randomization and a simple pruning rule

✍ Scribed by Bay, Stephen D.; Schwabacher, Mark


Book ID: 120625942
Publisher: ACM Press
Year: 2003
Tongue: English
Weight: 216 KB
Category: Article
ISBN-13: 9781581137378


✦ Synopsis


Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used. We test our algorithm on real high-dimensional data sets with millions of examples and show that the near linear scaling holds over several orders of magnitude. Our average case analysis suggests that much of the efficiency is because the time to process non-outliers, which are the majority of examples, does not depend on the size of the data set.
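
The synopsis names three ingredients: a nested loop, random ordering of the data, and a simple pruning rule. The sketch below is one way these could fit together, not the authors' implementation. It rests on assumptions the synopsis does not state: an example's outlier score is taken to be the distance to its k-th nearest neighbor, the goal is taken to be the top `n_outliers` examples, and the pruning rule is taken to be "stop scanning an example once its running score drops below the score of the weakest outlier kept so far". The names `knn_outliers`, `k`, `n_outliers`, and `seed` are hypothetical.

```python
import heapq
import math
import random


def knn_outliers(points, k=3, n_outliers=2, seed=0):
    """Return (score, point) pairs for the n_outliers highest-scoring points.

    Score (an assumption of this sketch): distance to the k-th nearest neighbor.
    """
    rng = random.Random(seed)
    data = list(points)
    rng.shuffle(data)  # random order lets most non-outliers be pruned early

    top = []      # min-heap of (score, index); the current best outliers
    cutoff = 0.0  # score of the weakest outlier kept so far

    for i, x in enumerate(data):
        nearest = []  # max-heap (negated) of the k smallest distances so far
        pruned = False
        for j, y in enumerate(data):
            if i == j:
                continue
            d = math.dist(x, y)
            if len(nearest) < k:
                heapq.heappush(nearest, -d)
            elif d < -nearest[0]:
                heapq.heapreplace(nearest, -d)
            # The running k-th distance can only shrink as more points are
            # scanned, so once it is below the cutoff x cannot be a top outlier.
            if len(nearest) == k and -nearest[0] < cutoff:
                pruned = True
                break
        if pruned or len(nearest) < k:
            continue
        score = -nearest[0]
        if len(top) < n_outliers:
            heapq.heappush(top, (score, i))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, i))
        if len(top) == n_outliers:
            cutoff = top[0][0]  # the cutoff rises as better outliers are found

    return sorted(((s, data[i]) for s, i in top), reverse=True)
```

In this sketch the inner loop for a typical non-outlier ends as soon as it has met k close neighbors, which is consistent with the synopsis's remark that the time to process non-outliers does not depend on the size of the data set. The worst case remains quadratic, since a true outlier is compared against every other example.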


📜 SIMILAR VOLUMES


[ACM Press the ninth ACM SIGKDD internat
✍ Hsu, Wynne; Dai, Jing; Lee, Mong Li 📂 Article 📅 2003 🏛 ACM Press 🌐 English ⚖ 398 KB

This conference brings together researchers and practitioners and focuses on new developments in knowledge discovery and data mining. The challenge of extracting knowledge from data is an area of common interest to researchers in several fields, including statistics, databases, pattern recognition,

[ACM Press the ninth ACM SIGKDD internat
✍ Last, Mark; Friedman, Menahem; Kandel, Abraham 📂 Article 📅 2003 🏛 ACM Press 🌐 English ⚖ 289 KB

This conference brings together researchers and practitioners and focuses on new developments in knowledge discovery and data mining. The challenge of extracting knowledge from data is an area of common interest to researchers in several fields, including statistics, databases, pattern recognition,