This conference brings together researchers and practitioners and focuses on new developments in knowledge discovery and data mining. The challenge of extracting knowledge from data is an area of common interest to researchers in several fields, including statistics, databases, pattern recognition,
[ACM Press the ninth ACM SIGKDD international conference - Washington, D.C. (2003.08.24-2003.08.27)] Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '03 - Efficient decision tree construction on streaming data
Written by Jin, Ruoming; Agrawal, Gagan
- Book ID: 126219337
- Publisher: ACM Press
- Year: 2003
- Language: English
- File size: 161 KB
- Category: Article
- ISBN-13: 9781581137378
Synopsis
Decision tree construction is a well-studied problem in data mining. Recently, there has been much interest in mining streaming data. Domingos and Hulten have presented a one-pass algorithm for decision tree construction. Their work uses the Hoeffding inequality to achieve a probabilistic bound on the accuracy of the tree constructed. In this paper, we revisit this problem. We make the following two contributions: 1) We present a numerical interval pruning (NIP) approach for efficiently processing numerical attributes. Our results show that it reduces execution time by 39% on average. 2) We exploit the properties of the gain functions entropy and gini to reduce the sample size required for obtaining a given bound on the accuracy. Our experimental results show a 37% reduction in the number of data instances required.
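The Hoeffding-based split test that the synopsis attributes to Domingos and Hulten can be sketched in a few lines. This is only an illustration of the baseline bound, not the paper's tighter sample-size method or its NIP technique, neither of which is described in enough detail here to reproduce. The function names, the default `delta`, and the default `value_range` of 1.0 (the range of entropy-based gain in bits for a two-class problem) are assumptions chosen for the example.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability at least 1 - delta, the true mean of a
    variable with the given range lies within epsilon of the mean of n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_split(best_gain: float, second_gain: float, n: int,
              delta: float = 1e-6, value_range: float = 1.0) -> bool:
    """True when the observed gain gap between the two best attributes exceeds
    epsilon, so the best attribute can be chosen with confidence 1 - delta."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


def samples_needed(gap: float, delta: float = 1e-6, value_range: float = 1.0) -> int:
    """Smallest n for which the Hoeffding bound does not exceed the given gap."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * gap ** 2))


if __name__ == "__main__":
    # Hypothetical gains observed at a leaf after 1000 stream instances.
    print(can_split(best_gain=0.30, second_gain=0.10, n=1000))
    print(samples_needed(gap=0.05))
```

Because the required sample size grows with the inverse square of the gain gap, any tightening of the bound, such as the one the synopsis reports, translates directly into fewer instances read before a split is made.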