Abdur Chowdhury ... [et al.], editors; sponsored by ACM Special Interest Group on Information Retrieval (SIGIR) and Gesellschaft für Informatik e.V. (GI). ACM order number: 605050--p. ii. Includes bibliographical references and author index. Also issued online.
[ACM Press the 14th ACM international conference - Bremen, Germany (2005.10.31-2005.11.05)] Proceedings of the 14th ACM international conference on Information and knowledge management - CIKM '05 - Towards estimating the number of distinct value combinations for a set of attributes
Scribed by Yu, Xiaohui; Zuzarte, Calisto; Sevcik, Kenneth C.
- Book ID
- 121859085
- Publisher
- ACM Press
- Year
- 2005
- Tongue
- English
- Weight
- 263 KB
- Category
- Article
- ISBN-13
- 9781595931405
Synopsis
Accurately and efficiently estimating the number of distinct values for an attribute or a set of attributes in a data set is of critical importance to many database operations, such as query optimization and approximate query answering. Previous work has focused on estimating the number of distinct values for a single attribute, and most existing work adopts a data sampling approach. This paper addresses the equally important issue of estimating the number of distinct value combinations for multiple attributes, which we call COLSCARD (for COLumn Set CARDinality). It also takes a different approach that uses existing statistical information (e.g., histograms) available on the individual attributes to assist estimation. We start with cases where exact frequency information on individual attributes is available, and present a pair of lower and upper bounds on COLSCARD that are consistent with the available information, as well as an estimator of COLSCARD based on probability. We then proceed to study the case where only partial information (in the form of histograms) is available on individual attributes, and show how the proposed estimator can be adapted to this case. We consider two types of widely used histograms and show how they can be constructed in order to obtain optimal approximation. An experimental evaluation of the proposed estimation method on synthetic data as well as two real data sets is provided.
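To make the exact-frequency case concrete, here is a minimal Python sketch. It is not taken from the paper: the function names are hypothetical, the bounds are only the elementary ones (the largest per-attribute distinct count, and the smaller of the product of distinct counts and the row count), and the estimator assumes the attributes are statistically independent, which the paper may or may not do. The paper's own bounds and estimator may be tighter or defined differently.

```python
from itertools import product
from math import prod


def colscard_bounds(freqs, n_rows):
    """Elementary bounds on the number of distinct value combinations.

    freqs  -- one dict per attribute, mapping value -> exact row count
    n_rows -- total number of rows in the table
    (Both parameter names are assumptions made for this sketch.)
    """
    distinct = [len(f) for f in freqs]
    lower = max(distinct)                   # every value of each attribute appears
    upper = min(prod(distinct), n_rows)     # no more combinations than rows
    return lower, upper


def colscard_independence_estimate(freqs, n_rows):
    """Expected number of distinct combinations if attributes were independent.

    Each combination is treated as appearing in a given row with probability
    equal to the product of its per-attribute relative frequencies; the sum
    below adds up the chance that each combination shows up at least once.
    """
    total = 0.0
    for counts in product(*(f.values() for f in freqs)):
        p = prod(c / n_rows for c in counts)
        total += 1.0 - (1.0 - p) ** n_rows
    return total


# Toy input: two attributes over a four-row table.
freqs = [{"a": 2, "b": 2}, {"x": 3, "y": 1}]
lower, upper = colscard_bounds(freqs, 4)
estimate = colscard_independence_estimate(freqs, 4)
# Clamp the estimate so it stays consistent with the bounds.
clamped = min(max(estimate, lower), upper)
```

The loop enumerates every value combination, so this sketch only suits attributes with small domains; the histogram-based setting described in the synopsis addresses the case where exact per-value counts are not available.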