In this paper we study the use of a semi-supervised agglomerative hierarchical clustering (ssAHC) algorithm to text categorization, which consists of assigning text documents to predefined categories. ssAHC is (i) a clustering algorithm that (ii) uses a finite design set of labeled dat
Identification of user sessions with hierarchical agglomerative clustering
✍ Scribed by G. Craig Murray; Jimmy Lin; Abdur Chowdhury
- Publisher
- Wiley (John Wiley & Sons)
- Year
- 2007
- Language
- English
- File size
- 83 KB
- Volume
- 43
- Category
- Article
- ISSN
- 0044-7870
✦ Synopsis
Abstract
We introduce a novel approach to identifying Web search user sessions based on the burstiness of users' activity. Our method is user‐centered rather than population‐centered or system‐centered and can be deployed in situations in which users choose to withhold personal content information. We adopt a hierarchical agglomerative clustering approach with a stopping criterion that is statistically motivated by users' activities. An evaluation based on extracts from AOL Search™ logs reveals that our algorithm achieves 98% accuracy in identifying session boundaries compared to human judgments.
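The general idea of clustering one user's activity by its burstiness can be sketched as follows. This is a minimal illustration only, not the authors' algorithm: the abstract does not specify the stopping criterion, so the function name `sessions_by_gap_clustering`, the `factor` parameter, and the rule "stop once the smallest remaining gap exceeds `factor` times that user's median gap" are all assumptions made for this example. It uses only timestamps, with no query content.

```python
from statistics import median


def sessions_by_gap_clustering(timestamps, factor=5.0):
    """Group one user's event times (in seconds) into sessions.

    Single-linkage agglomerative clustering on a sorted 1-D sequence:
    repeatedly merge the two neighbouring clusters separated by the
    smallest gap, and stop once that gap exceeds `factor` times the
    user's own median gap. The stopping rule is a hypothetical stand-in,
    not the criterion used in the paper.
    """
    times = sorted(timestamps)
    if len(times) < 2:
        return [times] if times else []

    # Per-user threshold derived from this user's own inter-event gaps.
    gaps = [b - a for a, b in zip(times, times[1:])]
    threshold = factor * median(gaps)

    # Start with every event in its own cluster.
    clusters = [[t] for t in times]
    while len(clusters) > 1:
        # Gap between each pair of neighbouring clusters.
        between = [
            clusters[i + 1][0] - clusters[i][-1]
            for i in range(len(clusters) - 1)
        ]
        i = min(range(len(between)), key=between.__getitem__)
        if between[i] > threshold:
            break  # stopping criterion reached; remaining gaps are boundaries
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters


if __name__ == "__main__":
    # Hypothetical timestamps for a single user, in seconds.
    events = [0, 10, 25, 3600, 3620, 3630, 9000]
    for session in sessions_by_gap_clustering(events):
        print(session)
```

Because the threshold is computed from each user's own gaps, the sketch is user-centered in the loose sense the abstract describes; a population-wide fixed timeout would instead apply the same cutoff to every user.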
📜 SIMILAR VOLUMES
Catalogue number: ACKZ. Program obtainable from: CPC Program Library, Queen's Uth- Nature of physical problem: In many computer programs simulating many-body problems, patterns topologically connected have to be determined. This problem arises especially in Monte Carlo and molecular dynamics calculatio