Mining the blogosphere for top news stories identification

In: Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval (SIGIR '10), Geneva, Switzerland, 2010.07.19-2010.07.23. ACM Press.

Authors: Lee, Yeha; Jung, Hun-young; Song, Woosang; Lee, Jong-Hyeok


Book ID: 115534474
Publisher: ACM Press
Year: 2010
Language: English
File size: 395 KB
Volume: 0
Category: Article
ISBN: 1450301533


Synopsis


The analysis of query logs from blog search engines shows that news-related queries occupy a significant portion of the logs. This raises an interesting research question: can the blogosphere be used to identify important news stories? In this paper, we present novel approaches to identifying important news story headlines from the blogosphere for a given day. The proposed system consists of two components based on the language model framework: the query likelihood and the news headline prior. For the query likelihood, we propose several approaches to estimating the query language model and the news headline language model. We also suggest several criteria for evaluating the news headline prior, which is the prior belief about the importance or newsworthiness of a news headline for a given day. Experimental results show that our system significantly outperforms a baseline system. Specifically, the proposed approach gives 2.62% and 10.19% further increases in MAP and P@5, respectively, over the best performing result of the TREC'09 Top Stories Identification Task.
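
To make the two-component idea concrete, here is a minimal Python sketch of ranking headlines by a prior multiplied by a query likelihood. It is not the authors' method: the synopsis does not say how either component is estimated, so the Dirichlet-smoothed unigram model, the function names `score_headline` and `rank_headlines`, the parameter `mu`, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def score_headline(query_terms, headline_terms, coll_counts, coll_len, prior, mu=10.0):
    """Return log(prior) + log P(query | headline).

    Assumption: P(w | headline) is a Dirichlet-smoothed unigram model,
    a generic choice that the synopsis does not confirm.
    """
    tf = Counter(headline_terms)
    hlen = len(headline_terms)
    log_score = math.log(prior)
    for w in query_terms:
        p_coll = coll_counts.get(w, 0) / coll_len      # background probability
        p = (tf[w] + mu * p_coll) / (hlen + mu)        # smoothed headline model
        if p == 0.0:
            return float("-inf")                       # term unseen everywhere
        log_score += math.log(p)
    return log_score


def rank_headlines(query_terms, headlines, priors, mu=10.0):
    """Rank headline ids by prior times query likelihood, best first."""
    coll_counts = Counter(w for terms in headlines.values() for w in terms)
    coll_len = sum(coll_counts.values())
    scores = {
        hid: score_headline(query_terms, terms, coll_counts, coll_len, priors[hid], mu)
        for hid, terms in headlines.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: two headlines and terms standing in for one day's blog text.
headlines = {
    "a": ["election", "results", "announced"],
    "b": ["local", "team", "wins", "match"],
}
query = ["election", "results"]
uniform = {"a": 0.5, "b": 0.5}
ranking = rank_headlines(query, headlines, uniform)
```

With uniform priors the ranking is driven by the likelihood alone, so the headline sharing terms with the query comes first. A strongly skewed prior can reverse that order, which is the role the synopsis assigns to the news headline prior.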


SIMILAR VOLUMES


[ACM Press Proceeding of the 33rd intern
Dai, Na; Davison, Brian D. | Article | 2010 | ACM Press | English | 554 KB

The collective contributions of billions of users across the globe each day result in an ever-changing web. In verticals like news and real-time search, recency is an obvious significant factor for ranking. However, traditional link-based web ranking algorithms typically run on a single web snapshot