Linear time series models for term weighting in information retrieval
By Miles Efron
- Publisher
- John Wiley and Sons
- Year
- 2010
- Language
- English
- File size
- 645 KB
- Volume
- 61
- Category
- Article
- ISSN
- 1532-2882
✦ Synopsis
Abstract
Common measures of term importance in information retrieval (IR) rely on counts of term frequency: rare terms receive higher weight in document ranking than common terms do. However, realistic scenarios yield additional information about terms in a collection. Of interest in this article is the temporal behavior of terms as a collection changes over time. We propose capturing each term's collection frequency at discrete time intervals over the lifespan of a corpus and analyzing the resulting time series. We hypothesize that the collection frequency of a weakly discriminative term x at time t is predictable by a linear model of the term's prior observations. Conversely, a linear time series model of a strong discriminator's collection frequency will yield a poor fit to the data. Operationalizing this hypothesis, we induce three time-based measures of term importance and test these against state-of-the-art term weighting models.
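The abstract does not say which linear model or which three measures the article uses, so the following is only a minimal sketch of the underlying idea, under stated assumptions. It assumes a first-order autoregressive model with intercept, x_t = a + b * x_{t-1}, fitted by ordinary least squares to a term's per-interval collection frequencies, and uses 1 - R² as a hypothetical "poor fit means strong discriminator" weight. The function names `ar1_fit` and `temporal_weight` are illustrative, not taken from the article.

```python
from typing import Sequence


def ar1_fit(series: Sequence[float]) -> tuple[float, float, float]:
    """Fit x_t = a + b * x_{t-1} by ordinary least squares (assumed AR(1) model).

    `series` holds one term's collection frequency at successive time intervals.
    Returns (a, b, r_squared); r_squared is the share of variance in x_t that
    the previous observation x_{t-1} explains.
    """
    if len(series) < 3:
        raise ValueError("need at least three observations")
    x_prev = [float(v) for v in series[:-1]]  # predictors: x_{t-1}
    x_next = [float(v) for v in series[1:]]   # targets: x_t
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    cov = sum((p - mean_prev) * (q - mean_next) for p, q in zip(x_prev, x_next))
    var_prev = sum((p - mean_prev) ** 2 for p in x_prev)
    # Slope and intercept of the least-squares line through (x_{t-1}, x_t).
    b = cov / var_prev if var_prev > 0 else 0.0
    a = mean_next - b * mean_prev
    ss_res = sum((q - (a + b * p)) ** 2 for p, q in zip(x_prev, x_next))
    ss_tot = sum((q - mean_next) ** 2 for q in x_next)
    # A constant target series is predicted perfectly by the intercept alone.
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return a, b, r_squared


def temporal_weight(series: Sequence[float]) -> float:
    """Hypothetical weight in [0, 1]: the worse the linear fit, the higher the weight."""
    _, _, r_squared = ar1_fit(series)
    return 1.0 - r_squared


if __name__ == "__main__":
    steady = [10, 12, 14, 16, 18, 20]     # smoothly growing, fully predictable
    bursty = [5, 5, 40, 5, 5, 38, 5, 5]   # sudden spikes, hard to predict
    print(temporal_weight(steady), temporal_weight(bursty))
```

A term whose frequency grows smoothly is fitted exactly by the lagged model and gets weight 0, while a term with sudden bursts leaves most of its variance unexplained and scores much higher. Higher-order autoregressive models or other fit statistics would be equally plausible choices; this sketch picks AR(1) only because it has a closed-form least-squares solution.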