Linear time series models for term weighting in information retrieval

✍ Scribed by Miles Efron


Publisher
John Wiley and Sons
Year
2010
Tongue
English
Weight
645 KB
Volume
61
Category
Article
ISSN
1532-2882


✦ Synopsis


Abstract

Common measures of term importance in information retrieval (IR) rely on counts of term frequency; rare terms receive higher weight in document ranking than common terms receive. However, realistic scenarios yield additional information about terms in a collection. Of interest in this article is the temporal behavior of terms as a collection changes over time. We propose capturing each term's collection frequency at discrete time intervals over the lifespan of a corpus and analyzing the resulting time series. We hypothesize that the collection frequency of a weakly discriminative term x at time t is predictable by a linear model of the term's prior observations. On the other hand, a linear time series model for a strong discriminator's collection frequency will yield a poor fit to the data. Operationalizing this hypothesis, we induce three time‐based measures of term importance and test these against state‐of‐the‐art term weighting models.
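The hypothesis can be illustrated with a minimal sketch. It assumes an autoregressive model of order p fitted by ordinary least squares, with the in-sample R² standing in for "goodness of fit". The abstract does not define the three time-based measures, so this is not the article's method: the function name `ar_fit_r2`, the choice of order 2, and both synthetic series are hypothetical.

```python
import numpy as np


def ar_fit_r2(series, order=2):
    """Fit x_t = c + sum_k a_k * x_{t-k} by least squares and return R^2.

    Hypothetical helper: an illustrative fit score, not a measure from the article.
    """
    x = np.asarray(series, dtype=float)
    if len(x) <= order + 1:
        raise ValueError("series too short for the requested order")
    # Row for time t holds an intercept followed by x_{t-1}, ..., x_{t-order}.
    lags = [x[order - k : len(x) - k] for k in range(1, order + 1)]
    X = np.column_stack([np.ones(len(x) - order)] + lags)
    y = x[order:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0


# Synthetic collection frequencies at 24 time steps (made-up data).
# A common term: steady growth with a small periodic wobble.
steady = [100 + 5 * t + ((t * 7) % 5 - 2) for t in range(24)]
# A topical term: near zero, with irregular bursts.
bursty = [2, 0, 1, 40, 0, 3, 1, 0, 55, 2, 0, 1,
          0, 0, 38, 1, 2, 0, 0, 61, 0, 1, 3, 0]

steady_r2 = ar_fit_r2(steady)
bursty_r2 = ar_fit_r2(bursty)
print(f"steady term R^2: {steady_r2:.3f}")
print(f"bursty term R^2: {bursty_r2:.3f}")
```

Under these assumptions, the steadily growing series is fitted far better than the bursty one. A weight could then be derived from the lack of fit (for example, 1 − R²), so that poorly predicted terms score higher; that mapping is likewise an assumption rather than something the abstract specifies.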

