
Eigenvalue-based model selection during latent semantic indexing

Scribed by Miles Efron


Publisher: John Wiley and Sons
Year: 2005
Tongue: English
Weight: 252 KB
Volume: 56
Category: Article
ISSN: 1532-2882


Synopsis


Abstract

In this study amended parallel analysis (APA), a novel method for model selection in unsupervised learning problems such as information retrieval (IR), is described. At issue is the selection of k, the number of dimensions retained under latent semantic indexing (LSI). Amended parallel analysis is an elaboration of Horn's parallel analysis, which advocates retaining eigenvalues larger than those that we would expect under term independence. Amended parallel analysis operates by deriving confidence intervals on these "null" eigenvalues. The technique amounts to a series of nonparametric hypothesis tests on the correlation matrix eigenvalues. In the study, APA is tested along with four established dimensionality estimators on six standard IR test collections. These estimates are evaluated with regard to two IR performance metrics. Additionally, results from simulated data are reported. In both rounds of experimentation APA performs well, predicting the best values of k on 3 of 12 observations, with good predictions on several others, and never offering the worst estimate of optimal dimensionality.
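The idea summarized in the abstract (keep only eigenvalues that exceed what term independence would produce, judged against an interval on the "null" eigenvalues) can be illustrated with a minimal Python sketch. This is not the procedure from the article, whose details the abstract does not give. The column-permutation null, the 95th-percentile bound, the stop-at-first-failure rule, and the names `correlation_eigenvalues` and `parallel_analysis_k` are all assumptions made for illustration.

```python
import numpy as np


def correlation_eigenvalues(X):
    """Eigenvalues of the term-by-term correlation matrix, largest first.

    X is a (documents x terms) array; columns must not be constant.
    """
    R = np.corrcoef(X, rowvar=False)
    return np.sort(np.linalg.eigvalsh(R))[::-1]


def parallel_analysis_k(X, n_resamples=200, percentile=95.0, seed=0):
    """Estimate k by comparing observed eigenvalues with a permutation null.

    Assumption (not from the article): the null is built by permuting each
    term column independently, which breaks dependence between terms while
    preserving each term's marginal distribution.
    """
    rng = np.random.default_rng(seed)
    n_terms = X.shape[1]
    observed = correlation_eigenvalues(X)

    null = np.empty((n_resamples, n_terms))
    for b in range(n_resamples):
        permuted = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(n_terms)]
        )
        null[b] = correlation_eigenvalues(permuted)

    # Upper bound of the null interval for each eigenvalue rank.
    upper = np.percentile(null, percentile, axis=0)

    # Retain leading eigenvalues until the first one that fails the test.
    exceeds = observed > upper
    k = n_terms if exceeds.all() else int(np.argmin(exceeds))
    return k, observed, upper


if __name__ == "__main__":
    # Synthetic data with two latent factors driving twelve "terms".
    rng = np.random.default_rng(42)
    factors = rng.normal(size=(500, 2))
    loadings = np.zeros((2, 12))
    loadings[0, :6] = 1.0
    loadings[1, 6:] = 1.0
    X = factors @ loadings + 0.5 * rng.normal(size=(500, 12))
    k, observed, upper = parallel_analysis_k(X, n_resamples=100)
    print("estimated k:", k)
```

Using an upper percentile rather than the mean of the null eigenvalues makes each comparison behave like a one-sided test, which is one plausible reading of "confidence intervals on these null eigenvalues"; other interval constructions would fit the description equally well.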


SIMILAR VOLUMES


A probabilistic model for Latent Semanti
Chris H.Q. Ding | Article | 2005 | John Wiley and Sons | English | 183 KB

Abstract: Latent Semantic Indexing (LSI), when applied to semantic space built on text collections, improves information retrieval, information filtering, and word sense disambiguation. A new dual probability model based on the similarity concepts is introduced to provide deeper understanding of

Using latent semantic indexing for liter
Gordon, Michael D.; Dumais, Susan | Article | 1998 | John Wiley and Sons | English | 146 KB

Latent semantic indexing (LSI) is a statistical technique for improving information retrieval effectiveness. Here, we use LSI to assist in literature-based discoveries. The [...] As described by Swanson, there are two basic literature discovery processes. The first leads from the literature (R) associ

Word sense disambiguation by selecting t
Susanne M. Humphrey; Willie J. Rogers; Halil Kilicoglu; Dina Demner-Fushman; Tho | Article | 2005 | John Wiley and Sons | English | 376 KB

Abstract: An experiment was performed at the National Library of Medicine® (NLM®) in word sense disambiguation (WSD) using the Journal Descriptor Indexing (JDI) methodology. The motivation is the need to solve the ambiguity problem confronting NLM's MetaMap system, which maps free text to ter

S.M. Humphrey, W.J. Rogers, H. Kilicoglu
Article | 2006 | John Wiley and Sons | English | 20 KB

Abstract: The original article to which this Erratum refers was published in Journal of the American Society for Information Science and Technology 57(1) 2006, 96–113.