## Abstract Latent Semantic Indexing (LSI), when applied to semantic space built on text collections, improves information retrieval, information filtering, and word sense disambiguation. A new dual probability model based on the similarity concepts is introduced to provide deeper understanding of
Eigenvalue-based model selection during latent semantic indexing
Scribed by Miles Efron
- Publisher: John Wiley and Sons
- Year: 2005
- Language: English
- File size: 252 KB
- Volume: 56
- Category: Article
- ISSN: 1532-2882
Synopsis
Abstract
In this study, amended parallel analysis (APA), a novel method for model selection in unsupervised learning problems such as information retrieval (IR), is described. At issue is the selection of k, the number of dimensions retained under latent semantic indexing (LSI). Amended parallel analysis is an elaboration of Horn's parallel analysis, which advocates retaining eigenvalues larger than those that we would expect under term independence. Amended parallel analysis operates by deriving confidence intervals on these "null" eigenvalues. The technique amounts to a series of nonparametric hypothesis tests on the correlation matrix eigenvalues. In the study, APA is tested along with four established dimensionality estimators on six standard IR test collections. These estimates are evaluated with regard to two IR performance metrics. Additionally, results from simulated data are reported. In both rounds of experimentation APA performs well, predicting the best values of k on 3 of 12 observations, with good predictions on several others, and never offering the worst estimate of optimal dimensionality.
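The idea in the abstract can be illustrated with a minimal sketch of Horn-style parallel analysis using an upper percentile of the null eigenvalues as the retention threshold. This is not the paper's APA procedure, whose details the abstract does not give. The function name `parallel_analysis_k`, the use of column-wise permutation to simulate term independence, and the 95th-percentile cutoff are all assumptions made here for illustration.

```python
import numpy as np


def parallel_analysis_k(X, n_null=50, percentile=95.0, seed=0):
    """Hypothetical helper: estimate k by comparing observed correlation-matrix
    eigenvalues against eigenvalues obtained under simulated independence.

    X is an (n_documents, n_terms) array. Returns (k, observed, threshold).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Correlations are undefined for constant columns, so drop them.
    X = X[:, X.std(axis=0) > 0]

    # Observed eigenvalues of the term-term correlation matrix, largest first.
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]

    # Null eigenvalues (assumption: independence is simulated by permuting each
    # column separately, which keeps marginals but breaks correlations).
    null = np.empty((n_null, observed.size))
    for i in range(n_null):
        Xp = np.column_stack([rng.permutation(col) for col in X.T])
        null[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(Xp, rowvar=False)))[::-1]

    # Upper bound on each null eigenvalue, by rank.
    threshold = np.percentile(null, percentile, axis=0)

    # Retain leading eigenvalues until the first one that fails to exceed
    # its null threshold.
    exceeds = observed > threshold
    k = observed.size if exceeds.all() else int(np.argmin(exceeds))
    return k, observed, threshold
```

Calling `parallel_analysis_k(X)` on a document-by-term matrix returns the estimated `k` together with the observed eigenvalues and the per-rank null thresholds, so the comparison can be inspected directly.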
SIMILAR VOLUMES
## Latent semantic indexing (LSI) is a statistical technique for improving information retrieval effectiveness. Here, we use LSI to assist in literature-based discoveries. As described by Swanson, there are two basic literature discovery processes. The first leads from the literature (R) associ
## Abstract An experiment was performed at the National Library of Medicine^®^ (NLM^®^) in word sense disambiguation (WSD) using the Journal Descriptor Indexing (JDI) methodology. The motivation is the need to solve the ambiguity problem confronting NLM's MetaMap system, which maps free text to ter
## Abstract The original article to which this Erratum refers was published in Journal of the American Society for Information Science and Technology 57(1) 2006, 96–113.