✦ LIBER ✦

Latent Semantic Indexing: A Probabilistic Analysis

✍ Scribed by Christos H. Papadimitriou; Prabhakar Raghavan; Hisao Tamaki; Santosh Vempala

Publisher: Elsevier Science
Year: 2000
Tongue: English
Weight: 182 KB
Volume: 61
Category: Article
ISSN: 0022-0000
DOI: 10.1006/jcss.2000.1711

✦ Synopsis

Latent semantic indexing (LSI) is an information retrieval technique based on the spectral analysis of the term-document matrix, whose empirical success had heretofore been without rigorous prediction and explanation. We prove that, under certain conditions, LSI does succeed in capturing the underlying semantics of the corpus and achieves improved retrieval performance. We propose the technique of random projection as a way of speeding up LSI. We complement our theorems with encouraging experimental results. We also argue that our results may be viewed in a more general framework, as a theoretical basis for the use of spectral methods in a wider class of applications such as collaborative filtering.
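As a rough illustration of the two ideas named in the synopsis, the sketch below builds document vectors from a truncated SVD of a small term-document matrix and, separately, applies a Gaussian random projection to the same matrix. The toy corpus, the function names (`lsi_document_vectors`, `random_projection`, `cosine`) and the parameters `k` and `l` are assumptions made for this example. They are not taken from the article, and the code is not the authors' algorithm or experimental setup.

```python
import numpy as np


def lsi_document_vectors(A, k):
    """Return one k-dimensional vector per document from a truncated SVD.

    A is a (terms x documents) matrix. Rows of the result are the columns
    of diag(s_k) @ Vt_k, i.e. documents expressed in the top-k singular
    directions.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (s[:k, None] * Vt[:k, :]).T


def random_projection(A, l, seed=0):
    """Project the term dimension of A down to l with a Gaussian matrix.

    The 1/sqrt(l) scaling is a common convention, assumed here.
    """
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((l, A.shape[0])) / np.sqrt(l)
    return R @ A


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Hypothetical toy corpus: rows are terms, columns are documents.
# Documents 0 and 1 share only "engine"; documents 2 and 3 are about astronomy.
TERMS = ["car", "automobile", "engine", "galaxy", "star"]
A = np.array([
    [1, 0, 0, 0],   # car
    [0, 1, 0, 0],   # automobile
    [1, 1, 0, 0],   # engine
    [0, 0, 1, 1],   # galaxy
    [0, 0, 1, 1],   # star
], dtype=float)

docs = lsi_document_vectors(A, k=2)

raw_sim = cosine(A[:, 0], A[:, 1])      # similarity in the original term space
lsi_sim = cosine(docs[0], docs[1])      # same pair in the rank-2 LSI space
cross_sim = cosine(docs[0], docs[2])    # a car document vs. an astronomy document

projected = random_projection(A, l=3)   # 3 x 4 matrix, ready for a smaller SVD
```

In this toy corpus the "car" and "automobile" documents have a raw cosine similarity of 0.5, but their rank-2 LSI vectors are essentially parallel, while documents from different topics stay orthogonal. The random projection step only shrinks the term dimension here. How it is combined with the SVD, and how large `l` must be, are questions for the article itself.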

📜 SIMILAR VOLUMES

A probabilistic model for Latent Semantic Indexing

✍ Chris H.Q. Ding 📂 Article 📅 2005 🏛 John Wiley and Sons 🌐 English ⚖ 183 KB

## Abstract Latent Semantic Indexing (LSI), when applied to semantic space built on text collections, improves information retrieval, information filtering, and word sense disambiguation. A new dual probability model based on the similarity concepts is introduced to provide deeper understanding of

Using latent semantic indexing for literature based discovery

✍ Gordon, Michael D.; Dumais, Susan 📂 Article 📅 1998 🏛 John Wiley and Sons 🌐 English ⚖ 146 KB

## Latent semantic indexing (LSI) is a statistical technique for improving information retrieval effectiveness. Here, we use LSI to assist in literature-based discoveries. The … As described by Swanson, there are two basic literature discovery processes. The first leads from the literature (R) associ

Eigenvalue-based model selection during latent semantic indexing

✍ Miles Efron 📂 Article 📅 2005 🏛 John Wiley and Sons 🌐 English ⚖ 252 KB

## Abstract In this study amended parallel analysis (APA), a novel method for model selection in unsupervised learning problems such as information retrieval (IR), is described. At issue is the selection of __k__, the number of dimensions retained under latent semantic indexing (LSI). Amended paral

Automatic text summarization based on latent semantic indexing

✍ Dongmei Ai; Yuchao Zheng; Dezheng Zhang 📂 Article 📅 2010 🏛 Springer Japan 🌐 English ⚖ 145 KB

Determining the context of text using augmented latent semantic indexing

✍ Tom Rishel; Louise A. Perkins; Sumanth Yenduri; Farnaz Zand 📂 Article 📅 2007 🏛 John Wiley and Sons 🌐 English ⚖ 232 KB

## Abstract Latent semantic analysis has been used for several years to improve the performance of document library searches. We show that latent semantic analysis, augmented with a Part-of-Speech Tagger, may be an effective algorithm for classifying a textual document as well. Using Brill's Part-

Genetic algorithm for text clustering based on latent semantic indexing

✍ Wei Song; Soon Cheol Park 📂 Article 📅 2009 🏛 Elsevier Science 🌐 English ⚖ 630 KB

## Abstract In this paper, we develop a genetic algorithm method based on a latent semantic model (GAL) for text clustering. The main difficulty in the application of genetic algorithms (GAs) for document clustering is thousands or even tens of thousands of dimensions in feature space which