A probabilistic model for Latent Semantic Indexing

✍ Scribed by Chris H.Q. Ding

Publisher: John Wiley and Sons
Year: 2005
Tongue: English
Weight: 183 KB
Volume: 56
Category: Article
ISSN: 1532-2882
DOI: 10.1002/asi.20148

✦ Synopsis

Abstract

Latent Semantic Indexing (LSI), when applied to a semantic space built on text collections, improves information retrieval, information filtering, and word sense disambiguation. A new dual probability model based on similarity concepts is introduced to provide a deeper understanding of LSI. Semantic associations can be quantitatively characterized by their statistical significance, the likelihood. Semantic dimensions containing redundant and noisy information can be separated out and should be ignored because of their negative contribution to the overall statistical significance. LSI is the optimal solution of the model. The peak in the likelihood curve indicates the existence of an intrinsic semantic dimension. The importance of LSI dimensions follows the Zipf distribution, indicating that LSI dimensions represent latent concepts. The document frequency of words follows the Zipf distribution, and the number of distinct words follows a log-normal distribution. Experiments on five standard document collections confirm and illustrate the analysis.
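
The Zipf claim in the abstract can be checked on any collection by fitting a straight line to log(frequency) against log(rank). The sketch below is not taken from the article: the helper names `document_frequencies` and `zipf_slope`, the whitespace tokenizer, and the toy data are all illustrative assumptions. A slope near -1 on real data would be consistent with a Zipf-like distribution, although a least-squares fit alone is only a rough diagnostic.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Count, for each word, how many documents contain it (hypothetical helper)."""
    df = Counter()
    for doc in docs:
        # A set ensures each word is counted at most once per document.
        df.update(set(doc.lower().split()))
    return df


def zipf_slope(values):
    """Least-squares slope of log(value) against log(rank), with ranks starting at 1."""
    ranked = sorted((v for v in values if v > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive values")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(v) for v in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy corpus (assumed data, far too small to show a real Zipf pattern).
docs = [
    "latent semantic indexing",
    "semantic indexing of text",
    "semantic search",
]
df = document_frequencies(docs)

# Synthetic frequencies that follow f(r) = C / r exactly, used as a sanity check.
ideal = [1000.0 / rank for rank in range(1, 51)]
slope = zipf_slope(ideal)
```

The same `zipf_slope` helper could, under the same assumptions, be applied to the singular values of a term-document matrix to examine the abstract's statement about the importance of LSI dimensions.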

📜 SIMILAR VOLUMES

Eigenvalue-based model selection during latent semantic indexing

✍ Miles Efron 📂 Article 📅 2005 🏛 John Wiley and Sons 🌐 English ⚖ 252 KB

## Abstract In this study amended parallel analysis (APA), a novel method for model selection in unsupervised learning problems such as information retrieval (IR), is described. At issue is the selection of __k__, the number of dimensions retained under latent semantic indexing (LSI). Amended paral

Using latent semantic indexing for literature based discovery

✍ Gordon, Michael D. ;Dumais, Susan 📂 Article 📅 1998 🏛 John Wiley and Sons 🌐 English ⚖ 146 KB

## Latent semantic indexing (LSI) is a statistical technique for improving information retrieval effectiveness. Here, we use LSI to assist in literature-based discoveries. The … As described by Swanson, there are two basic literature discovery processes. The first leads from the literature (R) associ

Probabilistic passage models for semantic search of genomics literature

✍ Jay Urbain; Nazli Goharian; Ophir Frieder 📂 Article 📅 2008 🏛 John Wiley and Sons 🌐 English ⚖ 215 KB

Determining the context of text using augmented latent semantic indexing

✍ Tom Rishel; Louise A. Perkins; Sumanth Yenduri; Farnaz Zand 📂 Article 📅 2007 🏛 John Wiley and Sons 🌐 English ⚖ 232 KB

## Abstract Latent semantic analysis has been used for several years to improve the performance of document library searches. We show that latent semantic analysis, augmented with a Part-of-Speech Tagger, may be an effective algorithm for classifying a textual document as well. Using Brill's Part-

Regarding “A Probabilistic Model for the Distribution of Authorships”

✍ Davis, Charles H. 📂 Article 📅 1992 🏛 John Wiley and Sons 🌐 English ⚖ 370 KB

A unified probabilistic approach for modeling trajectory-based separations

✍ Jing Wei; Matthew J. Realff 📂 Article 📅 2005 🏛 American Institute of Chemical Engineers 🌐 English ⚖ 199 KB

## Abstract The traditional approach to modeling solid–solid separations is based on solving differential equations for particle concentration profiles. This approach is difficult to generalize when the particle properties, such as size and charge, are random variables. If we view particle trajecto