✦ LIBER ✦

Full-text indexing of non-textual resources

✍ Scribed by David Byers

Book ID: 104309969
Publisher: Elsevier Science
Year: 1998
Tongue: English
Weight: 703 KB
Volume: 30
Category: Article
ISSN: 0169-7552
DOI: 10.1016/s0169-7552(98)00059-2

✦ Synopsis

Full-text indexing of resources on the World Wide Web is limited to simple content types, such as HTML and plain text. More complex content types, such as PostScript, PDF, and proprietary word-processing formats, are excluded, even though such documents are usually rich in content. They are excluded because extracting a textual representation from them would be too expensive and too difficult. The operator of a search engine has little motivation to spend the additional resources needed to handle such documents: the gain would be fairly small, and search engines are extremely popular even when they are limited to HTML and plain text documents.

The situation is quite different from the content provider's point of view. A site may hold a significant amount of its content in non-textual documents, and the content provider may still want those documents indexed by ordinary search engines.

In this paper we present several server-side solutions that allow existing indexing software to index the textual representation of non-textual resources.
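The synopsis does not describe the solutions themselves, so the following is only an illustrative sketch of one possible server-side approach, not necessarily any of the paper's methods. It assumes that a text version of each non-textual document has already been extracted on the server, and that indexing robots can be recognized by their User-Agent string. The names `ROBOT_MARKERS`, `Response`, `is_indexing_robot`, and `serve` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical User-Agent fragments that suggest an indexing robot.
# A real site would maintain its own list; this one is an assumption.
ROBOT_MARKERS = ("crawler", "spider", "robot", "bot")


@dataclass
class Response:
    content_type: str
    body: bytes


def is_indexing_robot(user_agent: str) -> bool:
    """Guess from the User-Agent whether the client is an indexing robot."""
    ua = user_agent.lower()
    return any(marker in ua for marker in ROBOT_MARKERS)


def serve(path: str, user_agent: str, originals: dict, text_versions: dict) -> Response:
    """Return the pre-extracted text to robots and the original to everyone else.

    originals maps a path to (content_type, bytes); text_versions maps a path
    to text extracted ahead of time. Both are stand-ins for real storage.
    """
    original_type, original_body = originals[path]
    if is_indexing_robot(user_agent) and path in text_versions:
        return Response("text/plain", text_versions[path].encode("utf-8"))
    return Response(original_type, original_body)
```

In this sketch the indexing software needs no changes: it requests the same URL as any other client and receives a content type it already handles, while ordinary browsers still receive the original document.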

📜 SIMILAR VOLUMES

Less than full-text indexing using a non-boolean searching model

✍ Cleveland, Donald B. ;Cleveland, Ana D. ;Wise, Olga B. 📂 Article 📅 1984 🏛 John Wiley and Sons 🌐 English ⚖ 780 KB

Automatic indexing of full texts

✍ Zdeněk Jonák 📂 Article 📅 1984 🏛 Elsevier Science 🌐 English ⚖ 576 KB

The effect of subject matter on the automatic indexing of full text

✍ Rowbottom, Mary E. ;Willett, Peter 📂 Article 📅 1982 🏛 John Wiley and Sons 🌐 English ⚖ 264 KB

CrossRef pilots plagiarism detection service - six academic publishers to allow full-text indexing

📂 Article 📅 2007 🏛 Institute of Electrical and Electronics Engineers ⚖ 101 KB

Hierarchical concept indexing of full-text documents in the Unified Medical Language System Information Sources Map

✍ Wright, Lawrence W. ;Nardini, Holly K. Grossetta ;Aronson, Alan R. ;Rindflesch, 📂 Article 📅 1999 🏛 John Wiley and Sons 🌐 English ⚖ 73 KB

Full-text documents are a vital and rapidly growing part of online biomedical information. A single large document can contain as much information as a small database, but normally lacks the tight structure and consistent indexing of a database. Retrieval systems will often miss highly relevant part

Padok-II: Retrieval test for the evaluation of full text indexing variants of the German patent information system

📂 Article 📅 1990 🏛 Elsevier Science 🌐 English ⚖ 160 KB