Textual information on the Web is vast, varied, and useful. Although traditional text mining treats a text document as a single piece of information, this approach may not be suitable for Web documents that are long and heterogeneous in their contents. This article presents a new approach that
Text Mining in different languages
Scribed by Lebart, Ludovic
- Publisher
- John Wiley and Sons
- Year
- 1998
- Language
- English
- Size
- 147 KB
- Volume
- 14
- Category
- Article
- ISSN
- 8755-0024
Synopsis
The purpose of Text Mining is to describe and explore textual data, to uncover structural traits, and proceed to predictions. The field of application concerns Information Retrieval, processing responses to open-ended questions in sample surveys as well as processing textual corpora of a more general nature. At the intersection of Corpora Linguistics and Exploratory Statistical Analysis, a series of language independent tools and methods can perform most of the previously mentioned tasks, including the assessment and validation of the obtained results, be it visualization or categorization. Multiple confusion matrices calculated on test-samples characterize the quality of the prediction as well as the structure of errors of prediction. In the case of multinational surveys and corpora, they allow us to proceed to comparisons among several countries, in spite of the very heterogeneous character of the basic information (texts in different languages).
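As a rough illustration of the confusion matrices mentioned in the synopsis, the sketch below tabulates true against predicted categories for a test sample and derives an overall accuracy from the diagonal. It is not the article's method: the function names `confusion_matrix` and `accuracy` and the labels `A`, `B`, `C` are hypothetical, chosen only for this example.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Count (true, predicted) pairs; rows are true categories, columns are predicted."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    categories = sorted(set(true_labels) | set(predicted_labels))
    counts = Counter(zip(true_labels, predicted_labels))
    matrix = [[counts[(t, p)] for p in categories] for t in categories]
    return categories, matrix


def accuracy(matrix):
    """Share of test items on the diagonal, i.e. predicted correctly."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical test sample: the category each text belongs to
# and the category a classifier assigned to it.
true_labels = ["A", "A", "B", "B", "C", "C"]
predicted_labels = ["A", "B", "B", "B", "C", "A"]

categories, matrix = confusion_matrix(true_labels, predicted_labels)
for category, row in zip(categories, matrix):
    print(category, row)
print("accuracy:", round(accuracy(matrix), 3))
```

The off-diagonal cells show which categories are mistaken for which others, the "structure of errors" the synopsis refers to. Computing one such matrix per country or per language would give matrices that can be set side by side, assuming the categories are shared across the samples.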
SIMILAR VOLUMES
Design can be characterized using a linguistic model which compares the use and power of language in real-life with its use and power in text-based virtual worlds. In this paper, the theory of speech acts is used as a background and a point of development to analyse and model design in the virtual s
Based on the salient features of the documents, automatic text summarization systems extract the key sentences from source documents. This process supports the users in evaluating the relevance of the extracted documents returned by information retrieval systems. Because of this tool, e