An alternative, layout-driven approach to the clustering of documents

Scribed by Vincenzo Loia; Sabrina Senatore


Publisher: John Wiley and Sons
Year: 2008
Tongue: English
Weight: 539 KB
Volume: 23
Category: Article
ISSN: 0884-8173

Synopsis


The Internet has become a huge repository of information and knowledge, built on the sharing of electronic documents. Recent trends in knowledge management focus on knowledge representation based on document content. Most established approaches achieve document understanding by analyzing the "portions of information" in a document that describe its content, through text parsing and extraction techniques. This paper presents an alternative approach that departs from these consolidated document management techniques and focuses on the logical structure of a PDF document as a discriminating source of document knowledge. The main idea rests on the observation that when a reader looks at a paper, the first perception is of the document's layout. Analyzing the layout, typesetting, pagination, and graphical arrangement of a document provides useful information for understanding its content; in general, documents in the same category present similar page layouts, fonts, and figure arrangements. In this sense, this work presents an alternative way to approach document recognition and understanding, through the analysis of the layout of electronic PDF documents and their classification.
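
To make the idea of grouping documents by layout more concrete, here is a minimal Python sketch. It is not the authors' method: the three layout features (column count, body font size, share of page area taken by figures), the sample values, the distance threshold, and the greedy clustering rule are all assumptions chosen for illustration, and the feature values are assumed to have been extracted from the PDFs beforehand.

```python
from math import sqrt


def normalize(vectors):
    """Scale each feature to [0, 1] so no single feature dominates the distance."""
    columns = list(zip(*vectors))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [
        tuple((v - lo) / span for v, lo, span in zip(vec, lows, spans))
        for vec in vectors
    ]


def distance(a, b):
    """Euclidean distance between two layout feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_by_layout(docs, threshold=0.5):
    """Greedy single-pass clustering (an illustrative stand-in, not the paper's algorithm).

    A document joins the first cluster whose seed (first member) lies within
    `threshold`; otherwise it starts a new cluster.
    """
    names = list(docs)
    scaled = dict(zip(names, normalize([docs[n] for n in names])))
    clusters = []
    for name in names:
        for cluster in clusters:
            if distance(scaled[name], scaled[cluster[0]]) <= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Hypothetical feature vectors: (text columns, body font size in pt, figure area fraction)
docs = {
    "journal_a.pdf": (2, 9.0, 0.15),
    "journal_b.pdf": (2, 9.5, 0.20),
    "slides_a.pdf": (1, 24.0, 0.55),
    "slides_b.pdf": (1, 22.0, 0.60),
}

clusters = cluster_by_layout(docs)
for group in clusters:
    print(group)
```

With these made-up values, the two journal-style documents end up in one group and the two slide-style documents in another, which mirrors the claim that documents in the same category tend to share layout traits. No text content is inspected at any point.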


SIMILAR VOLUMES


An alternative approach to the delivery
Lonny W. Morrow | Article | 1975 | John Wiley and Sons | English | 329 KB

School psychologists are frequently criticized for not effecting change in students referred to them. This article describes one model-process which the author, while functioning as a school psychologist, successfully implemented to eliminate this criticism. Adoption of this approach should result i