✦ LIBER ✦

Clustering and classification of large document bases in a parallel environment

✍ Scribed by Ruocco, Anthony S.; Frieder, Ophir

Publisher: John Wiley and Sons
Year: 1997
Tongue: English
Weight: 189 KB
Volume: 48
Category: Article
ISSN: 0002-8231
DOI: 10.1002/(sici)1097-4571(199710)48:10<932::aid-asi7>3.0.co;2-2

✦ Synopsis

Development of cluster-based search systems has been hampered by prohibitive times involved in clustering large document sets. Once completed, maintaining cluster organizations is difficult in dynamic file environments. We propose the use of parallel computing systems to overcome the computationally intense clustering process. Two operations are examined. The first is clustering a document set and the second is classifying the document set. A subset of the TIPSTER corpus, specifically, articles from the Wall Street Journal, is used. Document set classification was performed without the large storage requirement (potentially as high as 522M) for ancillary data matrices. In all cases, the time performance of the parallel system was an improvement over sequential system times, and produced the same clustering and classification scheme. Some results show near linear speed up in higher threshold clustering applications.

…ing, information retrieval systems must be prepared to process large amounts of data. As systems become inundated with more and more information, it may not be possible for people to fully understand what they have collected. The ability to produce information that categorizes the data, that is, the ability to produce metadata, is in many cases as important as identifying specific pieces of data within a document set. The ever-increasing size, coupled with the increasing requirements to classify, group, and process the document sets, all within nonprohibitive execution times, motivates the use of parallel processing computers. Parallel information retrieval focuses on this particular domain (Pogue, 1988; Rasmussen, 1991; …). Query processing assumes an organized data set as input. We, however, rely on parallel computing to organize the data by performing two cluster preprocessing operations. The first …

*Ophir Frieder is currently on leave from the Department of Computer Science, George Mason …

📜 SIMILAR VOLUMES

Development and validation of a cluster-based classification system to facilitate treatment tailoring

✍ Susan E. Collins; Iris Torchalla; Martina Schröter; Gerhard Buchkremer; Anil Bat 📂 Article 📅 2008 🏛 John Wiley and Sons 🌐 English ⚖ 126 KB

Aims: The objectives of this study were to replicate smoker profiles identified in Batra et al. (in press) and to develop a cluster-based classification system to categorize new cases into smoker profiles so that an appropriate tailored intervention could be applied. Methods: Participants w

Using Java and JavaScript in the Virtual Programming Laboratory: a Web-based parallel programming environment

✍ Dincer, Kivanc; Fox, Geoffrey C. 📂 Article 📅 1997 🏛 John Wiley and Sons 🌐 English ⚖ 277 KB 👁 2 views

The Virtual Programming Laboratory (VPL) is a Web-based virtual programming environment built based on a client-server architecture. The system can be accessed on any platform (Unix, PC or Mac) using a standard Java-enabled browser. Software delivery over the Web imposes a novel set of constraints o

Clustering and Rule-Based Classifications of Chemical Structures Evaluated in the Biological Activity Space.

✍ Ansgar Schuffenhauer; Nathan Brown; Peter Ertl; Jeremy L. Jenkins; Paul Selzer; 📂 Article 📅 2007 🏛 John Wiley and Sons ⚖ 11 KB 👁 1 views

An incremental elastic-plastic Finite Element solver in a workstation cluster environment Part I. Formulations and parallel processing

✍ Anna Feriani; Alberto Franchi; Francesco Genna 📂 Article 📅 1996 🏛 Elsevier Science 🌐 English ⚖ 909 KB

A comparison of spindle concentrations in large and small muscles acting in parallel combinations

✍ D. Peck; D. F. Buxton; A. Nitz 📂 Article 📅 1984 🏛 John Wiley and Sons 🌐 English ⚖ 607 KB

A small short muscle frequently acts across a joint in parallel with a vastly larger and longer muscle; therefore it should play a minimal role in the mechanical control of that joint. This study provides evidence suggesting that the small member of such a "parallel muscle combination" (PMC) may ser

A comparative study of Java and C performance in two large-scale parallel applications

✍ Aamir Shafi; Bryan Carpenter; Mark Baker; Aftab Hussain 📂 Article 📅 2009 🏛 John Wiley and Sons 🌐 English ⚖ 415 KB