✦ LIBER ✦

Automating extraction of logical domains in a web site

✍ Scribed by Necip Fazıl Ayan; Wen-Syan Li; Okan Kolak

Publisher: Elsevier Science
Year: 2002
Tongue: English
Weight: 535 KB
Volume: 43
Category: Article
ISSN: 0169-023X
DOI: 10.1016/s0169-023x(02)00055-1

✦ Synopsis

The domain name field in a uniform resource locator (URL) has been viewed as a natural choice for organizing Web pages. For example, Web search results may be grouped by domain and presented to users as clusters for ease of visualization. With this approach, however, large Web sites such as Geocities, W3C, and www.cs.umd.edu tend to yield many matches, which leads to a few large, flat, and unorganized clusters. In fact, many pages in these sites are "logical domains" in their own right. For example, the Web sites for projects at a university or the XML section at W3C could be viewed as logical domains. In this paper, we propose the concept of a logical domain, which is identified by semantic relatedness, as opposed to a physical domain, which is identified simply by domain name. Identifying logical domains is important to many Web applications, such as query result reorganization, site map generation, and topic distillation. We have developed and implemented a set of rules based on link structure, path information, document metadata, and citations to identify logical domain entry pages (i.e., root pages of logical domains). The importance of these rules is automatically adjusted using a novel decision tree algorithm and training data provided by human feedback. We also develop techniques to define the boundary of each logical domain based on the identified logical domain entry pages. We have conducted extensive experiments on real Web sites to evaluate the effectiveness of our proposed techniques. The experimental results show that our techniques perform very well in extracting logical domains in a Web site.
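
As a rough illustration of the idea in the synopsis, the sketch below scores pages of a single site with a few path and link rules, treats high-scoring pages as candidate logical domain entry pages, and then assigns each page to the entry page with the longest matching directory prefix. It is not the authors' rule set or their decision tree algorithm: the rule names, the weights in `WEIGHTS`, the threshold, and the prefix-based boundary step are all assumptions made for this example, and the host `www.example.edu` in the usage note is hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical rule weights. The paper adjusts rule importance from human
# feedback; these numbers are invented purely for illustration.
WEIGHTS = {"index_page": 1.0, "user_dir": 2.0, "external_inlinks": 0.5}


def directory_of(url):
    """Return the directory part of a URL path, always ending in '/'."""
    path = urlparse(url).path or "/"
    return path if path.endswith("/") else path.rsplit("/", 1)[0] + "/"


def entry_score(url, inlinks):
    """Weighted sum of simple path and link rules for one page.

    inlinks is the list of URLs (same site assumed) that link to `url`.
    """
    path = urlparse(url).path or "/"
    name = path.rsplit("/", 1)[1]
    here = directory_of(url)
    is_index = name == "" or name.startswith("index.")
    score = 0.0
    # Path rule: directory-style URLs and index files often act as root pages.
    if is_index:
        score += WEIGHTS["index_page"]
    # Path rule: an index page of a directory such as /~project/.
    segments = [s for s in here.split("/") if s]
    if is_index and segments and segments[-1].startswith("~"):
        score += WEIGHTS["user_dir"]
    # Link rule: count links arriving from outside this page's directory.
    external = [s for s in inlinks if not directory_of(s).startswith(here)]
    score += WEIGHTS["external_inlinks"] * len(external)
    return score


def find_entry_pages(links, threshold=2.0):
    """links maps each page URL to the list of URLs that link to it."""
    return {u for u, inl in links.items() if entry_score(u, inl) >= threshold}


def assign_boundaries(pages, entry_pages):
    """Assign each page to the entry page with the longest directory prefix."""
    domains = {e: [] for e in entry_pages}
    for page in pages:
        best = None
        for e in entry_pages:
            d = directory_of(e)
            if directory_of(page).startswith(d):
                if best is None or len(d) > len(directory_of(best)):
                    best = e
        if best is not None:
            domains[best].append(page)
    return domains
```

Calling `find_entry_pages` on a small link map, for instance one where `http://www.example.edu/~proj/` is linked from two pages in the site root, and then passing the result to `assign_boundaries` groups the pages under `/~proj/` into one logical domain. Pages that fall under no entry page are left unassigned.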

📜 SIMILAR VOLUMES

Extracting link chains of relationship instances from a Web site

✍ Myo-Myo Naing; Ee-Peng Lim; Roger H.L. Chiang 📂 Article 📅 2006 🏛 John Wiley and Sons 🌐 English ⚖ 533 KB

Web pages from a Web site can often be associated with concepts in an ontology, and pairs of Web pages also can be associated with relationships between concepts. With such associations, the Web site can be searched, browsed, or even reorganized based on the concept and relationship lab

An empirical study of automated dictionary construction for information extraction in three domains

✍ E. Riloff 📂 Article 📅 1996 🏛 Elsevier Science 🌐 English ⚖ 94 KB

A compendium of world wide web sites related to mercury in fish

✍ Venkataraman, Kalyanakrishnan; Kumar, Ashok 📂 Article 📅 2000 🏛 American Institute of Chemical Engineers 🌐 English ⚖ 542 KB

A Complete Set of Axioms for Logical Formulas Invalid in Some Finite Domain

✍ Theodore Hailperin 📂 Article 📅 1961 🏛 John Wiley and Sons 🌐 English ⚖ 730 KB

Fully automated continuous crosscurrent extraction of enzymes in a two-stage plant

✍ H. Hustedt; B. Börner; K. H. Kroner; N. Papamichael 📂 Article 📅 1987 🏛 Springer-Verlag 🌐 English ⚖ 414 KB

Extractive purification of proteins, using fumarase from Saccharomyces cerevisiae as a model system, is demonstrated to allow automatic and continuous processing desirable for industrial production purposes.

Automated measurements in the time domain and a method of increasing their accuracy

✍ A. V. Andriyanov; G. V. Glebovich; V. V. Krylov; S. Ya. Korsakov; D. M. Ponomare 📂 Article 📅 1980 🏛 Springer US 🌐 English ⚖ 430 KB