
Automating extraction of logical domains in a web site

✍ Scribed by Necip Fazıl Ayan; Wen-Syan Li; Okan Kolak


Publisher
Elsevier Science
Year
2002
Tongue
English
Weight
535 KB
Volume
43
Category
Article
ISSN
0169-023X


✦ Synopsis


The domain name field in a uniform resource locator (URL) has been viewed as a natural choice for organizing Web pages. For example, Web search results may be grouped by domain and presented to users as clusters for ease of visualization. However, with this approach, large Web sites such as Geocities, W3C, and www.cs.umd.edu tend to yield many matches, which leads to a few large, flat, and unorganized clusters. In fact, many pages in these sites form "logical domains" of their own. For example, the Web sites for projects at a university or the XML section at W3C could be viewed as "logical domains". In this paper, we propose the concept of a logical domain, which is identified by semantic relatedness, as opposed to a physical domain, which is identified simply by its domain name. The identification of logical domains is important to many Web applications, such as query result reorganization, site map generation, and topic distillation. We have developed and implemented a set of rules based on link structure, path information, document metadata, and citations to identify logical domain entry pages (i.e., root pages of logical domains). The importance of each rule is automatically adjusted using a novel decision tree algorithm and training data provided by human feedback. We also develop techniques to define the boundary of each logical domain based on the identified logical domain entry pages. We have conducted extensive experiments on real Web sites to evaluate the effectiveness of our proposed techniques. The experimental results show that our techniques perform very well in extracting the logical domains in a Web site.


📜 SIMILAR VOLUMES


Extracting link chains of relationship i
✍ Myo-Myo Naing; Ee-Peng Lim; Roger H.L. Chiang 📂 Article 📅 2006 🏛 John Wiley and Sons 🌐 English ⚖ 533 KB

Web pages from a Web site can often be associated with concepts in an ontology, and pairs of Web pages also can be associated with relationships between concepts. With such associations, the Web site can be searched, browsed, or even reorganized based on the concept and relationship lab

Fully automated continuous crosscurrent
✍ H. Hustedt; B. Börner; K. H. Kroner; N. Papamichael 📂 Article 📅 1987 🏛 Springer-Verlag 🌐 English ⚖ 414 KB

Extractive purification of proteins, using fumarase from Saccharomyces cerevisiae as a model system, is demonstrated to allow automatic and continuous processing desirable for industrial production purposes.