𝔖 Scriptorium
✦   LIBER   ✦


Automated Taxonomy Discovery and Exploration

✍ Scribed by Jiaming Shen, Jiawei Han


Publisher: Springer
Year: 2022
Tongue: English
Leaves: 112
Series: Synthesis Lectures on Data Mining and Knowledge Discovery
Category: Library


✦ Synopsis


This book provides a principled, data-driven framework that progressively constructs, enriches, and applies taxonomies without relying on massive amounts of human-annotated data. Traditionally, domain-specific taxonomies are constructed through extensive manual curation, which is time-consuming and costly. In today's information era, people are inundated with vast amounts of text data, yet despite the usefulness of taxonomies, their full power has not been exploited because of the heavy curation needed to create and maintain them. To bridge this gap, the authors discuss automated taxonomy discovery and exploration, with an emphasis on label-efficient machine learning methods and their real-world applications. A taxonomy organizes entities and concepts into a hierarchy. Taxonomies are ubiquitous in daily life, ranging from product taxonomies used by online retailers and topic taxonomies deployed by news outlets and social media to scientific taxonomies deployed by digital libraries across various domains. When properly analyzed, these taxonomies can play a vital role in science, engineering, business intelligence, policy design, e-commerce, and more. Intuitive examples are used throughout, enabling readers to grasp the concepts more easily.
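To make the idea of a concept hierarchy concrete, here is a minimal sketch, assuming a taxonomy stored as a simple child-to-parent map. The toy product concepts and the helper functions `children` and `path_to_root` are hypothetical illustrations, not code or data from the book.

```python
from collections import defaultdict

# Hypothetical toy product taxonomy: each concept maps to its parent (None = root).
PARENT = {
    "electronics": None,
    "computers": "electronics",
    "laptops": "computers",
    "tablets": "computers",
    "cameras": "electronics",
}


def children(parent_map):
    """Invert the child -> parent map into a parent -> [children] map."""
    kids = defaultdict(list)
    for child, parent in parent_map.items():
        if parent is not None:
            kids[parent].append(child)
    return dict(kids)


def path_to_root(concept, parent_map):
    """Return the chain of concepts from `concept` up to the root."""
    path = [concept]
    while parent_map[path[-1]] is not None:
        path.append(parent_map[path[-1]])
    return path


print(path_to_root("laptops", PARENT))  # ['laptops', 'computers', 'electronics']
print(children(PARENT)["computers"])    # ['laptops', 'tablets']
```

A child-to-parent map keeps every concept attached to a single parent, which matches a tree-shaped taxonomy; a taxonomy that lets a concept have several parents would need a list of parents per concept instead.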

✦ Table of Contents


Preface
Contents
1 Introduction
1.1 Overview
1.2 Technical Roadmap
1.2.1 Concept Set Expansion
1.2.2 Taxonomy Construction
1.2.3 Taxonomy Enrichment
1.2.4 Taxonomy-Guided Classification
1.3 Organization
2 Concept Set Expansion
2.1 Overview and Motivations
2.2 Related Work
2.3 SetExpan: Weakly-Supervised Concept Set Expansion
2.3.1 Data Model and Context Features
2.3.2 Context-Dependent Concept Similarity
2.3.3 Context Feature Selection
2.3.4 Concept Selection via Rank Ensemble
2.4 Experiments
2.4.1 Datasets
2.4.2 Compared Methods
2.4.3 Evaluation Metrics
2.4.4 Overall Performance
2.4.5 Ablation Studies
2.4.6 Case Studies
2.5 Extensions of SetExpan
2.5.1 Addressing Concept Drifts via Auxiliary Sets Generation and Co-expansion
2.5.2 Probing Knowledge from Pre-trained Language Models
2.6 Summary
3 Taxonomy Construction
3.1 Overview and Motivations
3.2 Related Work
3.3 HiExpan: Task-Guided Concept Taxonomy Construction
3.3.1 Problem Formulation
3.3.2 Framework Overview
3.3.3 Key Term Extraction
3.3.4 Iterative Width and Depth Expansion
3.3.5 Taxonomy Global Optimization
3.4 Experiments
3.4.1 Datasets
3.4.2 Compared Methods
3.4.3 Evaluation Metrics
3.4.4 Quantitative Results
3.4.5 Case Studies
3.5 Summary
4 Taxonomy Enrichment
4.1 Overview and Motivations
4.2 Related Work
4.3 TaxoExpan: Self-supervised Taxonomy Expansion
4.3.1 Problem Formulation
4.3.2 Taxonomy Modeling and Expansion Goal
4.3.3 Query-Anchor Matching Model
4.3.4 Model Learning and Inference
4.4 Experiments
4.4.1 Experiments on MAG Dataset
4.4.2 Experiments on SemEval Dataset
4.5 Extensions of TaxoExpan
4.5.1 Incorporating More Fine-Grained Self-supervision Tasks
4.5.2 Identifying Potential Children Concepts
4.5.3 Modeling Relations Among New Concepts
4.6 Summary
5 Taxonomy-Guided Classification
5.1 Overview and Motivations
5.2 Related Work
5.3 TaxoClass: Weakly-Supervised Hierarchical Multi-label Text Classification
5.3.1 Problem Formulation
5.3.2 Document-Class Similarity Calculation
5.3.3 Document Core Class Mining
5.3.4 Core Class Guided Classifier Training
5.3.5 Multi-label Self-training
5.4 Experiments
5.4.1 Datasets
5.4.2 Compared Methods
5.4.3 Evaluation Metrics
5.4.4 Implementation Details
5.4.5 Overall Performance Comparison
5.4.6 Effectiveness of Core Class Mining
5.4.7 Analysis of Classifier Architecture
5.4.8 Supervision Signals in Class Names
5.5 Summary
6 Conclusions
6.1 Summary
6.2 Future Work
6.2.1 Integrate Heterogeneous Modalities and Sources
6.2.2 Engage with Human Behaviors and Interactions
6.2.3 Preserve Data Privacy and Model Security


📜 SIMILAR VOLUMES


Explorations in Automatic Thesaurus Discovery
✍ Gregory Grefenstette (auth.) 📂 Library 📅 1994 🏛 Springer US 🌐 English

Explorations in Automatic Thesaurus Discovery presents an automated method for creating a first-draft thesaurus from raw text. It describes the natural language processing steps of tokenization, surface syntactic analysis, and syntactic attribute extraction. From these attributes, word and term simil

Historical Dictionary of the Discovery a
✍ Alan Day 📂 Library 📅 2003 🏛 Scarecrow Press 🌐 English

This engaging Dictionary examines the history of, the search for, and the discovery of Australia, taking full account of the evidence for and the speculation surrounding possible earlier contacts by the Ancient Egyptians, Arabs, and Chinese seamen. Day brings the expeditions to life, expressing the