Theory and Algorithms for Information Extraction and Classification in Textual Data Mining

✍ Scribed by Wu T.

Year: 2003
Tongue: English
Leaves: 5
Category: Library

✦ Synopsis

Regular expressions can be used as patterns to extract features from semi-structured and narrative text [8]. For example, in police reports a suspect's height might be recorded as "{CD} feet {CD} inches tall", where {CD} is the part-of-speech tag for a numeric value (a cardinal number). The results in [1] show that regular expressions can outperform explicit expressions in some applications, such as Posting Act Tagging. Although much work has been done in the field of information extraction, relatively little has focused on the automatic discovery of regular expressions. My Ph.D. research will therefore focus on the automatic generation of reduced regular expressions (RREs), as defined in [8], for use in Information Extraction (IE).

The learned RREs can be used directly to extract features from free text, or they can fill the templates of Eric Brill's Transformation-Based Learning (TBL) framework [2]. The original TBL templates are explicit expressions, which are less expressive than reduced regular expressions. I also propose an enhancement to TBL termed "Error-Driven Boolean-Logic-Rule-Based Learning" (BLogRBL) [9], which is strictly more powerful than TBL [2]. As in Brill's method, rules are derived automatically from templates during learning; unlike Brill's technique, the rules take the form of complex expressions of combinational logic. The final contribution of my Ph.D. thesis will be a framework that combines regular expression discovery with BLogRBL.

A necessary component of this research is a study of the biases inherent in using reduced regular expressions for IE. The purpose of this study is to characterize the language biases, search biases, and overfitting biases of the RRE discovery and BLogRBL algorithms.
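To make the police-report example concrete, here is a minimal Python sketch of how a tag-aware pattern such as "{CD} feet {CD} inches tall" could be matched against part-of-speech-tagged tokens. It is an illustration under stated assumptions, not the method of the synopsis: the mini-syntax ({TAG} for any one token carrying that tag, bare words for literal tokens), the "word/TAG" serialization, and the helper names compile_pattern and extract are all hypothetical. It does not reproduce the RRE formalism defined in [8], and it covers only the matching step a learned pattern would be used for, not pattern discovery.

```python
import re
from typing import List, Tuple

# A token is a (word, part-of-speech tag) pair, e.g. ("5", "CD").
Token = Tuple[str, str]


def compile_pattern(pattern: str) -> "re.Pattern[str]":
    """Compile a hypothetical token pattern into a Python regex.

    Assumed mini-syntax (a simplification, not the RRE definition in [8]):
      {TAG}  matches one token whose POS tag is exactly TAG and captures its word
      word   matches one token whose word is exactly `word`, with any tag
    Tokens are serialized as "word/TAG" and joined by single spaces.
    """
    parts = []
    for element in pattern.split():
        if element.startswith("{") and element.endswith("}"):
            # Capture the word of any token tagged with this tag.
            parts.append(r"(\S+)/" + re.escape(element[1:-1]))
        else:
            # Literal word, any tag.
            parts.append(re.escape(element) + r"/\S+")
    # Require token boundaries on both sides so "5/CDX" never matches {CD}.
    return re.compile(r"(?:^| )" + " ".join(parts) + r"(?= |$)")


def extract(pattern: str, tokens: List[Token]) -> List[Tuple[str, ...]]:
    """Return the words captured by the {TAG} slots for every match of pattern."""
    text = " ".join(f"{word}/{tag}" for word, tag in tokens)
    return [match.groups() for match in compile_pattern(pattern).finditer(text)]


if __name__ == "__main__":
    # "suspect is 5 feet 11 inches tall", tagged with Penn Treebank tags.
    report = [("suspect", "NN"), ("is", "VBZ"), ("5", "CD"), ("feet", "NNS"),
              ("11", "CD"), ("inches", "NNS"), ("tall", "JJ")]
    print(extract("{CD} feet {CD} inches tall", report))  # [('5', '11')]
```

Serializing each token as "word/TAG" lets Python's standard `re` engine do the matching. Richer operators (optional or repeated tokens, for instance) would need a correspondingly richer compiler, and learning such patterns automatically is the research problem the synopsis describes, which this sketch does not attempt.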

📜 SIMILAR VOLUMES

📁 Modern Data Mining Algorithms in C++ and CUDA C: Recent Developments in Feature Extraction and Selection Algorithms for Data Science

✍ Timothy Masters 📂 Library 📅 2020 🏛 Apress 🌐 English

Discover a variety of data-mining algorithms that are useful for selecting small sets of important features from among unwieldy masses of candidates, or extracting useful features from measured variables. As a serious data miner you will often be faced with thousands of candidate features

📁 Data mining. Theories, algorithms, and examples

✍ Ye, Nong 📂 Library 📅 2014 🏛 CRC Press 🌐 English

📁 Data Mining: Theories, Algorithms, and Examples

✍ Ye, Nong 📂 Library 📅 2013 🏛 CRC Press 🌐 English

📁 Data Mining Algorithms in C++: Data Patterns and Algorithms for Modern Applications

✍ Timothy Masters 📂 Library 📅 2018 🏛 Apress 🌐 English

Discover hidden relationships among the variables in your data, and learn how to exploit these relationships. This book presents a collection of data-mining algorithms that are effective in a wide variety of prediction and classification applications. All algorithms include an intuitive explanation

📁 Data Mining Algorithms in C++. Data Patterns and Algorithms for modern Applications

✍ Timothy Masters 📂 Library 📅 2018 🏛 Apress 🌐 English

📁 Data Mining Algorithms in C++: Data Patterns and Algorithms for Modern Applications

✍ Timothy Masters (auth.) 📂 Library 📅 2018 🏛 Apress 🌐 English

Discover hidden relationships among the variables in your data, and learn how to exploit these relationships. This book presents a collection of data-mining algorithms that are effective in a wide variety of prediction and classification applications. All algorithms include an intuitive explanation