Applied Text Mining

✍ Scribed by Usman Qamar, Muhammad Summair Raza

Publisher: Springer
Year: 2024
Tongue: English
Leaves: 505
Edition: 2024
Category: Library

✦ Synopsis

This textbook covers the concepts, theories, and implementations of text mining and natural language processing (NLP), combining theory with practical implementation. Every concept is explained with simple, easy-to-understand examples.

The book consists of three parts. Part 1, which comprises three chapters, introduces the basic concepts and applications of text mining, including sentiment analysis and opinion mining, and builds a strong foundation for understanding the remaining parts. The six chapters of Part 2 cover the core concepts of text analytics: feature engineering, text classification, text clustering, text summarization, topic modeling, and text visualization. Finally, the three chapters of Part 3 cover deep-learning-based text mining, now the dominant method for practically all text mining tasks. They present various deep learning approaches, including models for processing and parsing text, for lexical analysis, and for machine translation. All three parts include extensive Python code showing how the described concepts and approaches are implemented.

The textbook was written specifically to enable the teaching of both basic and advanced concepts from a single book. The implementation of every text mining task is carefully explained, using Python as the programming language and spaCy and NLTK as the natural language processing libraries. The book is suitable for both undergraduate and graduate students in computer science and engineering.
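
To give a flavor of the kind of task the book implements, the following is a minimal sketch, not taken from the book, of the text indexing steps listed under Section 1.6 of the contents: tokenization, stop-word removal, stemming, and term weighting. It uses only the Python standard library. The stop-word list, the crude suffix-stripping stemmer, and all function names are illustrative assumptions; they are not the spaCy or NLTK APIs that the book relies on.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not NLTK's or spaCy's list).
STOP_WORDS = {"a", "an", "and", "are", "is", "of", "the", "to", "in"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token):
    """Strip a few common suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def index_terms(text):
    """Tokenize, drop stop words, and stem the remaining tokens."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    indexed = [Counter(index_terms(doc)) for doc in documents]
    n_docs = len(indexed)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for counts in indexed for term in counts)
    weights = []
    for counts in indexed:
        total = sum(counts.values())
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "Text mining extracts patterns from text.",
    "Clustering groups similar documents.",
    "Mining documents is the core of text analytics.",
]
weights = tf_idf(docs)
```

With this weighting scheme, a term that occurs in every document receives a weight of zero, which is one reason practical libraries often use smoothed variants of the formula.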

✦ Table of Contents

Foreword
Preface
Organization of the Book
To the Instructor
To the Student
To the Professional
Contents
About the Authors
Part I: Text Mining Basics
1: Introduction to Text Mining
1.1 Textual Data and Its Components
1.1.1 Components of Textual Data
1.1.2 Formats of Textual Data
1.2 Sources of Textual Data
1.3 Text Mining
1.4 Core Text Mining Operations
1.4.1 Distribution
1.4.2 Frequent Concept Sets
1.4.3 Associations
1.5 Challenges of Text Mining
1.6 Text Indexing Process
1.6.1 Tokenization
1.6.2 Stemming
1.6.3 Stop-Word Removal
1.6.4 Term Weighting
1.7 Text Information System and Its Functions
1.7.1 Information Access
1.7.2 Knowledge Acquisition
1.7.3 Text Organization
1.8 Conceptual Framework for Text Information Systems
1.9 Text Patterns
1.10 Documents and Corpus
1.10.1 Processing Documents
1.10.2 Corpus as Baseline
1.11 Regular Expression
1.12 Summary
1.13 Exercises
2: Text Processing
2.1 Natural Language
2.1.1 What Is Natural Language
2.1.2 The Philosophy of Language
2.1.3 Language Acquisition and Usage
2.2 Linguistics
2.2.1 Language Syntax and Structure
2.2.2 Words
2.2.3 Phrases
2.2.4 Clauses
2.2.5 Grammar
2.2.6 Word-Order Typology
2.3 Language Semantics
2.3.1 Lexical Semantic Relations
2.3.2 Semantic Networks and Models
2.3.3 Representation of Semantics
2.4 Text Corpora
2.4.1 Corpora Annotation and Utilities
2.4.2 Popular Corpora
2.4.3 Accessing Text Corpora
2.5 Text Preprocessing
2.5.1 Sentence Segmentation
2.5.2 Word Tokenization
2.5.3 POS Tagging
2.5.4 Named Entity Recognition
2.6 Sentence Structure
2.7 Information Extraction from Text
2.8 Architecture of Information Extraction System
2.8.1 Tokenization
2.8.2 Morphological and Lexical Analysis
2.8.3 Syntactic Analysis
2.8.4 Domain Analysis
2.9 Summary
2.10 Exercises
3: Text Mining Applications
3.1 Sentiment Analysis
3.1.1 Sentiment Analysis Applications
3.1.2 Sentiment Analysis Problems
3.1.3 Opinion Summarization
3.1.4 Opinion Types
3.2 Sentiment Classification
3.2.1 Supervised Sentiment Classification
3.2.2 Unsupervised Sentiment Classification
3.3 Aspect-Based Sentiment Analysis
3.3.1 Aspect-Based Sentiment Classification
3.3.2 Aspect Extraction
3.3.3 Aspect Categories
3.3.4 Word Sense Disambiguation
3.4 Opinion Summarization
3.4.1 Aspect-Based Opinion Summarization
3.4.2 Contrastive View Summarization
3.4.3 Traditional Summarization
3.5 Analysis of Comparative Opinions
3.6 Opinion Search and Retrieval
3.7 Opinion Spam Detection
3.7.1 Types of Spam
3.7.2 Supervised Spam Detection
3.7.3 Unsupervised Spam Detection
3.8 Summary
3.9 Exercises
Part II: Text Analytics
4: Feature Engineering for Text Representations
4.1 Introduction to Features
4.2 Feature Engineering
4.3 Traditional Feature Engineering Models
4.3.1 Bag-of-Words Model
4.3.2 Bag-of-N-Grams Model
4.3.3 TF-IDF Model
4.3.4 Extracting Features for New Documents
4.3.5 Document Similarity
4.3.6 Topic Models
4.4 Advanced Feature Engineering Models
4.4.1 Loading the Bible Corpus
4.4.2 Word2Vec Model
4.4.3 Robust Word2Vec Models with Gensim
4.4.4 Applying Word2Vec Features for Machine Learning Tasks
4.4.5 The GloVe Model
4.4.6 Applying GloVe Features for Machine Learning Tasks
4.4.7 The FastText Model
4.4.8 Applying FastText Features to Machine Learning Tasks
4.5 Summary
4.6 Exercises
5: Text Classification
5.1 What Is Text Classification?
5.2 Automated Text Classification
5.3 Text Classification Blueprint
5.4 Data Retrieval
5.5 Data Preprocessing and Normalization
5.6 Training and Test Datasets
5.7 Feature Engineering Techniques
5.7.1 Traditional Feature Engineering Models
5.7.2 Advanced Feature Engineering Models
5.8 Classification Models
5.8.1 Multinomial Naive Bayes
5.8.2 Logistic Regression
5.8.3 Support Vector Machines
5.8.4 Ensemble Models
5.8.5 Random Forest
5.8.6 Gradient Boosting Machines
5.9 Evaluating Classification Models
5.10 Building and Evaluating Text Classifier
5.11 Applications
5.12 Summary
5.13 Exercises
6: Text Clustering
6.1 Introduction to Text Clustering
6.1.1 K-Means Clustering
6.1.2 Hierarchical Clustering
6.1.3 Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
6.1.4 Latent Dirichlet Allocation (LDA)
6.2 Clustering Types
6.2.1 Static Clustering and Dynamic Clustering
6.2.2 Crisp Clustering and Fuzzy Clustering
6.2.3 Flat Clustering and Hierarchical Clustering
6.2.4 Single-Viewed Clustering and Multiple-Viewed Clustering
6.3 Derived Tasks from Text Clustering
6.4 Text Clustering Algorithms
6.4.1 Simple Clustering Algorithms
6.4.2 K-Means Algorithm
6.4.3 Competitive Learning
6.5 Implementation of Text Clustering
6.6 Clustering Evaluation
6.6.1 Clustering Evaluation
6.6.2 Cluster Validation
6.6.3 Clustering Indexes
6.6.4 Parameter Tuning
6.7 Summary
6.8 Exercises
7: Text Summarization and Topic Modeling
7.1 Introduction to Text Summarization
7.2 Summarization Types
7.2.1 Manual Versus Automatic Text Summarization
7.2.2 Single Versus Multiple Text Summarization
7.2.3 Flat Versus Hierarchical Text Summarization
7.2.4 Abstraction- Versus Query-Based Summarization
7.3 Approaches to Text Summarization
7.3.1 Heuristics-Based Approaches
7.3.2 Mapping Summarization as a Classification Task
7.3.3 Sampling Schemes
7.4 Important Concepts
7.4.1 Documents
7.4.2 Text Normalization
7.4.3 Feature Extraction
7.4.4 Feature Matrix
7.4.5 Singular Value Decomposition
7.4.6 Text Normalization
7.4.7 Feature Extraction
7.5 Keyphrase Extraction
7.5.1 Collocations
7.5.2 Weighted Tag-Based Phrase Extraction
7.6 Topic Modeling and Its Objectives
7.6.1 Latent Semantic Indexing
7.6.2 Latent Dirichlet Allocation
7.6.3 Non-negative Matrix Factorization
7.7 Modeling Case Study
7.7.1 Topic Modeling Using Gensim
7.7.2 Topic Modeling Using Scikit-Learn
7.8 Automated Document Summarization
7.8.1 Text Wrangling
7.8.2 Text Representation with Feature Engineering
7.8.3 Latent Semantic Analysis
7.9 Challenges of Text Summarization
7.10 Summary
7.11 Exercises
8: Taxonomy Generation and Dynamic Document Organization
8.1 Introduction to Taxonomy Generation
8.2 Taxonomy Generation Tasks
8.2.1 Keyword Extraction
8.2.2 Word Categorization
8.2.3 Word Clustering
8.2.4 Topic Routing
8.3 Taxonomy Generation Schemes
8.3.1 Index-Based Scheme
8.3.2 Clustering-Based Scheme
8.3.3 Association-Based Scheme
8.3.4 Link Analysis-Based Scheme
8.4 Taxonomy Governance
8.4.1 Taxonomy Maintenance
8.4.2 Taxonomy Growth
8.4.3 Taxonomy Integration
8.4.4 Ontology
8.5 Dynamic Document Organization
8.6 Online Clustering
8.7 Online Clustering Algorithms
8.7.1 Online Clustering in Conceptual and Functional View
8.7.2 Online K-Means Algorithms
8.7.3 Online Unsupervised K-Nearest Neighbors Algorithms
8.7.4 Fuzzy Clustering
8.8 Dynamic Organization
8.8.1 Execution Process
8.8.2 Maintenance Mode
8.8.3 Creation Mode
8.8.4 Additional Tasks
8.9 Challenges of Dynamic Document Organization
8.9.1 Text Representation
8.9.2 Binary Decomposition
8.9.3 DDO System Variants
8.10 Summary
8.11 Exercises
9: Visualization Approaches
9.1 Introduction and Importance of Text Visualization
9.2 Visualization Layer in the Text Mining System
9.3 Concept Graphs
9.3.1 Simple Concept Graphs
9.3.2 Simple Concept Set Graphs
9.3.3 Simple Concept Association Graphs
9.4 Histograms
9.5 Line Graphs
9.6 Circle Graphs
9.7 Category Connecting Maps
9.8 Self-Organizing Maps (SOMs)
9.9 Hyperbolic Trees
9.10 Summary
9.11 Exercises
Part III: Deep Learning in Text Mining
10: Text Mining Through Deep Learning
10.1 Role of Deep Learning in Text Mining
10.2 Deep Learning Models for Processing Text
10.2.1 Feed-Forward Neural Networks
10.2.2 Convolutional Neural Networks
10.2.3 Multi-layer Perceptron (MLP)
10.2.3.1 Regression MLPs
10.2.3.2 Classification MLPs
10.2.3.3 Setting Up MLPs with Keras
10.2.4 Recurrent Neural Networks
10.2.4.1 Memory Cells
10.2.4.2 Input-Output Sequences
10.2.4.3 RNNs Training
10.2.4.4 RNN Implementation
10.2.5 Long Short-Term Memory
10.2.5.1 LSTM Architecture
10.2.5.2 LSTM in Text Mining
10.2.5.3 Key Applications of LSTM in Text Mining
10.2.5.4 Python Implementation
10.2.6 Transformers
10.2.6.1 Transformers in Text Mining
10.2.6.2 Basic Architecture
10.2.6.3 Self-Attention and Multi-head Attention
10.2.6.4 BERT (Bidirectional Encoder Representations from Transformers)
10.2.6.5 Handling Long Sequences with Transformers
10.2.6.6 Applications of Transformers in Text Mining
10.3 Deep Learning in Sentiment Analysis
10.3.1 Neural Networks
10.3.2 Word Embedding
10.3.3 Sentiment Analysis Tasks
10.3.4 Neural Network Architectures for Sentiment Analysis
10.4 ChatGPT
10.4.1 Foundation of ChatGPT
10.4.2 Ethical and Societal Considerations
10.5 Summary
10.6 Exercises
11: Lexical Analysis and Parsing Using Deep Learning
11.1 Introduction to Lexical Analysis and Parsing Using Deep Learning
11.1.1 Word Segmentation
11.1.2 Syntactic Parsing
11.1.3 Structured Prediction
11.1.4 Advantages and Disadvantages of Conventional Lexical Analysis Techniques
11.2 Conventional Lexical Analysis Case Study
11.3 Structured Prediction Methods
11.3.1 Graph-Based Methods
11.3.2 Transition-Based Methods
11.4 Neural Graph-Based Methods
11.4.1 Neural Conditional Random Fields
11.4.2 Neural Graph-Based Dependency Parsing
11.5 Neural Transition-Based Methods
11.5.1 Neural Greedy Shift-Reduce Dependency Parsing
11.5.2 Neural Greedy Sequence Labeling
11.5.3 Globally Optimized Models
11.6 Deep Learning-Based Lexical Analysis Case Study
11.7 Advantages and Disadvantages of Deep Learning-Based Lexical Analysis Techniques
11.8 Summary
11.9 Exercises
12: Machine Translation Using Deep Learning
12.1 Machine Translation
12.2 Ambiguity
12.2.1 Word Translation Problems
12.2.2 Phrase Translation Problems
12.2.3 Syntactic Translation Problems
12.2.4 Semantic Translation Problems
12.3 Practical Issues
12.3.1 Data Availability
12.3.2 Evaluation Campaign
12.4 Applications of Machine Translation
12.4.1 Information Access
12.4.2 Aiding Human Translators
12.4.3 Communication
12.4.4 Natural Language Processing Pipelines
12.5 Machine Translation Approaches
12.6 Introduction to Deep Learning Techniques for MT
12.7 Component-Wise Deep Learning for Machine Translation
12.7.1 Translation Models
12.7.2 Reordering Models
12.7.3 Language Models
12.8 End-to-End Deep Learning Models for Machine Translation
12.8.1 Sequence-to-Sequence Neural Network
12.8.2 Encoder Network
12.8.3 Decoder Network
12.9 Translation of Highly Repetitive Content
12.10 Translation of User-Generated Content
12.11 Online Customer Service
12.12 Summary
12.13 Exercise

📜 SIMILAR VOLUMES

📁 Text Mining: Applications and Theory

✍ Michael W. Berry, Jacob Kogan 📂 Library 📅 2010 🏛 Wiley 🌐 English

Text Mining: Applications and Theory presents the state-of-the-art algorithms for text mining from both the academic and industrial perspectives. The contributors span several countries and scientific domains: universities, industrial corporations, and government laboratories, and demonstrate the

📁 Text mining : applications and theory

✍ Kogan, Jacob; Berry, Michael W 📂 Library 📅 2010 🏛 John Wiley & Sons 🌐 English

Text Mining: Applications and Theory presents the state-of-the-art algorithms for text mining from both the academic and industrial perspectives. The contributors span several countries and scientific domains: universities, industrial corporations, and government laboratories, and demonstrat

📁 Text Mining: Classification, Clustering, and Applications

✍ Ashok Srivastava, Mehran Sahami 📂 Library 📅 2009 🏛 Chapman & Hall/CRC 🌐 English

The Definitive Resource on Text Mining Theory and Applications from Foremost Researchers in the Field Giving a broad perspective of the field from numerous vantage points, Text Mining: Classification, Clustering, and Applications focuses on statistical methods for text mining and analysis. It exami

📁 Text Mining: Classification, Clustering, and Applications

✍ Ashok N. Srivastava, Mehran Sahami 📂 Library 📅 2009 🏛 Chapman and Hall/CRC 🌐 English

📁 Text Mining: Classification, Clustering, and Applications

✍ Ashok Srivastava, Mehran Sahami 📂 Library 📅 2009 🏛 Chapman & Hall 🌐 English

The Definitive Resource on Text Mining Theory and Applications from Foremost Researchers in the Field Giving a broad perspective of the field from numerous vantage points, Text Mining: Classification, Clustering, and Applications focuses on statistical methods for

📁 Text Mining: From Ontology Learning to Automated Text Processing Applications

✍ Chris Biemann, Alexander Mehler (eds.) 📂 Library 📅 2014 🏛 Springer International Publishing 🌐 English

This book comprises a set of articles that specify the methodology of text mining, describe the creation of lexical resources in the framework of text mining and use text mining for various tasks in natural language processing (NLP). The analysis of large amounts of textual data is a prerequis