Natural Language Processing in the Real World: Text Processing, Analytics, and Classification (Chapman & Hall/CRC Data Science Series)

✍ Scribed by Jyotika Singh

Publisher: Chapman and Hall/CRC
Year: 2023
Tongue: English
Leaves: 393
Edition: 1
Category: Library

✦ Synopsis

Natural Language Processing in the Real World is a practical guide for applying data science and machine learning to build Natural Language Processing (NLP) solutions. Where traditional, academic-taught NLP is often accompanied by a data source or dataset to aid solution building, this book is situated in the real world where there may not be an existing rich dataset.

This book covers the basic concepts behind NLP and text processing and discusses the applications across 15 industry verticals. From data sources and extraction to transformation and modelling, and classic Machine Learning to Deep Learning and Transformers, several popular applications of NLP are discussed and implemented.

This book provides a hands-on and holistic guide for anyone looking to build NLP solutions, from students of Computer Science to those involved in large-scale industrial projects.

✦ Table of Contents

Cover
Half Title
Series Page
Title Page
Copyright Page
Dedication
Contents
List of Figures
List of Tables
Preface
Author Bio
Acknowledgments
SECTION I: NLP Concepts
CHAPTER 1: NLP Basics
1.1. NATURAL LANGUAGE PROCESSING
1.2. LANGUAGE CONCEPTS
1.2.1. Understanding language
1.2.2. Components of language
1.3. USING LANGUAGE AS DATA
1.3.1. Look-up
1.3.2. Linguistics
1.3.3. Data quantity and relevance
1.3.4. Preprocessing
1.3.5. Numerical representation
1.4. NLP CHALLENGES
1.4.1. Language diversity
1.4.1.1. Writing styles
1.4.1.2. Sentence ambiguities
1.4.1.3. Different languages
1.4.2. Language evolution
1.4.3. Context awareness
1.4.4. Not always a one-size-fits-all
1.5. SETUP
1.6. TOOLS
SECTION II: Data Curation
CHAPTER 2: Data Sources and Extraction
2.1. SOURCES OF DATA
2.1.1. Generated by businesses
2.1.2. Openly accessible
2.1.3. Conditionally available
2.2. DATA EXTRACTION
2.2.1. Reading from a PDF
2.2.2. Reading from a scanned document
2.2.3. Reading from a JSON
2.2.4. Reading from a CSV
2.2.5. Reading from HTML page (web scraping)
2.2.6. Reading from a Word document
2.2.7. Reading from APIs
2.2.8. Closing thoughts
2.3. DATA STORAGE
2.3.1. Flat-file database
2.3.2. Elasticsearch
2.3.2.1. Query examples
2.3.3. MongoDB
2.3.3.1. Query samples
2.3.4. Google BigQuery
2.3.4.1. Query examples
SECTION III: Data Processing and Modeling
CHAPTER 3: Data Preprocessing and Transformation
3.1. DATA CLEANING
3.1.1. Segmentation
3.1.2. Cleaning
3.1.3. Standardization
3.1.4. Example scenario
3.2. VISUALIZATION
3.3. DATA AUGMENTATION
3.4. DATA TRANSFORMATION
3.4.1. Encoding
3.4.2. Frequency-based vectorizers
3.4.3. Co-occurrence matrix
3.4.4. Word embeddings
CHAPTER 4: Data Modeling
4.1. DISTANCE METRICS
4.1.1. Character-based similarity
4.1.2. Phonetic matching
4.1.3. Semantic similarity metrics
4.2. MODELING
4.2.1. Classic ML models
4.2.1.1. Clustering
4.2.1.2. Classification
4.2.2. Deep learning
4.2.2.1. Convolutional neural network (CNN)
4.2.2.2. Recurrent neural network (RNN)
4.2.2.3. Long short term memory (LSTM)
4.2.2.4. Bi-directional LSTMs (BiLSTMs)
4.2.3. Transformers
4.2.3.1. Main innovations behind transformers
4.2.3.2. Types of transformer models
4.2.3.3. Using transformer models
4.2.4. Model hyperparameters
4.3. MODEL EVALUATION
4.3.1. Metrics
4.3.2. Hyperparameter tuning
SECTION IV: NLP Applications across Industry Verticals
CHAPTER 5: NLP Applications - Active Usage
5.1. SOCIAL MEDIA
5.1.1. What is social media?
5.1.2. Language data generated
5.1.3. NLP in social media
5.2. FINANCE
5.2.1. What is finance?
5.2.2. Language data generated
5.2.3. NLP in finance
5.3. E-COMMERCE
5.3.1. What is e-commerce?
5.3.2. Language data generated
5.3.3. NLP in e-commerce
5.4. TRAVEL AND HOSPITALITY
5.4.1. What is travel and hospitality?
5.4.2. Language data generated
5.4.3. NLP in travel and hospitality
5.5. MARKETING
5.5.1. What is marketing?
5.5.2. Language data generated
5.5.3. NLP in marketing
5.6. INSURANCE
5.6.1. What is insurance?
5.6.2. Language data generated
5.6.3. NLP in insurance
5.7. OTHER COMMON USE CASES
5.7.1. Writing and email
5.7.2. Home assistants
5.7.3. Recruiting
CHAPTER 6: NLP Applications - Developing Usage
6.1. HEALTHCARE
6.1.1. What is healthcare?
6.1.2. Language data generated
6.1.3. NLP in healthcare
6.2. LAW
6.2.1. What is law?
6.2.2. Language data generated
6.2.3. NLP in law
6.3. REAL ESTATE
6.3.1. What is real estate?
6.3.2. Language data generated
6.3.3. NLP in real estate
6.4. OIL AND GAS
6.4.1. What is oil and gas?
6.4.2. Language data generated
6.4.3. NLP in oil and gas
6.5. SUPPLY CHAIN
6.5.1. What is supply chain?
6.5.2. Language data generated
6.5.3. NLP in supply chain
6.6. TELECOMMUNICATION
6.6.1. What is telecom?
6.6.2. Language data generated
6.6.3. NLP in telecom
6.7. AUTOMOTIVE
6.7.1. What is automotive?
6.7.2. Language data generated
6.7.3. NLP in automotive
6.8. SERIOUS GAMES
6.8.1. What is a serious game?
6.8.2. Language data generated
6.8.3. NLP in serious games
6.9. EDUCATION AND RESEARCH
6.9.1. What is education and research?
6.9.2. Language data generated
6.9.3. NLP in education and research
SECTION V: Implementing Advanced NLP Applications
CHAPTER 7: Information Extraction and Text Transforming Models
7.1. INFORMATION EXTRACTION
7.1.1. Named entity recognition (NER)
7.1.1.1. Rule-based approaches
7.1.1.2. Open-source pre-trained models
7.1.1.3. Training your own model
7.1.1.4. Fine-tuning on custom datasets using transformers
7.1.2. Keyphrase extraction (KPE)
7.1.2.1. textacy
7.1.2.2. rake-nltk
7.1.2.3. KeyBERT
7.2. TEXT SUMMARIZATION
7.2.1. Extractive summarization
7.2.1.1. Classic open-source models
7.2.1.2. Transformers
7.2.2. Abstractive summarization
7.2.2.1. Transformers
7.3. LANGUAGE DETECTION AND TRANSLATION
7.3.1. Language detection
7.3.2. Machine translation
7.3.2.1. Paid services
7.3.2.2. Labeled open-source
7.3.2.3. Transformers
CHAPTER 8: Text Categorization and Affinities
8.1. TOPIC MODELING
8.1.1. Latent dirichlet allocation (LDA)
8.2. TEXT SIMILARITY
8.2.1. Elasticsearch
8.2.2. Classic TF-IDF approach
8.2.3. Pre-trained word embedding models
8.3. TEXT CLASSIFICATION
8.3.1. Off-the-shelf content classifiers
8.3.1.1. Zero-shot classification
8.3.2. Classifying with available labeled data
8.3.2.1. Classic ML
8.3.2.2. Deep learning
8.3.3. Classifying unlabeled data
8.3.3.1. Solution 1: Labeling
8.3.3.2. Solution 2: Clustering
8.3.3.3. Solution 3: Hybrid approach
8.4. SENTIMENT ANALYSIS
8.4.1. Classic open-source models
8.4.2. Transformers
8.4.3. Paid services
SECTION VI: Implementing NLP Projects in the Real-World
CHAPTER 9: Chatbots
9.1. TYPES OF CHATBOTS
9.2. COMPONENTS OF A CHATBOT
9.3. BUILDING A RULE-BASED CHATBOT
9.4. BUILDING A GOAL-ORIENTED CHATBOT
9.4.1. Chatbots using service providers
9.4.2. Create your own chatbot
9.4.3. Using RASA
9.5. CLOSING THOUGHTS
CHAPTER 10: Customer Review Analysis
10.1. HOTEL REVIEW ANALYSIS
10.1.1. Sentiment analysis
10.1.2. Extracting comment topic themes
10.1.3. Unlabeled comment classification into categories
CHAPTER 11: Recommendations and Predictions
11.1. CONTENT RECOMMENDATION SYSTEM
11.1.1. Approaches
11.1.2. Building a social media post recommendation system
11.1.2.1. Evaluating a classic TF-IDF method, spaCy model, and BERT model
11.1.3. Conclusion and closing thoughts
11.2. NEXT-WORD PREDICTION
11.2.1. Building a next-word prediction for the data science topic
11.2.1.1. Training a BiLSTM model
CHAPTER 12: More Real-World Scenarios and Tips
12.1. DATA SCENARIOS
12.2. MODELING SCENARIOS
12.3. DEPLOYING YOUR MODEL
12.4. MODEL AND OUTCOME EXPLAINABILITY
Bibliography
Index

📜 SIMILAR VOLUMES

📁 Time Series for Data Science: Analysis and Forecasting (Chapman & Hall/CRC Texts in Statistical Science)

✍ Wayne A. Woodward, Bivin Philip Sadler, Stephen Robertson 📂 Library 📅 2022 🏛 Chapman and Hall/CRC 🌐 English

Data Science students and practitioners want to find a forecast that “works” and don’t want to be constrained to a single forecasting strategy. Time Series for Data Science: Analysis and Forecasting discusses techniques of ensemble modelling for combining informati

📁 Chapman & Hall/CRC Big Data Series : Big Data Management and Processing (1)

✍ Kuan-Ching Li, Hai Jiang, Albert Y. Zomaya 📂 Library 📅 2017 🏛 Chapman and Hall/CRC 🌐 English

Big Data Management and Processing explores a range of big data related issues and their impact on the design of new computing systems. The twenty-one chapters were carefully selected and feature contributions from several outstanding researchers. The book endeavors to strike a balance between theor

📁 Supervised Machine Learning for Text Analysis in R (Chapman & Hall/CRC Data Science Series)

✍ Emil Hvitfeldt, Julia Silge 📂 Library 📅 2021 🏛 Chapman and Hall/CRC 🌐 English

Text data is important for many domains, from healthcare to marketing to the digital humanities, but specialized approaches are necessary to create features for machine learning from language. Supervised Machine Learning for Text Analysis in R explains how to preprocess text data for modeling, tr

📁 Applied Categorical and Count Data Analysis (Chapman & Hall/CRC Texts in Statistical Science)

✍ Wan Tang, Hua He, Xin M. Tu 📂 Library 🏛 Chapman and Hall/CRC 🌐 English

Developed from the authors’ graduate-level biostatistics course, Applied Categorical and Count Data Analysis, Second Edition explains how to perform the statistical analysis of discrete data, including categorical and count outcomes. The authors have been teaching

📁 Handbook of Natural Language Processing, Second Edition (Chapman & Hall CRC Machine Learning & Pattern Recognition Series)

✍ Nitin Indurkhya, Fred J. Damerau (Editors) 📂 Library 📅 2010 🏛 Taylor and Francis Group, LLC 🌐 English