Natural Language Processing Recipes: Unlocking Text Data with Machine Learning and Deep Learning Using Python

✍ Scribed by Akshay Kulkarni, Adarsha Shivananda


Publisher: Apress
Year: 2021
Tongue: English
Leaves: 302
Edition: 2
Category: Library

✦ Synopsis


Focus on implementing end-to-end projects using Python and leverage state-of-the-art algorithms. This book teaches you to efficiently use a wide range of natural language processing (NLP) packages to implement text classification, part-of-speech tagging, topic modeling, text summarization, sentiment analysis, information retrieval, and many other NLP applications.

The book begins with text data collection, web scraping, and the different types of data sources. It explains how to clean and pre-process text data, and offers ways to analyze data with advanced algorithms. You then explore semantic and syntactic analysis of the text. Complex NLP solutions that involve text normalization are covered along with advanced pre-processing methods, POS tagging, parsing, text summarization, sentiment analysis, word2vec, seq2seq, and much more. The book presents the fundamentals necessary for applications of machine learning and deep learning in NLP. This second edition covers advanced techniques for converting text to features, such as GloVe, ELMo, and BERT. It also explains how transformers work, taking Sentence-BERT and GPT as examples. The final chapters explain advanced industrial applications of NLP with solution implementations that leverage deep learning techniques. The book also employs advanced RNNs, such as long short-term memory (LSTM), to solve complex text generation tasks.

After reading this book, you will have a clear understanding of the challenges faced by different industries and you will have worked on multiple examples of implementing NLP in the real world.



What You Will Learn
  • Know the core concepts of and various approaches to natural language processing (NLP), including Python libraries such as NLTK, TextBlob, spaCy, Stanford CoreNLP, and more
  • Implement text pre-processing and feature engineering in NLP, including advanced methods of feature engineering
  • Understand and implement the concepts of information retrieval, text summarization, sentiment analysis, text classification, and other advanced NLP techniques leveraging machine learning and deep learning


Who This Book Is For

Data scientists who want to refresh or learn natural language processing (NLP) concepts through coding exercises

✦ Table of Contents


About the Authors
About the Technical Reviewer
Acknowledgments
Introduction
Chapter 1: Extracting the Data
Introduction
Client Data
Free Sources
Web Scraping
Recipe 1-1. Collecting Data
Problem
Solution
How It Works
Step 1-1. Log in to the Twitter developer portal
Step 1-2. Execute query in Python
Recipe 1-2. Collecting Data from PDFs
Problem
Solution
How It Works
Step 2-1. Install and import all the necessary libraries
Step 2-2. Extract text from a PDF file
Recipe 1-3. Collecting Data from Word Files
Problem
Solution
How It Works
Step 3-1. Install and import all the necessary libraries
Step 3-2. Extract text from a Word file
Recipe 1-4. Collecting Data from JSON
Problem
Solution
How It Works
Step 4-1. Install and import all the necessary libraries
Step 4-2. Extract text from a JSON file
Recipe 1-5. Collecting Data from HTML
Problem
Solution
How It Works
Step 5-1. Install and import all the necessary libraries
Step 5-2. Fetch the HTML file
Step 5-3. Parse the HTML file
Step 5-4. Extract a tag value
Step 5-5. Extract all instances of a particular tag
Step 5-6. Extract all text from a particular tag
Recipe 1-6. Parsing Text Using Regular Expressions
Problem
Solution
How It Works
Tokenizing
Extracting Email IDs
Replacing Email IDs
Extracting Data from an eBook and Performing regex
Recipe 1-7. Handling Strings
Problem
Solution
How It Works
Replacing Content
Concatenating Two Strings
Searching for a Substring in a String
Recipe 1-8. Scraping Text from the Web
Problem
Solution
How It Works
Step 8-1. Install all the necessary libraries
Step 8-2. Import the libraries
Step 8-3. Identify the URL to extract the data
Step 8-4. Request the URL and download the content using Beautiful Soup
Step 8-5. Understand the website’s structure to extract the required information
Step 8-6. Use Beautiful Soup to extract and parse the data from HTML tags
Step 8-7. Convert lists to a data frame and perform an analysis that meets business requirements
Step 8-8. Download the data frame
Chapter 2: Exploring and Processing Text Data
Recipe 2-1. Converting Text Data to Lowercase
Problem
Solution
How It Works
Step 1-1. Read/create the text data
Step 1-2. Execute the lower() function on the text data
Recipe 2-2. Removing Punctuation
Problem
Solution
How It Works
Step 2-1. Read/create the text data
Step 2-2. Execute the replace() function on the text data
Recipe 2-3. Removing Stop Words
Problem
Solution
How It Works
Step 3-1. Read/create the text data
Step 3-2. Remove punctuation from the text data
Recipe 2-4. Standardizing Text
Problem
Solution
How It Works
Step 4-1. Create a custom lookup dictionary
Step 4-2. Create a custom function for text standardization
Step 4-3. Run the text_std function
Recipe 2-5. Correcting Spelling
Problem
Solution
How It Works
Step 5-1. Read/create the text data
Step 5-2. Execute spelling correction on the text data
Recipe 2-6. Tokenizing Text
Problem
Solution
How It Works
Step 6-1. Read/create the text data
Step 6-2. Tokenize the text data
Recipe 2-7. Stemming
Problem
Solution
How It Works
Step 7-1. Read the text data
Step 7-2. Stem the text
Recipe 2-8. Lemmatizing
Problem
Solution
How It Works
Step 8-1. Read the text data
Step 8-2. Lemmatize the data
Recipe 2-9. Exploring Text Data
Problem
Solution
How It Works
Step 9-1. Read the text data
Step 9-2. Import necessary libraries
Step 9-3. Check the number of words in the data
Step 9-4. Compute the frequency of all words in the reviews
Step 9-5. Consider words with length greater than 3 and plot
Step 9-6. Build a word cloud
Recipe 2-10. Dealing with Emojis and Emoticons
Problem
Solution
How It Works
Step 10-A1. Read the text data
Step 10-A2. Install and import necessary libraries
Step 10-A3. Write a function that converts emojis into words
Step 10-A4. Pass text with an emoji to the function
Problem
Solution
How It Works
Step 10-B1. Read the text data
Step 10-B2. Install and import necessary libraries
Step 10-B3. Write a function to remove emojis
Step 10-B4. Pass text with an emoji to the function
Problem
Solution
How It Works
Step 10-C1. Read the text data
Step 10-C2. Install and import necessary libraries
Step 10-C3. Write a function to convert emoticons into words
Step 10-C4. Pass text with emoticons to the function
Problem
Solution
How It Works
Step 10-D1. Read the text data
Step 10-D2. Install and import necessary libraries
Step 10-D3. Write a function to remove emoticons
Step 10-D4. Pass text with emoticons to the function
Problem
Solution
How It Works
Step 10-E1. Read the text data
Step 10-E2. Install and import necessary libraries
Step 10-E3. Find all emojis and determine their meaning
Recipe 2-11. Building a Text Preprocessing Pipeline
Problem
Solution
How It Works
Step 11-1. Read/create the text data
Step 11-2. Process the text
Chapter 3: Converting Text to Features
Recipe 3-1. Converting Text to Features Using One-Hot Encoding
Problem
Solution
How It Works
Step 1-1. Store the text in a variable
Step 1-2. Execute a function on the text data
Recipe 3-2. Converting Text to Features Using a Count Vectorizer
Problem
Solution
How It Works
Recipe 3-3. Generating n-grams
Problem
Solution
How It Works
Step 3-1. Generate n-grams using TextBlob
Step 3-2. Generate bigram-based features for a document
Recipe 3-4. Generating a Co-occurrence Matrix
Problem
Solution
How It Works
Step 4-1. Import the necessary libraries
Step 4-2. Create function for a co-occurrence matrix
Step 4-3. Generate a co-occurrence matrix
Recipe 3-5. Hash Vectorizing
Problem
Solution
How It Works
Step 5-1. Import the necessary libraries and create a document
Step 5-2. Generate a hash vectorizer matrix
Recipe 3-6. Converting Text to Features Using TF-IDF
Problem
Solution
How It Works
Step 6-1. Read the text data
Step 6-2. Create the features
Recipe 3-7. Implementing Word Embeddings
Problem
Solution
How It Works
skip-gram
Continuous Bag of Words (CBOW)
Recipe 3-8. Implementing fastText
Problem
Solution
How It Works
Recipe 3-9. Converting Text to Features Using State-of-the-Art Embeddings
Problem
Solution
ELMo
Sentence Encoders
doc2vec
Sentence-BERT
Universal Encoder
InferSent
Open-AI GPT
How It Works
Step 9-1. Import a notebook and data to Google Colab
Step 9-2. Install and import libraries
Step 9-3. Read text data
Step 9-4. Process text data
Step 9-5. Generate a feature vector
Sentence-BERT
Universal Encoder
InferSent
Open-AI GPT
Step 9-6. Generate a feature vector function automatically using a selected embedding method
Chapter 4: Advanced Natural Language Processing
Recipe 4-1. Extracting Noun Phrases
Problem
Solution
How It Works
Recipe 4-2. Finding Similarity Between Texts
Solution
How It Works
Step 2-1. Create/read the text data
Step 2-2. Find similarities
Phonetic Matching
Recipe 4-3. Tagging Part of Speech
Problem
Solution
How It Works
Step 3-1. Store the text in a variable
Step 3-2. Import NLTK for POS
Recipe 4-4. Extracting Entities from Text
Problem
Solution
How It Works
Step 4-1. Read/create the text data
Step 4-2. Extract the entities
Using NLTK
Using spaCy
Recipe 4-5. Extracting Topics from Text
Problem
Solution
How It Works
Step 5-1. Create the text data
Step 5-2. Clean and preprocess the data
Step 5-3. Prepare the document term matrix
Step 5-4. Create the LDA model
Recipe 4-6. Classifying Text
Problem
Solution
How It Works
Step 6-1. Collect and understand the data
Step 6-2. Text processing and feature engineering
Step 6-3. Model training
Recipe 4-7. Carrying Out Sentiment Analysis
Problem
Solution
How It Works
Step 7-1. Create the sample data
Step 7-2. Clean and preprocess the data
Step 7-3. Get the sentiment scores
Recipe 4-8. Disambiguating Text
Problem
Solution
How It Works
Step 8-1. Import libraries
Step 8-2. Disambiguate word sense
Recipe 4-9. Converting Speech to Text
Problem
Solution
How It Works
Step 9-1. Define the business problem
Step 9-2. Install and import necessary libraries
Step 9-3. Run the code
Recipe 4-10. Converting Text to Speech
Problem
Solution
How It Works
Step 10-1. Install and import necessary libraries
Step 10-2. Run the code with the gTTS function
Recipe 4-11. Translating Speech
Problem
Solution
How It Works
Step 11-1. Install and import necessary libraries
Step 11-2. Input text
Step 11-3. Run the goslate function
Chapter 5: Implementing Industry Applications
Recipe 5-1. Implementing Multiclass Classification
Problem
Solution
How It Works
Step 1-1. Get the data from Kaggle
Step 1-2. Import the libraries
Step 1-3. Import the data
Step 1-4. Analyze the data
Step 1-5. Split the data
Step 1-6. Use TF-IDF for feature engineering
Step 1-7. Build the model and evaluate
Recipe 5-2. Implementing Sentiment Analysis
Problem
Solution
How It Works
Step 2-1. Define the business problem
Step 2-2. Identify potential data sources and extract insights
Step 2-3. Preprocess the data
Step 2-4. Analyze data
Step 2-5. Use a pre-trained model
Step 2-6. Do sentiment analysis
Step 2-7. Get business insights
Recipe 5-3. Applying Text Similarity Functions
Problem
Solution
How It Works
Step 3a-1. Read and understand the data
Step 3a-2. Extract a blocking key
Step 3a-3. Do similarity matching and scoring
Step 3a-4. Predict if records match using ECM classifier
Records of same customers from multiple tables
Step 3b-1. Read and understand the data
Step 3b-2. Block to reduce the comparison window and create record pairs
Step 3b-3. Do similarity matching
Step 3b-4. Predict if records match using ECM classifier
Recipe 5-4. Summarizing Text Data
Problem
Solution
How It Works
Step 4-1. Use TextRank
Step 4-2. Use feature-based text summarization
Recipe 5-5. Clustering Documents
Problem
Solution
How It Works
Step 5-1. Import data and libraries
Step 5-2. Preprocess and use TF-IDF feature engineering
Step 5-3. Cluster using k-means
Step 5-4. Identify cluster behavior
Step 5-5. Plot the clusters on a 2D graph
Recipe 5-6. NLP in a Search Engine
Problem
Solution
How It Works
Step 6-1. Preprocess
Step 6-2. Use the entity extraction model
Step 6-3. Do query enhancement/expansion
Step 6-4. Use a search platform
Step 6-5. Learn to rank
Recipe 5-7. Detecting Fake News
Problem
Solution
How It Works
Step 7-1. Collect data
Step 7-2. Install libraries
Step 7-3. Analyze the data
Step 7-4. Do exploratory data analysis
Step 7-5. Preprocess the data
Step 7-6. Use train_test_split
Step 7-7. Do feature engineering
Step 7-8. Build a model
Model Evaluation
Step 7-9. Tune hyperparameters
Step 7-10. Validate
Summary
Recipe 5-8. Movie Genre Tagging
Problem
Solution
Approach Flow
How It Works
Step 8-1. Collect data
Step 8-2. Install libraries
Step 8-3. Analyze the data
Step 8-4. Do exploratory data analysis
Step 8-5. Preprocess the data
Step 8-6. Use train_test_split
Step 8-7. Do feature engineering
Step 8-8. Do model building and prediction
Problem Transformation
Binary Relevance
Classifier Chains
Label Powerset
Adapted Algorithm
Chapter 6: Deep Learning for NLP
Introduction to Deep Learning
Convolutional Neural Networks
Data
Architecture
Convolution
Nonlinearity (ReLU)
Pooling
Flatten, Fully Connected, and Softmax Layers
Backpropagation: Training the Neural Network
Recurrent Neural Networks
Training RNN: Backpropagation Through Time (BPTT)
Long Short-Term Memory (LSTM)
Recipe 6-1. Retrieving Information
Problem
Solution
How It Works
Step 1-1. Import the libraries
Step 1-2. Create or import documents
Step 1-3. Download word2vec
Step 1-4. Create an IR system
Step 1-5. Results and applications
Recipe 6-2. Classifying Text with Deep Learning
Problem
Solution
How It Works
Step 2-1. Define the business problem
Step 2-2. Identify potential data sources and collect
Step 2-3. Preprocess text
Step 2-4. Prepare the data for model building
Step 2-5. Model building and predicting
Recipe 6-3. Next Word Prediction
Problem
Solution
How It Works
Step 3-1. Define the business problem
Step 3-2. Identify potential data sources and collect
Step 3-3. Import and install necessary libraries
Step 3-4. Process the data
Step 3-5. Prepare data for modeling
Step 3-6. Build the model
Step 3-7. Predict the next word
Recipe 6-4. Stack Overflow question recommendation
Problem
Solution
How It Works
Step 4-1. Collect data
Step 4-2. Import a notebook and data to Google Colab
Step 4-3. Import the libraries
Step 4-4. Import the data and EDA
Step 4-5. Clean the text data
Step 4-6. Use TF-IDF for feature engineering
Step 4-7. Use GloVe embeddings for feature engineering
Step 4-8. Use GPT for feature engineering
Step 4-9. Use Sentence-BERT for feature engineering
Step 4-10. Create functions to fetch top questions
Step 4-11. Preprocess user input
Step 4-12. Find similar questions
Chapter 7: Conclusion and Next-Gen NLP
Recipe 7-1. Recent advancements in text to features or distributed representations
Problem
Solution
Recipe 7-2. Advanced deep learning for NLP
Problem
Solution
Recursive Neural Networks
Deep Generative Models
Recipe 7-3. Reinforcement learning applications in NLP
Problem
Solution
Exploration vs. Exploitation Trade-off
Temporal Difference
Recipe 7-4. Transfer learning and pre-trained models
Problem
Solution
Why Do We Need Transfer Learning in NLP?
A New Era of Embeddings
ULMFiT: Transfer Learning in NLP
Transformers: Beyond LSTM
flair
Why BERT?
BERT and RNN
BERT vs. LSTM
BERT vs. OpenAI GPT
Recipe 7-5. Meta-learning in NLP
Problem
Solution
Recipe 7-6. Capsule networks for NLP
Problem
Solution
Multitasking in NLP
Index


📜 SIMILAR VOLUMES


Natural Language Processing Recipes: Unl
✍ Akshay Kulkarni, Adarsha Shivananda 📂 Library 📅 2021 🏛 Apress 🌐 English

Focus on implementing end-to-end projects using Python and leverage state-of-the-art algorithms. This book teaches you to efficiently use a wide range of natural language processing (NLP) packages to: implement text classification, identify parts of speech, utilize topic modeling, tex

Natural Language Processing Recipes: Unl
✍ Akshay Kulkarni, Adarsha Shivananda 📂 Library 📅 2019 🏛 Apress 🌐 English

Implement natural language processing applications with Python using a problem-solution approach. This book has numerous coding exercises that will help you to quickly deploy natural language processing techniques, such as text classification, parts of speech identification, topic modeling, text sum

Applied Natural Language Processing with
✍ Taweh Beysolow II 📂 Library 📅 2018 🏛 Apress 🌐 English

Learn to harness the power of AI for natural language processing, performing tasks such as spell check, text summarization, document classification, and natural language generation. Along the way, you will learn the skills to implement these methods in larger infrastructures to replace existing code
