Transformers for Natural Language Processing: Second Edition


Language: English
Pages: 565
Category: Library


✦ Synopsis


I had been looking for this book for a long time, so I understand the need. Please read it thoughtfully and support the author, Denis Rothman, by purchasing a copy if you can.

✦ Table of Contents


Cover
Copyright
Foreword
Contributors
Table of Contents
Preface
Chapter 1: What are Transformers?
The ecosystem of transformers
Industry 4.0
Foundation models
Is programming becoming a sub-domain of NLP?
The future of artificial intelligence specialists
Optimizing NLP models with transformers
The background of transformers
What resources should we use?
The rise of Transformer 4.0 seamless APIs
Choosing ready-to-use API-driven libraries
Choosing a Transformer Model
The role of Industry 4.0 artificial intelligence specialists
Summary
Questions
References
Chapter 2: Getting Started with the Architecture of the Transformer Model
The rise of the Transformer: Attention is All You Need
The encoder stack
Input embedding
Positional encoding
Sublayer 1: Multi-head attention
Sublayer 2: Feedforward network
The decoder stack
Output embedding and position encoding
The attention layers
The FFN sublayer, the post-LN, and the linear layer
Training and performance
Transformer models in Hugging Face
Summary
Questions
References
Chapter 3: Fine-Tuning BERT Models
The architecture of BERT
The encoder stack
Preparing the pretraining input environment
Pretraining and fine-tuning a BERT model
Fine-tuning BERT
Hardware constraints
Installing the Hugging Face PyTorch interface for BERT
Importing the modules
Specifying CUDA as the device for torch
Loading the dataset
Creating sentences, label lists, and adding BERT tokens
Activating the BERT tokenizer
Processing the data
Creating attention masks
Splitting the data into training and validation sets
Converting all the data into torch tensors
Selecting a batch size and creating an iterator
BERT model configuration
Loading the Hugging Face BERT uncased base model
Optimizer grouped parameters
The hyperparameters for the training loop
The training loop
Training evaluation
Predicting and evaluating using the holdout dataset
Evaluating using the Matthews Correlation Coefficient
The scores of individual batches
Matthews evaluation for the whole dataset
Summary
Questions
References
Chapter 4: Pretraining a RoBERTa Model from Scratch
Training a tokenizer and pretraining a transformer
Building KantaiBERT from scratch
Step 1: Loading the dataset
Step 2: Installing Hugging Face transformers
Step 3: Training a tokenizer
Step 4: Saving the files to disk
Step 5: Loading the trained tokenizer files
Step 6: Checking resource constraints: GPU and CUDA
Step 7: Defining the configuration of the model
Step 8: Reloading the tokenizer in transformers
Step 9: Initializing a model from scratch
Exploring the parameters
Step 10: Building the dataset
Step 11: Defining a data collator
Step 12: Initializing the trainer
Step 13: Pretraining the model
Step 14: Saving the final model (+tokenizer + config) to disk
Step 15: Language modeling with FillMaskPipeline
Next steps
Summary
Questions
References
Chapter 5: Downstream NLP Tasks with Transformers
Transduction and the inductive inheritance of transformers
The human intelligence stack
The machine intelligence stack
Transformer performances versus Human Baselines
Evaluating models with metrics
Accuracy score
F1-score
Matthews Correlation Coefficient (MCC)
Benchmark tasks and datasets
From GLUE to SuperGLUE
Introducing higher Human Baselines standards
The SuperGLUE evaluation process
Defining the SuperGLUE benchmark tasks
BoolQ
Commitment Bank (CB)
Multi-Sentence Reading Comprehension (MultiRC)
Reading Comprehension with Commonsense Reasoning Dataset (ReCoRD)
Recognizing Textual Entailment (RTE)
Words in Context (WiC)
The Winograd schema challenge (WSC)
Running downstream tasks
The Corpus of Linguistic Acceptability (CoLA)
Stanford Sentiment TreeBank (SST-2)
Microsoft Research Paraphrase Corpus (MRPC)
Winograd schemas
Summary
Questions
References
Chapter 6: Machine Translation with the Transformer
Defining machine translation
Human transductions and translations
Machine transductions and translations
Preprocessing a WMT dataset
Preprocessing the raw data
Finalizing the preprocessing of the datasets
Evaluating machine translation with BLEU
Geometric evaluations
Applying a smoothing technique
Chencherry smoothing
Translation with Google Translate
Translations with Trax
Installing Trax
Creating the original Transformer model
Initializing the model using pretrained weights
Tokenizing a sentence
Decoding from the Transformer
De-tokenizing and displaying the translation
Summary
Questions
References
Chapter 7: The Rise of Suprahuman Transformers with GPT-3 Engines
Suprahuman NLP with GPT-3 transformer models
The architecture of OpenAI GPT transformer models
The rise of billion-parameter transformer models
The increasing size of transformer models
Context size and maximum path length
From fine-tuning to zero-shot models
Stacking decoder layers
GPT-3 engines
Generic text completion with GPT-2
Step 9: Interacting with GPT-2
Training a custom GPT-2 language model
Step 12: Interactive context and completion examples
Running OpenAI GPT-3 tasks
Running NLP tasks online
Getting started with GPT-3 engines
Running our first NLP task with GPT-3
NLP tasks and examples
Comparing the output of GPT-2 and GPT-3
Fine-tuning GPT-3
Preparing the data
Step 1: Installing OpenAI
Step 2: Entering the API key
Step 3: Activating OpenAI’s data preparation module
Fine-tuning GPT-3
Step 4: Creating an OS environment
Step 5: Fine-tuning OpenAI’s Ada engine
Step 6: Interacting with the fine-tuned model
The role of an Industry 4.0 AI specialist
Initial conclusions
Summary
Questions
References
Chapter 8: Applying Transformers to Legal and Financial Documents for AI Text Summarization
Designing a universal text-to-text model
The rise of text-to-text transformer models
A prefix instead of task-specific formats
The T5 model
Text summarization with T5
Hugging Face
Hugging Face transformer resources
Initializing the T5-large transformer model
Getting started with T5
Exploring the architecture of the T5 model
Summarizing documents with T5-large
Creating a summarization function
A general topic sample
The Bill of Rights sample
A corporate law sample
Summarization with GPT-3
Summary
Questions
References
Chapter 9: Matching Tokenizers and Datasets
Matching datasets and tokenizers
Best practices
Step 1: Preprocessing
Step 2: Quality control
Continuous human quality control
Word2Vec tokenization
Case 0: Words in the dataset and the dictionary
Case 1: Words not in the dataset or the dictionary
Case 2: Noisy relationships
Case 3: Words in the text but not in the dictionary
Case 4: Rare words
Case 5: Replacing rare words
Case 6: Entailment
Standard NLP tasks with specific vocabulary
Generating unconditional samples with GPT-2
Generating trained conditional samples
Controlling tokenized data
Exploring the scope of GPT-3
Summary
Questions
References
Chapter 10: Semantic Role Labeling with BERT-Based Transformers
Getting started with SRL
Defining semantic role labeling
Visualizing SRL
Running a pretrained BERT-based model
The architecture of the BERT-based model
Setting up the BERT SRL environment
SRL experiments with the BERT-based model
Basic samples
Sample 1
Sample 2
Sample 3
Difficult samples
Sample 4
Sample 5
Sample 6
Questioning the scope of SRL
The limit of predicate analysis
Redefining SRL
Summary
Questions
References
Chapter 11: Let Your Data Do the Talking: Story, Questions, and Answers
Methodology
Transformers and methods
Method 0: Trial and error
Method 1: NER first
Using NER to find questions
Location entity questions
Person entity questions
Method 2: SRL first
Question-answering with ELECTRA
Project management constraints
Using SRL to find questions
Next steps
Exploring Haystack with a RoBERTa model
Exploring Q&A with a GPT-3 engine
Summary
Questions
References
Chapter 12: Detecting Customer Emotions to Make Predictions
Getting started: Sentiment analysis transformers
The Stanford Sentiment Treebank (SST)
Sentiment analysis with RoBERTa-large
Predicting customer behavior with sentiment analysis
Sentiment analysis with DistilBERT
Sentiment analysis with Hugging Face’s models’ list
DistilBERT for SST
MiniLM-L12-H384-uncased
RoBERTa-large-mnli
BERT-base multilingual model
Sentiment analysis with GPT-3
Some Pragmatic I4.0 thinking before we leave
Investigating with SRL
Investigating with Hugging Face
Investigating with the GPT-3 playground
GPT-3 code
Summary
Questions
References
Chapter 13: Analyzing Fake News with Transformers
Emotional reactions to fake news
Cognitive dissonance triggers emotional reactions
Analyzing a conflictual Tweet
Behavioral representation of fake news
A rational approach to fake news
Defining a fake news resolution roadmap
The gun control debate
Sentiment analysis
Named entity recognition (NER)
Semantic Role Labeling (SRL)
Gun control SRL
Reference sites
COVID-19 and former President Trump’s Tweets
Semantic Role Labeling (SRL)
Before we go
Summary
Questions
References
Chapter 14: Interpreting Black Box Transformer Models
Transformer visualization with BertViz
Running BertViz
Step 1: Installing BertViz and importing the modules
Step 2: Load the models and retrieve attention
Step 3: Head view
Step 4: Processing and displaying attention heads
Step 5: Model view
LIT
PCA
Running LIT
Transformer visualization via dictionary learning
Transformer factors
Introducing LIME
The visualization interface
Exploring models we cannot access
Summary
Questions
References
Chapter 15: From NLP to Task-Agnostic Transformer Models
Choosing a model and an ecosystem
The Reformer
Running an example
DeBERTa
Running an example
From Task-Agnostic Models to Vision Transformers
ViT – Vision Transformers
The Basic Architecture of ViT
Vision transformers in code
CLIP
The Basic Architecture of CLIP
CLIP in code
DALL-E
The Basic Architecture of DALL-E
DALL-E in code
An expanding universe of models
Summary
Questions
References
Chapter 16: The Emergence of Transformer-Driven Copilots
Prompt engineering
Casual English with a meaningful context
Casual English with a metonymy
Casual English with an ellipsis
Casual English with vague context
Casual English with sensors
Casual English with sensors but no visible context
Formal English conversation with no context
Prompt engineering training
Copilots
GitHub Copilot
Codex
Domain-specific GPT-3 engines
Embedding2ML
Step 1: Installing and importing OpenAI
Step 2: Loading the dataset
Step 3: Combining the columns
Step 4: Running the GPT-3 embedding
Step 5: Clustering (k-means clustering) with the embeddings
Step 6: Visualizing the clusters (t-SNE)
Instruct series
Content filter
Transformer-based recommender systems
General-purpose sequences
Dataset pipeline simulation with RL using an MDP
Training customer behaviors with an MDP
Simulating consumer behavior with an MDP
Making recommendations
Computer vision
Humans and AI copilots in metaverses
From looking at to being in
Summary
Questions
References
Appendix I β€” Terminology of Transformer Models
Stack
Sublayer
Attention heads
Appendix II β€” Hardware Constraints for Transformer Models
The Architecture and Scale of Transformers
Why GPUs are so special
GPUs are designed for parallel computing
GPUs are also designed for matrix multiplication
Implementing GPUs in code
Testing GPUs with Google Colab
Google Colab Free with a CPU
Google Colab Free with a GPU
Google Colab Pro with a GPU
Appendix III β€” Generic Text Completion with GPT-2
Step 1: Activating the GPU
Step 2: Cloning the OpenAI GPT-2 repository
Step 3: Installing the requirements
Step 4: Checking the version of TensorFlow
Step 5: Downloading the 345M-parameter GPT-2 model
Steps 6-7: Intermediate instructions
Steps 7b-8: Importing and defining the model
Step 9: Interacting with GPT-2
References
Appendix IV β€” Custom Text Completion with GPT-2
Training a GPT-2 language model
Step 1: Prerequisites
Steps 2 to 6: Initial steps of the training process
Step 7: The N Shepperd training files
Step 8: Encoding the dataset
Step 9: Training a GPT-2 model
Step 10: Creating a training model directory
Step 11: Generating unconditional samples
Step 12: Interactive context and completion examples
References
Appendix V β€” Answers to the Questions
Chapter 1, What are Transformers?
Chapter 2, Getting Started with the Architecture of the Transformer Model
Chapter 3, Fine-Tuning BERT Models
Chapter 4, Pretraining a RoBERTa Model from Scratch
Chapter 5, Downstream NLP Tasks with Transformers
Chapter 6, Machine Translation with the Transformer
Chapter 7, The Rise of Suprahuman Transformers with GPT-3 Engines
Chapter 8, Applying Transformers to Legal and Financial Documents for AI Text Summarization
Chapter 9, Matching Tokenizers and Datasets
Chapter 10, Semantic Role Labeling with BERT-Based Transformers
Chapter 11, Let Your Data Do the Talking: Story, Questions, and Answers
Chapter 12, Detecting Customer Emotions to Make Predictions
Chapter 13, Analyzing Fake News with Transformers
Chapter 14, Interpreting Black Box Transformer Models
Chapter 15, From NLP to Task-Agnostic Transformer Models
Chapter 16, The Emergence of Transformer-Driven Copilots
Other Books You May Enjoy
Index


πŸ“œ SIMILAR VOLUMES


Transfer Learning for Natural Language Processing
✍ Paul Azunre πŸ“‚ Library πŸ“… 2021 πŸ› Manning Publications 🌐 English

Transfer Learning for Natural Language Processing gets you up to speed with the relevant ML concepts before diving into the cutting-edge advances that are defining the future of NLP. Building and training deep learning models from scratch is costly, time-consuming, and requires massive amounts of data.

Transfer Learning for Natural Language Processing
✍ Paul Azunre πŸ“‚ Library πŸ“… 2021 πŸ› Manning 🌐 English

Build custom NLP models in record time by adapting pre-trained machine learning models to solve specialized problems. In Transfer Learning for Natural Language Processing you will learn: fine-tuning pretrained models with new domain data; picking the right model…

Transformers for Natural Language Processing
✍ Denis Rothman πŸ“‚ Library πŸ“… 2024 πŸ› Packt 🌐 English

This book provides a comprehensive guide to leveraging the immense potential of transformers for NLP and vision tasks. It covers the architectural innovations that have led to unprecedented natural language capabilities, along with the associated risks and mitigation strategies.

Natural Language Processing with Transformers
✍ Lewis Tunstall, Leandro von Werra, Thomas Wolf πŸ“‚ Library πŸ“… 2021 πŸ› O'Reilly Media, Inc. 🌐 English

Since their introduction in 2017, Transformers have quickly become the dominant architecture for achieving state-of-the-art results on a variety of natural language processing tasks. If you're a data scientist or machine learning engineer, this practical book shows you how to train and scale these large models…
