Transformers in Action (MEAP v7) 2024

✍ Scribed by Nicole Koenigstein


Publisher
Manning Publications Co.
Year
2024
Language
English
Pages
272
Category
Library


✦ Synopsis


Transformers are the superpower behind large language models (LLMs) like ChatGPT, Bard, and LLaMA. Transformers in Action gives you the insights, practical techniques, and extensive code samples you need to adapt pretrained transformer models to new and exciting tasks.

Inside Transformers in Action you’ll learn:
How transformers and LLMs work
How to adapt Hugging Face models to new tasks
How to automate hyperparameter search with Ray Tune and Optuna
How to optimize LLM performance
Advanced prompting and zero/few-shot learning
Text generation with reinforcement learning
Responsible use of LLMs

Technically speaking, a β€œTransformer” is a neural network model that finds relationships in sequences of words or other data by using a mathematical technique called attention in its encoder/decoder components. This setup allows a transformer model to learn context and meaning from even long sequences of text, thus creating much more natural responses and predictions. Understanding the transformers architecture is the key to unlocking the power of LLMs for your own AI applications.
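The scaled dot-product attention mechanism described above can be sketched in a few lines of plain Python. This is a simplified illustration, not code from the book: a single attention head over one sequence, with no batching and no learned query/key/value projection matrices.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating, for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: Q, K, V are lists of equal-length vectors."""
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, V))
                        for j in range(len(V[0]))])
    return outputs
```

Because the weights are a softmax, each output row is a convex combination of the value vectors: positions whose keys align with the query contribute more, which is how the model picks up context from anywhere in the sequence.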

This comprehensive guide takes you from the origins of transformers all the way to fine-tuning an LLM for your own projects. Author Nicole Königstein demonstrates the vital mathematical and theoretical background of the transformer architecture through executable Jupyter notebooks, illuminating how this remarkable technology works in action.

✦ Table of Contents


Transformers in Action MEAP V07
Copyright
Welcome
Brief contents
Part 1: Introduction to transformers
Chapter 1: The need for transformers
1.1 The transformers breakthrough
1.1.1 Unveiling the attention mechanism
1.1.2 The power of multi-head attention
1.2 How to use transformers
1.3 When and why you'd want to use transformers
1.4 Summary
Bibliography
Chapter 2: A deeper look into transformers
2.1 From seq2seq models to transformers
2.1.1 The difficulty of training RNNs
2.1.2 Vanishing gradients: transformer to the rescue
2.2 Model architecture
2.2.1 Encoder and decoder stacks
2.2.2 Attention
2.2.3 Position-wise feed-forward networks
2.2.4 Positional encoding
2.3 Building on the basics: a world of possibilities awaits!
2.3.1 Methods to stabilize the training of RNNs
2.3.2 The transformer architecture: a paradigm shift in neural network stability
2.4 Summary
Bibliography
Part 2: Transformers for Fundamental NLP Tasks
Chapter 3: Text summarization
3.1 Getting started with text summarization
3.1.1 Extractive text summarization
3.1.2 Text summarization techniques
3.1.3 Establishing a baseline: TextRank
3.1.4 Abstractive text summarization
3.1.5 Pointer-generator networks
3.2 Text-to-text transformer models
3.3 Model overview
3.3.1 BART
3.3.2 T5
3.3.3 ProphetNet
3.3.4 Pegasus
3.3.5 Longformer
3.3.6 BigBird
3.4 Metrics to evaluate generated text
3.4.1 ROUGE
3.4.2 BLEU
3.5 Applications and worked examples
3.5.1 Evaluating different summarization models
3.6 Fine-tuning a summarization model
3.6.1 Utilizing the model.config function
3.6.2 Data pre-processing and subset selection
3.6.3 Using the Hugging Face Trainer class
3.7 Summary
Bibliography
Chapter 4: Machine translation
4.1 Introduction to machine translation
4.1.1 The Vauquois triangle
4.2 Machine translation approaches
4.2.1 Rule-based machine translation
4.2.2 Example-based machine translation
4.2.3 Statistical machine translation
4.2.4 Neural Machine Translation
4.3 State-of-the-art machine translation models
4.3.1 mBART
4.3.2 mBART-50
4.3.3 XLM
4.3.4 XLM-RoBERTa
4.3.5 M-BERT
4.3.6 mT5
4.4 Common techniques and challenges in machine translation
4.4.1 Benefits of pretraining in NMT and common pretraining techniques
4.4.2 Dealing with language-related challenges
4.5 Applications and worked examples
4.5.1 METEOR as evaluation metric
4.5.2 Generating translations
4.5.3 Generating German summaries with mBART
4.6 Summary
Bibliography
Chapter 5: Text classification
5.1 Introduction to text classification
5.1.1 Establishing a baseline for text classification: NaΓ―ve Bayes classifier
5.2 Transformers in text classification: an overview
5.2.1 BERT
5.2.2 RoBERTa
5.2.3 ALBERT
5.2.4 DistilBERT
5.2.5 DeBERTa
5.2.6 ELECTRA
5.3 Evaluating classification performance
5.3.1 Confusion matrix
5.3.2 Accuracy
5.3.3 F1-score
5.4 Applications and worked examples
5.4.1 Fine-tuning different classification models on the Financial Phrasebank dataset
5.4.2 Fine-tuning a classification model on the AG_News Dataset
5.4.3 Fine-tuning a classification model on the Yelp Dataset
5.5 Summary
Bibliography
Part 3: Advanced models and methods
Chapter 6: Text generation
6.1 Introduction to text generation
6.1.1 From rule-based chatbots to Turing Test passing bots
6.2 Transformers in text generation: An overview
6.2.1 GPT-1 to GPT-3
6.2.2 InstructGPT
6.2.3 GPT-NeoX-20B
6.2.4 Llama
6.2.5 RedPajama
6.2.6 Alpaca
6.2.7 Dolly
6.2.8 Falcon
6.3 Common techniques in text generation
6.3.1 Contextual word embeddings
6.3.2 Greedy search decoding for text generation
6.3.3 Beam search decoding for text generation
6.3.4 Top-k sampling for text generation
6.3.5 Nucleus sampling for text generation
6.3.6 Temperature sampling for text generation
6.4 Challenges in transformer-based text generation
6.4.1 High quality training data
6.4.2 Hallucination
6.5 Summary
Bibliography
Chapter 7: Controlling generated text
7.1 Improving LLMs with reinforcement learning from human feedback
7.1.1 From Markov decision processes to reinforcement learning
7.1.2 Improving models with human feedback and reinforcement learning
7.2 Aligning LLMs with Direct Preference Optimization
7.3 Prompt engineering: The art of prompting
7.3.1 Zero-shot prompting
7.3.2 One- and few-shot prompting
7.3.3 Chain-of-Thought prompting
7.3.4 Contrastive Chain-of-Thought Prompting
7.3.5 Tree of Thought prompting
7.3.6 Thread of Thought prompting
7.4 Summary
Chapter 8: Multimodal models
8.1 Getting started with multimodal models
8.2 Challenges and considerations for multimodal models
8.2.1 Perceiver-based multimodal methods
8.2.2 Converter-based multimodal methods
8.3 Model overview
8.3.1 BLIP
8.3.2 BLIP-2
8.3.3 CLIP
8.3.4 X-CLIP
8.3.5 Flamingo
8.3.6 OpenFlamingo
8.3.7 GPT-4 with vision
8.3.8 LLaVA
8.4 Applications and worked examples
8.4.1 Comparison of different MLLMs for visual reasoning and chat capabilities
8.5 Summary
Bibliography
Chapter 9: Optimize and evaluate large language models
9.1 Deep dive into hyperparameters
9.1.1 How parameters and hyperparameters factor into gradient descent
9.2 Model tuning and hyperparameter optimization
9.2.1 Track experiments
9.3 Techniques for model optimization
9.3.1 Model pruning
9.3.2 Model distillation
9.4 Parameter efficient fine-tuning LLMs
9.4.1 Low-rank adaptation
9.4.2 Weight-decomposed low-rank adaptation
9.4.3 Quantization
9.4.4 Efficient fine-tuning of quantized LLMs with QLoRA
9.4.5 Quantization-aware low-rank adaptation
9.4.6 Low-rank plus quantized matrix decomposition
9.5 Sharding LLMs for memory optimization
9.6 Summary
Bibliography
A Get the most out of this book - how to run the code


πŸ“œ SIMILAR VOLUMES


F# in Action (MEAP v7)
✍ Isaac Abraham πŸ“‚ Library πŸ“… 2023 πŸ› Manning Publications 🌐 English

F# is designed to make functional programming practical and accessible, especially for developers working on the .NET platform. This book will get you started. In F# in Action you will learn how to Write performant and robust systems with succinct F# code Model domains quickly, easily and accur…

Bayesian Optimization in Action (MEAP V7)
✍ Quan Nguyen πŸ“‚ Library πŸ“… 2022 πŸ› Manning 🌐 English

Apply advanced techniques for optimizing machine learning processes. Bayesian optimization helps pinpoint the best configuration for your machine learning models with speed and accuracy. In Bayesian Optimization in Action you will learn how to Train Gaussian processes on both sparse and large…

Elixir in Action, Third Edition (MEAP v7)
✍ Sasa Juric πŸ“‚ Library πŸ“… 2023 πŸ› Manning Publications 🌐 English

Fully updated to Elixir 1.14, this authoritative bestseller reveals how Elixir tackles problems of scalability, fault tolerance, and high availability. Inside Elixir in Action, Third Edition you’ll find Updates for Elixir 1.14 Elixir modules, functions, and type system Functional and concurren…

GitHub Actions in Action (MEAP V03)
✍ Michael Kaufmann, Rob Bos, Marcel de Vries πŸ“‚ Library πŸ“… 2024 πŸ› Manning Publications Co. 🌐 English

GitHub Actions in Action shows you exactly how to implement a secure and reliable continuous delivery process with just the tools available in GitHub—no complex CI/CD frameworks required! You’ll follow an extended example application for selling tickets, taking it all the way from initial build to c…

Quarkus in Action (MEAP V03)
✍ Martin Ε tefanko, Jan MartiΕ‘ka πŸ“‚ Library πŸ“… 2023 πŸ› Manning Publications 🌐 English

In Quarkus in Action, you will • Use Quarkus Dev mode to speed up and enhance Java development • Understand how to use the Dev UI to observe and troubleshoot running applications • Automate background testing using the Continuous Testing feature • New frameworks and libraries such as Reactive M…