Large Language Models: A Deep Dive
By Uday Kamath, Kevin Keenan, Garrett Somers, Sarah Sorenson
- Publisher
- Springer
- Year
- 2024
- Language
- English
- Pages
- 496
- Category
- Library
Free to read; no payment or registration required. For personal study only.
Synopsis
Large Language Models (LLMs) have emerged as a cornerstone technology, transforming how we interact with information and redefining the boundaries of artificial intelligence. LLMs offer an unprecedented ability to understand, generate, and interact with human language in an intuitive and insightful manner, leading to transformative applications across domains like content creation, chatbots, search engines, and research tools. While fascinating, the complex workings of LLMs, from their intricate architecture and underlying algorithms to their ethical considerations, require thorough exploration, creating a need for a comprehensive book on this subject.
This book provides an authoritative exploration of the design, training, evolution, and application of LLMs. It begins with an overview of pre-trained language models and Transformer architectures, laying the groundwork for understanding prompt-based learning techniques. Next, it dives into methods for fine-tuning LLMs, integrating reinforcement learning for value alignment, and the convergence of LLMs with computer vision, robotics, and speech processing. The book strongly emphasizes practical applications, detailing real-world use cases such as conversational chatbots, retrieval-augmented generation (RAG), and code generation. These examples are carefully chosen to illustrate the diverse and impactful ways LLMs are being applied in various industries and scenarios.
Table of Contents
Foreword
Reviews
Preface
Why This Book
Who This Book Is For
What This Book Covers
How to Navigate This Book
Acknowledgments
Declarations
Notation
Contents
Selected Acronyms
Chapter 1 Large Language Models: An Introduction
1.1 Introduction
1.2 Natural Language
1.3 NLP and Language Models Evolution
1.3.1 Syntactic and Grammar-Based Methods: 1960s-1980s
1.3.2 Expert Systems and Statistical Models: 1980s-2000s
1.3.3 Neural Models and Dense Representations: 2000s-2010s
1.3.4 The Deep Learning Revolution: 2010s-2020s
1.4 The Era of Large Language Models
1.4.1 A Brief History of LLM Evolution
1.4.2 LLM Scale
1.4.3 Emergent Abilities in LLMs
1.5 Large Language Models in Practice
1.5.1 Large Language Model Development
1.5.2 Large Language Model Adaptation
1.5.3 Large Language Model Utilization
References
Chapter 2 Language Models Pre-training
2.1 Encoder-Decoder Architecture
2.1.1 Encoder
2.1.2 Decoder
2.1.3 Training and Optimization
2.1.4 Issues with Encoder-Decoder Architectures
2.2 Attention Mechanism
2.2.1 Self-Attention
2.3 Transformers
2.3.1 Encoder
2.3.2 Decoder
2.3.3 Tokenization and Representation
2.3.4 Positional Encodings
2.3.5 Multi-Head Attention
2.3.6 Position-Wise Feed-Forward Neural Networks
2.3.7 Layer Normalization
2.3.8 Masked Multi-Head Attention
2.3.9 Encoder-Decoder Attention
2.3.10 Transformer Variants
2.4 Data
2.4.1 Language Model Pre-Training Datasets
2.4.2 Data Pre-Processing
2.4.3 Effects of Data on LLMs
2.4.4 Task-Specific Datasets
2.5 Pre-trained LLM Design Choices
2.5.1 Pre-Training Methods
2.5.2 Pre-training Tasks
2.5.3 Architectures
2.5.4 LLM Pre-training Tips and Strategies
2.6 Commonly Used Pre-trained LLMs
2.6.1 BERT (Encoder)
2.6.2 T5 (Encoder-Decoder)
2.6.3 GPT (Decoder)
2.6.4 Mixtral 8x7B (Mixture of Experts)
2.7 Tutorial: Understanding LLMs and Pre-training
2.7.1 Overview
2.7.2 Experimental Design
2.7.3 Results and Analysis
2.7.4 Conclusion
References
Chapter 3 Prompt-based Learning
3.1 Introduction
3.1.1 Fully Supervised Learning
3.1.2 Pre-train and Fine-tune Learning
3.1.3 Prompt-based Learning
3.2 Basics of Prompt-based Learning
3.2.1 Prompt-based Learning: Formal Description
3.2.2 Prompt-based Learning Process
3.2.3 Prompt-based Knowledge Extraction
3.2.4 Prompt-based Learning Across NLP Tasks
3.3 Prompt Engineering
3.3.1 Prompt Shape
3.3.2 Manual Template Design
3.3.3 Automated Template Design: Discrete Search
3.3.4 Automated Template Design: Continuous Search
3.3.5 Prompt-based Fine-tuning
3.4 Answer Engineering
3.4.1 Answer Shape
3.4.2 Defining the Answer Space
3.4.3 Manual Answer Mapping
3.4.4 Automated Answer Mapping: Discrete Search
3.4.5 Automated Answer Mapping: Continuous Search
3.5 Multi-Prompt Inference
3.5.1 Ensembling
3.5.2 In-context Learning
3.5.3 Prompt Decomposition
3.6 First Tutorial: Prompt vs. Pre-train and Fine-tune Methods in Text Classification and NER
3.6.1 Overview
3.6.2 Experimental Design
3.6.3 Results and Analysis
3.6.4 Conclusion
3.7 Second Tutorial: Approaches to Prompt Engineering
3.7.1 Overview
3.7.2 Experimental Design
3.7.3 Results and Analysis
3.7.4 Conclusion
References
Chapter 4 LLM Adaptation and Utilization
4.1 Introduction
4.2 Instruction Tuning
4.2.1 Instruction Tuning Procedure
4.2.2 Instruction Tuning Data
4.2.3 Instruction Tuning for Domain Adaptation
4.3 Parameter-Efficient Fine-Tuning
4.3.1 Adapters
4.3.2 Reparameterization
4.4 Compute-Efficient Fine-Tuning
4.4.1 LLM Quantization
4.5 End-User Prompting
4.5.1 Zero-Shot Prompting
4.5.2 Few-Shot Prompting
4.5.3 Prompt Chaining
4.5.4 Chain-of-Thought
4.5.5 Self-Consistency
4.5.6 Tree-of-Thoughts
4.6 Tutorial: Fine-Tuning LLMs in a Resource-Constrained Setting
4.6.1 Overview
4.6.2 Experimental Design
4.6.3 Results and Analysis
4.6.4 Conclusion
References
Chapter 5 Tuning for LLM Alignment
5.1 Alignment Tuning
5.1.1 Helpfulness
5.1.2 Honesty
5.1.3 Harmlessness
5.2 Foundation: The Reinforcement Learning Framework
5.3 Mapping the RL Framework to LLMs with Human Feedback
5.4 Evolution of RLHF
5.4.1 Safety, Quality, and Groundedness in LLMs
5.4.2 Deep Reinforcement Learning from Human Preferences
5.4.3 Learning Summarization from Human Feedback
5.4.4 Aligning LLMs to be Helpful, Honest, and Harmless with Human Feedback
5.5 Overcoming RLHF Challenges
5.5.1 Instilling Harmlessness with AI Feedback
5.5.2 Direct Preference Optimization
5.6 Tutorial: Making a Language Model More Helpful with RLHF
5.6.1 Overview
5.6.2 Experimental Design
5.6.3 Results and Analysis
5.6.4 Conclusion
References
Chapter 6 LLM Challenges and Solutions
6.1 Hallucination
6.1.1 Causes
6.1.2 Evaluation Metrics
6.1.3 Benchmarks
6.1.4 Mitigation Strategies
6.2 Bias and Fairness
6.2.1 Representational Harms
6.2.2 Allocational Harms
6.2.3 Causes
6.2.4 Evaluation Metrics
6.2.5 Benchmarks
6.2.6 Mitigation Strategies
6.3 Toxicity
6.3.1 Causes
6.3.2 Evaluation Metrics
6.3.3 Benchmarks
6.3.4 Mitigation Strategies
6.4 Privacy
6.4.1 Causes
6.4.2 Evaluation Metrics
6.4.3 Benchmarks
6.4.4 Mitigation Strategies
6.5 Tutorial: Measuring and Mitigating Bias in LLMs
6.5.1 Overview
6.5.2 Experimental Design
6.5.3 Results and Analysis
6.5.4 Conclusion
References
Chapter 7 Retrieval-Augmented Generation
7.1 Introduction
7.2 Basics of RAG
7.3 Optimizing RAG
7.4 Enhancing RAG
7.4.1 Data Sources and Embeddings
7.4.2 Querying
7.4.3 Retrieval and Generation
7.4.4 Summary
7.5 Evaluating RAG Applications
7.5.1 RAG Quality Metrics
7.5.2 Evaluation of RAG System Capabilities
7.5.3 Summarizing RAG Evaluation
7.6 Tutorial: Building Your Own Retrieval-Augmented Generation System
7.6.1 Overview
7.6.2 Experimental Design
7.6.3 Results and Analysis
7.6.4 Conclusion
References
Chapter 8 LLMs in Production
8.1 Introduction
8.2 LLM Applications
8.2.1 Conversational AI, Chatbots, and AI Assistants
8.2.2 Content Creation
8.2.3 Search, Information Retrieval, and Recommendation Systems
8.2.4 Coding
8.2.5 Categories of LLMs
8.3 LLM Evaluation Metrics
8.3.1 Perplexity
8.3.2 BLEU
8.3.3 ROUGE
8.3.4 BERTScore
8.3.5 MoverScore
8.3.6 G-Eval
8.3.7 Pass@k
8.4 LLM Benchmark Datasets
8.5 LLM Selection
8.5.1 Open Source vs. Closed Source
8.5.2 Analytic Quality
8.5.3 Inference Latency
8.5.4 Costs
8.5.5 Adaptability and Maintenance
8.5.6 Data Security and Licensing
8.6 Tooling for Application Development
8.6.1 LLM Application Frameworks
8.6.2 LLM Customization
8.6.3 Vector Databases
8.6.4 Prompt Engineering
8.6.5 Evaluation and Testing
8.7 Inference
8.7.1 Model Hosting
8.7.2 Optimizing Performance
8.7.3 Optimizing Cost
8.8 LLMOps
8.8.1 LLMOps Tools and Methods
8.8.2 Accelerating the Iteration Cycle
8.8.3 Risk Management
8.9 Tutorial: Preparing Experimental Models for Production Deployment
8.9.1 Overview
8.9.2 Experimental Design
8.9.3 Results and Analysis
8.9.4 Conclusion
References
Chapter 9 Multimodal LLMs
9.1 Introduction
9.2 Brief History
9.3 Multimodal LLM Framework
Modality Encoder
Input Projector
Pre-training: Core LLMs, Datasets and Task-Specific Objectives
MMLLM Tuning and Enhancements
Multimodal RLHF
Output Projector
Modality Generator
9.4 Benchmarks
9.5 State-of-the-Art MMLLMs
Flamingo (Image-Video-Text)
Video-LLaMA (Image-Video-Audio-Text)
NExT-GPT (Any-to-Any)
9.6 Tutorial: Fine-Tuning Multimodal Image-to-Text LLMs
Overview
Experimental Design
Results and Analysis
Conclusion
References
Chapter 10 LLMs: Evolution and New Frontiers
10.1 Introduction
10.2 LLM Evolution
10.2.1 Synthetic Data
10.2.2 Larger Context Windows
10.2.3 Training Speedups
10.2.4 Multi-Token Generation
10.2.5 Knowledge Distillation
10.2.6 Post-Attention Architectures
10.3 LLM Trends
10.3.1 Small Language Models
10.3.2 Democratization
10.3.3 Domain-Specific Language Models
10.4 New Frontiers
10.4.1 LLM Agents
10.4.2 LLM-Enhanced Search
10.5 Closing Remarks
References
Appendix A Deep Learning Basics
A.1 Basic Structure of Neural Networks
A.2 Perceptron
A.3 Multilayer Perceptron
A.3.1 Structure and Function of MLPs
A.3.2 Training MLPs
A.4 Deep Learning
A.4.1 Key Components of Deep Neural Networks
A.4.2 Activation Functions
A.4.3 Loss Functions
A.4.4 Optimization Techniques
A.4.5 Model Training
A.4.6 Regularization Techniques
Appendix B Reinforcement Learning Basics
B.1 Markov Decision Process
B.1.1 Tasks
B.1.2 Rewards and Return
B.1.3 Policies and Value Functions
B.1.4 Optimality
B.2 Exploration/Exploitation Trade-off
B.3 Reinforcement Learning Algorithms
B.3.1 Q-Learning
B.3.2 Deep Q-Network (DQN)
B.3.3 Policy Gradient-based Methods
Index
SIMILAR VOLUMES
Learn how to create, train, and tweak large language models (LLMs) by building one from the ground up! In Build a Large Language Model (from Scratch), you'll discover how LLMs work from the inside out. In this insightful book, bestselling author Sebastian Raschka guides you step by step.
In this book, I invite you to embark on an educational journey with me to learn how to build Large Language Models (LLMs) from the ground up. Together, we'll delve deep into the LLM training pipeline, starting from data loading and culminating in fine-tuning LLMs on custom datasets.
Deep Reinforcement Learning with Python, Second Edition. Gain a theoretical understanding of the most popular libraries in deep reinforcement learning (deep RL). This new edition focuses on the latest advances in deep RL using a learn-by-coding approach, allowing readers to assimilate and replicate the latest research in this field.