Large Language Model-Based Solutions: How to Deliver Value with Cost-Effective Generative AI Applications (Tech Today)

✍ Scribed by Shreyas Subramanian


Publisher: Wiley
Year: 2024
Language: English
Pages: 221
Edition: 1
Category: Library


✦ Synopsis


Learn to build cost-effective apps using Large Language Models

In Large Language Model-Based Solutions: How to Deliver Value with Cost-Effective Generative AI Applications, Principal Data Scientist at Amazon Web Services, Shreyas Subramanian, delivers a practical guide for developers and data scientists who wish to build and deploy cost-effective large language model (LLM)-based solutions. In the book, you'll find coverage of a wide range of key topics, including how to select a model, pre- and post-processing of data, prompt engineering, and instruction fine-tuning.

The author sheds light on techniques for optimizing inference, like model quantization and pruning, as well as different and affordable architectures for typical generative AI (GenAI) applications, including search systems, agent assists, and autonomous agents. You'll also find:

  • Effective strategies to address the challenge of the high computational cost associated with LLMs
  • Assistance with the complexities of building and deploying affordable generative AI apps, including tuning and inference techniques
  • Selection criteria for choosing a model, with particular consideration given to compact, nimble, and domain-specific models

Perfect for developers and data scientists interested in deploying foundational models, or business leaders planning to scale out their use of GenAI, Large Language Model-Based Solutions will also benefit project leaders and managers, technical support staff, and administrators with an interest or stake in the subject.

✦ Table of Contents


Cover
Contents At A Glance
Title Page
Copyright Page
Dedication Page
About the Author
About the Technical Editor
Contents
Introduction
GenAI Applications and Large Language Models
Importance of Cost Optimization
Challenges and Opportunities
Micro Case Studies
OpenAI: Leading the Way
Hugging Face: Open-Source Community Building
Bloomberg GPT: LLMs in Large Commercial Institutions
Who Is This Book For?
Summary
Chapter 1 Introduction
Overview of GenAI Applications and Large Language Models
The Rise of Large Language Models
Neural Networks, Transformers, and Beyond
GenAI vs. LLMs: What’s the Difference?
The Three-Layer GenAI Application Stack
The Infrastructure Layer
The Model Layer
The Application Layer
Paths to Productionizing GenAI Applications
Sample LLM-Powered Chat Application
The Importance of Cost Optimization
Cost Assessment of the Model Inference Component
Cost Assessment of the Vector Database Component
Benchmarking Setup and Results
Other Factors to Consider
Cost Assessment of the Large Language Model Component
Summary
Chapter 2 Tuning Techniques for Cost Optimization
Fine-Tuning and Customizability
Basic Scaling Laws You Should Know
Parameter-Efficient Fine-Tuning Methods
Adapters Under the Hood
Prompt Tuning
Prefix Tuning
P-tuning
IA3
Low-Rank Adaptation
Cost and Performance Implications of PEFT Methods
Summary
Chapter 3 Inference Techniques for Cost Optimization
Introduction to Inference Techniques
Prompt Engineering
Impact of Prompt Engineering on Cost
Estimating Costs for Other Models
Clear and Direct Prompts
Adding Qualifying Words for Brief Responses
Breaking Down the Request
Example of Using Claude for PII Removal
Conclusion
Providing Context
Examples of Providing Context
RAG and Long Context Models
Recent Work Comparing RAG with Long Context Models
Conclusion
Context and Model Limitations
Indicating a Desired Format
Example of Formatted Extraction with Claude
Trade-Off Between Verbosity and Clarity
Caching with Vector Stores
What Is a Vector Store?
How to Implement Caching Using Vector Stores
Conclusion
Chains for Long Documents
What Is Chaining?
Implementing Chains
Example Use Case
Common Components
Tools That Implement Chains
Comparing Results
Conclusion
Summarization
Summarization in the Context of Cost and Performance
Efficiency in Data Processing
Cost-Effective Storage
Enhanced Downstream Applications
Improved Cache Utilization
Summarization as a Preprocessing Step
Enhanced User Experience
Conclusion
Batch Prompting for Efficient Inference
Batch Inference
Experimental Results
Using the accelerate Library
Using the DeepSpeed Library
Batch Prompting
Example of Using Batch Prompting
Model Optimization Methods
Quantization
Code Example
Recent Advancements: GPTQ
Parameter-Efficient Fine-Tuning Methods
Recap of PEFT Methods
Code Example
Cost and Performance Implications
Summary
References
Chapter 4 Model Selection and Alternatives
Introduction to Model Selection
Motivating Example: The Tale of Two Models
The Role of Compact and Nimble Models
Examples of Successful Smaller Models
Quantization for Powerful but Smaller Models
Text Generation with Mistral 7B
Zephyr 7B and Aligned Smaller Models
CogVLM for Language-Vision Multimodality
Prometheus for Fine-Grained Text Evaluation
Orca 2 and Teaching Smaller Models to Reason
Breaking Traditional Scaling Laws with Gemini and Phi
Phi 1, 1.5, and 2 B Models
Gemini Models
Domain-Specific Models
Step 1 - Training Your Own Tokenizer
Step 2 - Training Your Own Domain-Specific Model
More References for Fine-Tuning
Evaluating Domain-Specific Models vs. Generic Models
The Power of Prompting with General-Purpose Models
Summary
Chapter 5 Infrastructure and Deployment Tuning Strategies
Introduction to Tuning Strategies
Hardware Utilization and Batch Tuning
Memory Occupancy
Strategies to Fit Larger Models in Memory
KV Caching
PagedAttention
How Does PagedAttention Work?
Comparisons, Limitations, and Cost Considerations
AlphaServe
How Does AlphaServe Work?
Impact of Batching
Cost and Performance Considerations
S3: Scheduling Sequences with Speculation
How Does S3 Work?
Performance and Cost
Streaming LLMs with Attention Sinks
Fixed to Sliding Window Attention
Extending the Context Length
Working with Infinite Length Context
How Does StreamingLLM Work?
Performance and Results
Cost Considerations
Batch Size Tuning
Frameworks for Deployment Configuration Testing
Cloud-Native Inference Frameworks
Deep Dive into Serving Stack Choices
Batching Options
Options in DJL Serving
High-Level Guidance for Selecting Serving Parameters
Automatically Finding Good Inference Configurations
Creating a Generic Template
Defining an HPO Space
Searching the Space for Optimal Configurations
Results of Inference HPO
Inference Acceleration Tools
TensorRT and GPU Acceleration Tools
CPU Acceleration Tools
Monitoring and Observability
LLMOps and Monitoring
Why Is Monitoring Important for LLMs?
Monitoring and Updating Guardrails
Summary
Conclusion
Index
EULA


📜 SIMILAR VOLUMES


Large Language Model-Based Solutions: H
✍ Shreyas Subramanian 📂 Library 📅 2024 🏛 WILEY 🌐 English

Learn to build cost-effective apps using Large Language Models. In Large Language Model-Based Solutions: How to Deliver Value with Cost-Effective Generative AI Applications, Principal Data Scientist at Amazon Web Services, Shreyas Subramanian, delivers a practical guide for develo

Learn Python Generative AI: Journey from
✍ Zonunfeli Ralte, Indrajit Kar 📂 Library 📅 2024 🏛 BPB Publications 🌐 English

Learn to unleash the power of AI creativity KEY FEATURES ● Understand the core concepts related to generative AI. ● Different types of generative models and their applications. ● Learn how to design generative AI neural networks using Python and TensorFlow. DESCRIPTION This book researches the intri

Productionizing AI: How to Deliver AI B2
✍ Barry Walsh 📂 Library 📅 2022 🏛 Apress 🌐 English

This book is a guide to productionizing AI solutions using best-of-breed cloud services with workarounds to lower costs. Supplemented with step-by-step instructions covering data import through wrangling to partitioning and modeling through to inference and deployment, and augmented with pl
