Distributed Machine Learning Patterns
- Author: Yuan Tang
- Publisher: Manning Publications Co.
- Year: 2023
- Language: English
- Pages: 167
- Category: Library
Synopsis
Practical patterns for scaling machine learning from your laptop to a distributed cluster.
Distributed machine learning systems allow developers to handle extremely large datasets across multiple clusters, take advantage of automation tools, and benefit from hardware acceleration. This book reveals best-practice techniques and insider tips for tackling the challenges of scaling machine learning systems.
In Distributed Machine Learning Patterns you will learn how to:
Apply distributed systems patterns to build scalable and reliable machine learning projects
Build ML pipelines with data ingestion, distributed training, model serving, and more
Automate ML tasks with Kubernetes, TensorFlow, Kubeflow, and Argo Workflows
Make trade-offs between different patterns and approaches
Manage and monitor machine learning workloads at scale
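As a taste of the kind of pattern the book covers, the sharding pattern from the data ingestion chapter can be sketched in a few lines of plain Python. The dataset and worker count below are hypothetical stand-ins for illustration, not code from the book:

```python
# Minimal sketch of the sharding pattern: each worker keeps only every
# num_workers-th record, so a large dataset is split across machines
# without any single worker having to load all of it.

def shard(dataset, worker_index, num_workers):
    """Return the subset of records assigned to one worker."""
    return [record for i, record in enumerate(dataset)
            if i % num_workers == worker_index]

dataset = list(range(10))  # pretend this is a huge dataset
shards = [shard(dataset, w, 3) for w in range(3)]
print(shards)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Every record lands in exactly one shard, which is the property that lets each machine ingest its slice independently.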
Inside Distributed Machine Learning Patterns you'll learn to apply established distributed systems patterns to machine learning projects, and explore cutting-edge new patterns created specifically for machine learning. Firmly rooted in the real world, this book demonstrates how to apply patterns using examples based on TensorFlow, Kubernetes, Kubeflow, and Argo Workflows. Hands-on projects and clear, practical DevOps techniques let you easily launch, manage, and monitor cloud-native distributed machine learning pipelines.
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the technology
Deploying a machine learning application on a modern distributed system puts the spotlight on reliability, performance, security, and other operational concerns. In this in-depth guide, Yuan Tang, project lead of Argo and Kubeflow, shares patterns, examples, and hard-won insights on taking an ML model from a single device to a distributed cluster.
About the book
Distributed Machine Learning Patterns provides dozens of techniques for designing and deploying distributed machine learning systems. In it, you'll learn patterns for distributed model training, managing unexpected failures, and dynamic model serving. You'll appreciate the practical examples that accompany each pattern along with a full-scale project that implements distributed model training and inference with autoscaling on Kubernetes.
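The replicated services pattern from the model serving chapter, for example, boils down to spreading requests round-robin across identical model replicas. The sketch below is illustrative only; the replica names and request format are invented, not taken from the book:

```python
import itertools

# Sketch of the replicated services pattern: identical model-server
# replicas sit behind a balancer that hands out requests round-robin,
# so serving throughput scales with the number of replicas.

class RoundRobinBalancer:
    def __init__(self, replicas):
        self._replicas = itertools.cycle(replicas)

    def route(self, request):
        """Pick the next replica in the cycle and hand it the request."""
        replica = next(self._replicas)
        return f"{replica} served {request}"

balancer = RoundRobinBalancer(["replica-0", "replica-1", "replica-2"])
for i in range(4):
    print(balancer.route(f"request-{i}"))
# replica-0 served request-0
# replica-1 served request-1
# replica-2 served request-2
# replica-0 served request-3
```

In the book's full project this role is played by a Kubernetes Service in front of autoscaled model-server pods rather than by hand-rolled Python.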
What's inside
Data ingestion, distributed training, model serving, and more
Automating Kubernetes and TensorFlow with Kubeflow and Argo Workflows
Managing and monitoring workloads at scale
About the reader
For data analysts and engineers familiar with the basics of machine learning, Bash, Python, and Docker.
About the author
Yuan Tang is a project lead of Argo and Kubeflow, maintainer of TensorFlow and XGBoost, and author of numerous open source projects.
Table of Contents
Distributed Machine Learning Patterns
Copyright
contents
front matter
preface
acknowledgments
about this book
Who should read this book?
How this book is organized: A roadmap
About the code
liveBook discussion forum
about the author
about the cover illustration
Part 1 Basic concepts and background
1 Introduction to distributed machine learning systems
1.1 Large-scale machine learning
1.1.1 The growing scale
1.1.2 What can we do?
1.2 Distributed systems
1.2.1 What is a distributed system?
1.2.2 The complexity and patterns
1.3 Distributed machine learning systems
1.3.1 What is a distributed machine learning system?
1.3.2 Are there similar patterns?
1.3.3 When should we use a distributed machine learning system?
1.3.4 When should we not use a distributed machine learning system?
1.4 What we will learn in this book
Summary
Part 2 Patterns of distributed machine learning systems
2 Data ingestion patterns
2.1 What is data ingestion?
2.2 The Fashion-MNIST dataset
2.3 Batching pattern
2.3.1 The problem: Performing expensive operations for the Fashion-MNIST dataset with limited memory
2.3.2 The solution
2.3.3 Discussion
2.3.4 Exercises
2.4 Sharding pattern: Splitting extremely large datasets among multiple machines
2.4.1 The problem
2.4.2 The solution
2.4.3 Discussion
2.4.4 Exercises
2.5 Caching pattern
2.5.1 The problem: Re-accessing previously used data for efficient multi-epoch model training
2.5.2 The solution
2.5.3 Discussion
2.5.4 Exercises
2.6 Answers to exercises
Section 2.3.4
Section 2.4.4
Section 2.5.4
Summary
3 Distributed training patterns
3.1 What is distributed training?
3.2 Parameter server pattern: Tagging entities in 8 million YouTube videos
3.2.1 The problem
3.2.2 The solution
3.2.3 Discussion
3.2.4 Exercises
3.3 Collective communication pattern
3.3.1 The problem: Improving performance when parameter servers become a bottleneck
3.3.2 The solution
3.3.3 Discussion
3.3.4 Exercises
3.4 Elasticity and fault-tolerance pattern
3.4.1 The problem: Handling unexpected failures when training with limited computational resources
3.4.2 The solution
3.4.3 Discussion
3.4.4 Exercises
3.5 Answers to exercises
Section 3.2.4
Section 3.3.4
Section 3.4.4
Summary
4 Model serving patterns
4.1 What is model serving?
4.2 Replicated services pattern: Handling the growing number of serving requests
4.2.1 The problem
4.2.2 The solution
4.2.3 Discussion
4.2.4 Exercises
4.3 Sharded services pattern
4.3.1 The problem: Processing large model serving requests with high-resolution videos
4.3.2 The solution
4.3.3 Discussion
4.3.4 Exercises
4.4 The event-driven processing pattern
4.4.1 The problem: Responding to model serving requests based on events
4.4.2 The solution
4.4.3 Discussion
4.4.4 Exercises
4.5 Answers to exercises
Section 4.2
Section 4.3
Section 4.4
Summary
5 Workflow patterns
5.1 What is workflow?
5.2 Fan-in and fan-out patterns: Composing complex machine learning workflows
5.2.1 The problem
5.2.2 The solution
5.2.3 Discussion
5.2.4 Exercises
5.3 Synchronous and asynchronous patterns: Accelerating workflows with concurrency
5.3.1 The problem
5.3.2 The solution
5.3.3 Discussion
5.3.4 Exercises
5.4 Step memoization pattern: Skipping redundant workloads via memoized steps
5.4.1 The problem
5.4.2 The solution
5.4.3 Discussion
5.4.4 Exercises
5.5 Answers to exercises
Section 5.2
Section 5.3
Section 5.4
Summary
6 Operation patterns
6.1 What are operations in machine learning systems?
6.2 Scheduling patterns: Assigning resources effectively in a shared cluster
6.2.1 The problem
6.2.2 The solution
6.2.3 Discussion
6.2.4 Exercises
6.3 Metadata pattern: Handle failures appropriately to minimize the negative effect on users
6.3.1 The problem
6.3.2 The solution
6.3.3 Discussion
6.3.4 Exercises
6.4 Answers to exercises
Section 6.2
Section 6.3
Summary
Part 3 Building a distributed machine learning workflow
7 Project overview and system architecture
7.1 Project overview
7.1.1 Project background
7.1.2 System components
7.2 Data ingestion
7.2.1 The problem
7.2.2 The solution
7.2.3 Exercises
7.3 Model training
7.3.1 The problem
7.3.2 The solution
7.3.3 Exercises
7.4 Model serving
7.4.1 The problem
7.4.2 The solution
7.4.3 Exercises
7.5 End-to-end workflow
7.5.1 The problems
7.5.2 The solutions
7.5.3 Exercises
7.6 Answers to exercises
Section 7.2
Section 7.3
Section 7.4
Section 7.5
Summary
8 Overview of relevant technologies
8.1 TensorFlow: The machine learning framework
8.1.1 The basics
8.1.2 Exercises
8.2 Kubernetes: The distributed container orchestration system
8.2.1 The basics
8.2.2 Exercises
8.3 Kubeflow: Machine learning workloads on Kubernetes
8.3.1 The basics
8.3.2 Exercises
8.4 Argo Workflows: Container-native workflow engine
8.4.1 The basics
8.4.2 Exercises
8.5 Answers to exercises
Section 8.1
Section 8.2
Section 8.3
Section 8.4
Summary
9 A complete implementation
9.1 Data ingestion
9.1.1 Single-node data pipeline
9.1.2 Distributed data pipeline
9.2 Model training
9.2.1 Model definition and single-node training
9.2.2 Distributed model training
9.2.3 Model selection
9.3 Model serving
9.3.1 Single-server model inference
9.3.2 Replicated model servers
9.4 The end-to-end workflow
9.4.1 Sequential steps
9.4.2 Step memoization
Summary
index
Similar volumes
This is the first text to provide a unified and self-contained introduction to visual pattern recognition and machine learning. It is useful as a general introduction to artificial intelligence and knowledge engineering, and no previous knowledge of pattern recognition or machine learning is necessary.
The release of ChatGPT has kicked off an arms race in Machine Learning (ML); however, ML has also been described as a black box that is very hard to understand. Machine Learning, Animated eases you into basic ML concepts and summarizes the learning process in three words: initialize, adjust, and repeat.