Distributed Machine Learning Patterns
- Author: Yuan Tang
- Publisher: Manning Publications Co.
- Year: 2023
- Language: English
- Pages: 167
- Category: Library
Synopsis
Practical patterns for scaling machine learning from your laptop to a distributed cluster.
Distributed machine learning systems allow developers to handle extremely large datasets across multiple clusters, take advantage of automation tools, and benefit from hardware acceleration. This book reveals best-practice techniques and insider tips for tackling the challenges of scaling machine learning systems.
In Distributed Machine Learning Patterns you will learn how to:
Apply distributed systems patterns to build scalable and reliable machine learning projects
Build ML pipelines with data ingestion, distributed training, model serving, and more
Automate ML tasks with Kubernetes, TensorFlow, Kubeflow, and Argo Workflows
Make trade-offs between different patterns and approaches
Manage and monitor machine learning workloads at scale
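As a taste of the kind of pattern the book covers, the sharding pattern from the data ingestion chapter can be sketched in a few lines of plain Python. The dataset and worker count below are hypothetical stand-ins for illustration, not code from the book:

```python
# Minimal sketch of the sharding pattern: each worker keeps only every
# num_workers-th record, so a large dataset is split across machines
# without any single worker having to load all of it.

def shard(dataset, worker_index, num_workers):
    """Return the subset of records assigned to one worker."""
    return [record for i, record in enumerate(dataset)
            if i % num_workers == worker_index]

dataset = list(range(10))  # pretend this is a huge dataset
shards = [shard(dataset, w, 3) for w in range(3)]
print(shards)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Every record lands in exactly one shard, which is the property that lets each machine ingest its slice independently.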
Inside Distributed Machine Learning Patterns you'll learn to apply established distributed systems patterns to machine learning projects, and explore cutting-edge new patterns created specifically for machine learning. Firmly rooted in the real world, this book demonstrates how to apply patterns using examples based on TensorFlow, Kubernetes, Kubeflow, and Argo Workflows. Hands-on projects and clear, practical DevOps techniques let you easily launch, manage, and monitor cloud-native distributed machine learning pipelines.
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the technology
Deploying a machine learning application on a modern distributed system puts the spotlight on reliability, performance, security, and other operational concerns. In this in-depth guide, Yuan Tang, project lead of Argo and Kubeflow, shares patterns, examples, and hard-won insights on taking an ML model from a single device to a distributed cluster.
About the book
Distributed Machine Learning Patterns provides dozens of techniques for designing and deploying distributed machine learning systems. In it, you'll learn patterns for distributed model training, managing unexpected failures, and dynamic model serving. You'll appreciate the practical examples that accompany each pattern along with a full-scale project that implements distributed model training and inference with autoscaling on Kubernetes.
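The replicated services pattern from the model serving chapter, for example, boils down to spreading requests round-robin across identical model replicas. The sketch below is illustrative only; the replica names and request format are invented, not taken from the book:

```python
import itertools

# Sketch of the replicated services pattern: identical model-server
# replicas sit behind a balancer that hands out requests round-robin,
# so serving throughput scales with the number of replicas.

class RoundRobinBalancer:
    def __init__(self, replicas):
        self._replicas = itertools.cycle(replicas)

    def route(self, request):
        """Pick the next replica in the cycle and hand it the request."""
        replica = next(self._replicas)
        return f"{replica} served {request}"

balancer = RoundRobinBalancer(["replica-0", "replica-1", "replica-2"])
for i in range(4):
    print(balancer.route(f"request-{i}"))
# replica-0 served request-0
# replica-1 served request-1
# replica-2 served request-2
# replica-0 served request-3
```

In the book's full project this role is played by a Kubernetes Service in front of autoscaled model-server pods rather than by hand-rolled Python.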
What's inside
Data ingestion, distributed training, model serving, and more
Automating Kubernetes and TensorFlow with Kubeflow and Argo Workflows
Managing and monitoring workloads at scale
About the reader
For data analysts and engineers familiar with the basics of machine learning, Bash, Python, and Docker.
About the author
Yuan Tang is a project lead of Argo and Kubeflow, maintainer of TensorFlow and XGBoost, and author of numerous open source projects.
Table of Contents
Distributed Machine Learning Patterns
Copyright
contents
front matter
preface
acknowledgments
about this book
Who should read this book?
How this book is organized: A roadmap
About the code
liveBook discussion forum
about the author
about the cover illustration
Part 1 Basic concepts and background
1 Introduction to distributed machine learning systems
1.1 Large-scale machine learning
1.1.1 The growing scale
1.1.2 What can we do?
1.2 Distributed systems
1.2.1 What is a distributed system?
1.2.2 The complexity and patterns
1.3 Distributed machine learning systems
1.3.1 What is a distributed machine learning system?
1.3.2 Are there similar patterns?
1.3.3 When should we use a distributed machine learning system?
1.3.4 When should we not use a distributed machine learning system?
1.4 What we will learn in this book
Summary
Part 2 Patterns of distributed machine learning systems
2 Data ingestion patterns
2.1 What is data ingestion?
2.2 The Fashion-MNIST dataset
2.3 Batching pattern
2.3.1 The problem: Performing expensive operations for the Fashion-MNIST dataset with limited memory
2.3.2 The solution
2.3.3 Discussion
2.3.4 Exercises
2.4 Sharding pattern: Splitting extremely large datasets among multiple machines
2.4.1 The problem
2.4.2 The solution
2.4.3 Discussion
2.4.4 Exercises
2.5 Caching pattern
2.5.1 The problem: Re-accessing previously used data for efficient multi-epoch model training
2.5.2 The solution
2.5.3 Discussion
2.5.4 Exercises
2.6 Answers to exercises
Section 2.3.4
Section 2.4.4
Section 2.5.4
Summary
3 Distributed training patterns
3.1 What is distributed training?
3.2 Parameter server pattern: Tagging entities in 8 million YouTube videos
3.2.1 The problem
3.2.2 The solution
3.2.3 Discussion
3.2.4 Exercises
3.3 Collective communication pattern
3.3.1 The problem: Improving performance when parameter servers become a bottleneck
3.3.2 The solution
3.3.3 Discussion
3.3.4 Exercises
3.4 Elasticity and fault-tolerance pattern
3.4.1 The problem: Handling unexpected failures when training with limited computational resources
3.4.2 The solution
3.4.3 Discussion
3.4.4 Exercises
3.5 Answers to exercises
Section 3.2.4
Section 3.3.4
Section 3.4.4
Summary
4 Model serving patterns
4.1 What is model serving?
4.2 Replicated services pattern: Handling the growing number of serving requests
4.2.1 The problem
4.2.2 The solution
4.2.3 Discussion
4.2.4 Exercises
4.3 Sharded services pattern
4.3.1 The problem: Processing large model serving requests with high-resolution videos
4.3.2 The solution
4.3.3 Discussion
4.3.4 Exercises
4.4 The event-driven processing pattern
4.4.1 The problem: Responding to model serving requests based on events
4.4.2 The solution
4.4.3 Discussion
4.4.4 Exercises
4.5 Answers to exercises
Section 4.2
Section 4.3
Section 4.4
Summary
5 Workflow patterns
5.1 What is workflow?
5.2 Fan-in and fan-out patterns: Composing complex machine learning workflows
5.2.1 The problem
5.2.2 The solution
5.2.3 Discussion
5.2.4 Exercises
5.3 Synchronous and asynchronous patterns: Accelerating workflows with concurrency
5.3.1 The problem
5.3.2 The solution
5.3.3 Discussion
5.3.4 Exercises
5.4 Step memoization pattern: Skipping redundant workloads via memoized steps
5.4.1 The problem
5.4.2 The solution
5.4.3 Discussion
5.4.4 Exercises
5.5 Answers to exercises
Section 5.2
Section 5.3
Section 5.4
Summary
6 Operation patterns
6.1 What are operations in machine learning systems?
6.2 Scheduling patterns: Assigning resources effectively in a shared cluster
6.2.1 The problem
6.2.2 The solution
6.2.3 Discussion
6.2.4 Exercises
6.3 Metadata pattern: Handle failures appropriately to minimize the negative effect on users
6.3.1 The problem
6.3.2 The solution
6.3.3 Discussion
6.3.4 Exercises
6.4 Answers to exercises
Section 6.2
Section 6.3
Summary
Part 3 Building a distributed machine learning workflow
7 Project overview and system architecture
7.1 Project overview
7.1.1 Project background
7.1.2 System components
7.2 Data ingestion
7.2.1 The problem
7.2.2 The solution
7.2.3 Exercises
7.3 Model training
7.3.1 The problem
7.3.2 The solution
7.3.3 Exercises
7.4 Model serving
7.4.1 The problem
7.4.2 The solution
7.4.3 Exercises
7.5 End-to-end workflow
7.5.1 The problems
7.5.2 The solutions
7.5.3 Exercises
7.6 Answers to exercises
Section 7.2
Section 7.3
Section 7.4
Section 7.5
Summary
8 Overview of relevant technologies
8.1 TensorFlow: The machine learning framework
8.1.1 The basics
8.1.2 Exercises
8.2 Kubernetes: The distributed container orchestration system
8.2.1 The basics
8.2.2 Exercises
8.3 Kubeflow: Machine learning workloads on Kubernetes
8.3.1 The basics
8.3.2 Exercises
8.4 Argo Workflows: Container-native workflow engine
8.4.1 The basics
8.4.2 Exercises
8.5 Answers to exercises
Section 8.1
Section 8.2
Section 8.3
Section 8.4
Summary
9 A complete implementation
9.1 Data ingestion
9.1.1 Single-node data pipeline
9.1.2 Distributed data pipeline
9.2 Model training
9.2.1 Model definition and single-node training
9.2.2 Distributed model training
9.2.3 Model selection
9.3 Model serving
9.3.1 Single-server model inference
9.3.2 Replicated model servers
9.4 The end-to-end workflow
9.4.1 Sequential steps
9.4.2 Step memoization
Summary
index
Similar volumes
This is the first text to provide a unified and self-contained introduction to visual pattern recognition and machine learning. It is useful as a general introduction to artificial intelligence and knowledge engineering, and no previous knowledge of pattern recognition or machine learning is necessary.
The release of ChatGPT has kicked off an arms race in Machine Learning (ML); however, ML has also been described as a black box that is very hard to understand. Machine Learning, Animated eases you into basic ML concepts and summarizes the learning process in three words: initialize, adjust, and repeat.