Building Recommendation Systems in Python and JAX: Hands-On Production Systems at Scale
Scribed by Bryan Bischof, Hector Yee
- Publisher: O'Reilly Media
- Year: 2023
- Tongue: English
- Leaves: 355
- Edition: 1
- Category: Library
✦ Synopsis
Implementing and designing systems that make suggestions to users are among the most popular and essential machine learning applications available. Whether you want customers to find the most appealing items at your online store, videos to enrich and entertain them, or news they need to know, recommendation systems (RecSys) provide the way.
In this practical book, authors Bryan Bischof and Hector Yee illustrate the core concepts and examples to help you create a RecSys for any industry or scale. You'll learn the math, ideas, and implementation details you need to succeed. This book includes the RecSys platform components, relevant MLOps tools in your stack, plus code examples and helpful suggestions in PySpark, SparkSQL, FastAPI, Weights & Biases, and Kafka.
You'll learn:
• The data essential for building a RecSys
• How to frame your data and business as a RecSys problem
• Ways to evaluate models appropriate for your system
• Methods to implement, train, test, and deploy the model you choose
• Metrics you need to track to ensure your system is working as planned
• How to improve your system as you learn more about your users, products, and business case
✦ Table of Contents
Cover
Copyright
Table of Contents
Preface
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
Part I. Warming Up
Chapter 1. Introduction
Key Components of a Recommendation System
Collector
Ranker
Server
Simplest Possible Recommenders
The Trivial Recommender
Most-Popular-Item Recommender
A Gentle Introduction to JAX
Basic Types, Initialization, and Immutability
Indexing and Slicing
Broadcasting
Random Numbers
Just-in-Time Compilation
Summary
Chapter 2. User-Item Ratings and Framing the Problem
The User-Item Matrix
User-User Versus Item-Item Collaborative Filtering
The Netflix Challenge
Soft Ratings
Data Collection and User Logging
What to Log
Collection and Instrumentation
Funnels
Business Insight and What People Like
Summary
Chapter 3. Mathematical Considerations
Zipf’s Laws in RecSys and the Matthew Effect
Sparsity
User Similarity for Collaborative Filtering
Pearson Correlation
Ratings via Similarity
Explore-Exploit as a Recommendation System
ϵ-greedy
What Should ϵ Be?
The NLP-RecSys Relationship
Vector Search
Nearest-Neighbors Search
Summary
Chapter 4. System Design for Recommending
Online Versus Offline
Collector
Offline Collector
Online Collector
Ranker
Offline Ranker
Online Ranker
Server
Offline Server
Online Server
Summary
Chapter 5. Putting It All Together: Content-Based Recommender
Revision Control Software
Python Build Systems
Random-Item Recommender
Obtaining the STL Dataset Images
Convolutional Neural Network Definition
Model Training in JAX, Flax, and Optax
Input Pipeline
Summary
Part II. Retrieval
Chapter 6. Data Processing
Hydrating Your System
PySpark
Example: User Similarity in PySpark
DataLoaders
Database Snapshots
Data Structures for Learning and Inference
Vector Search
Approximate Nearest Neighbors
Bloom Filters
Fun Aside: Bloom Filters as the Recommendation System
Feature Stores
Summary
Chapter 7. Serving Models and Architectures
Architectures by Recommendation Structure
Item-to-User Recommendations
Query-Based Recommendations
Context-Based Recommendations
Sequence-Based Recommendations
Why Bother with Extra Features?
Encoder Architectures and Cold Starting
Deployment
Models as APIs
Spinning Up a Model Service
Workflow Orchestration
Alerting and Monitoring
Schemas and Priors
Integration Tests
Observability
Evaluation in Production
Slow Feedback
Model Metrics
Continuous Training and Deployment
Model Drift
Deployment Topologies
The Evaluation Flywheel
Daily Warm Starts
Lambda Architecture and Orchestration
Logging
Active Learning
Summary
Chapter 8. Putting It All Together: Data Processing and Counting Recommender
Tech Stack
Data Representation
Big Data Frameworks
Cluster Frameworks
PySpark Example
GloVE Model Definition
GloVE Model Specification in JAX and Flax
GloVE Model Training with Optax
Summary
Part III. Ranking
Chapter 9. Feature-Based and Counting-Based Recommendations
Bilinear Factor Models (Metric Learning)
Feature-Based Warm Starting
Segmentation Models and Hybrids
Tag-Based Recommenders
Hybridization
Limitations of Bilinear Models
Counting Recommenders
Return to the Most-Popular-Item Recommender
Correlation Mining
Pointwise Mutual Information via Co-occurrences
Similarity from Co-occurrence
Similarity-Based Recommendations
Summary
Chapter 10. Low-Rank Methods
Latent Spaces
Dot Product Similarity
Co-occurrence Models
Reducing the Rank of a Recommender Problem
Optimizing for MF with ALS
Regularization for MF
Regularized MF Implementation
WSABIE
Dimension Reduction
Isometric Embeddings
Nonlinear Locally Metrizable Embeddings
Centered Kernel Alignment
Affinity and p-sale
Propensity Weighting for Recommendation System Evaluation
Propensity
Simpson’s and Mitigating Confounding
Summary
Chapter 11. Personalized Recommendation Metrics
Environments
Online and Offline
User Versus Item Metrics
A/B Testing
Recall and Precision
@ k
Precision at k
Recall at k
R-precision
mAP, MMR, NDCG
mAP
MRR
NDCG
mAP Versus NDCG?
Correlation Coefficients
RMSE from Affinity
Integral Forms: AUC and cAUC
Recommendation Probabilities to AUC-ROC
Comparison to Other Metrics
BPR
Summary
Chapter 12. Training for Ranking
Where Does Ranking Fit in Recommender Systems?
Learning to Rank
Training an LTR Model
Classification for Ranking
Regression for Ranking
Classification and Regression for Ranking
WARP
k-order Statistic
BM25
Multimodal Retrieval
Summary
Chapter 13. Putting It All Together: Experimenting and Ranking
Experimentation Tips
Keep It Simple
Debug Print Statements
Defer Optimization
Keep Track of Changes
Use Feature Engineering
Understand Metrics Versus Business Metrics
Perform Rapid Iteration
Spotify Million Playlist Dataset
Building URI Dictionaries
Building the Training Data
Reading the Input
Modeling the Problem
Framing the Loss Function
Exercises
Summary
Part IV. Serving
Chapter 14. Business Logic
Hard Ranking
Learned Avoids
Hand-Tuned Weights
Inventory Health
Implementing Avoids
Model-Based Avoids
Summary
Chapter 15. Bias in Recommendation Systems
Diversification of Recommendations
Improving Diversity
Applying Portfolio Optimization
Multiobjective Functions
Predicate Pushdown
Fairness
Summary
Chapter 16. Acceleration Structures
Sharding
Locality Sensitive Hashing
k-d Trees
Hierarchical k-means
Cheaper Retrieval Methods
Summary
Part V. The Future of Recs
Chapter 17. Sequential Recommenders
Markov Chains
Order-Two Markov Chain
Other Markov Models
RNN and CNN Architectures
Attention Architectures
Self-Attentive Sequential Recommendation
BERT4Rec
Recency Sampling
Merging Static and Sequential
Summary
Chapter 18. What’s Next for Recs?
Multimodal Recommendations
Graph-Based Recommenders
Neural Message Passing
Applications
Random Walks
Metapath and Heterogeneity
LLM Applications
LLM Recommenders
LLM Training
Instruct Tuning for Recommendations
LLM Rankers
Recommendations for AI
Summary
Index
About the Authors
Colophon
✦ Subjects
Python; Recommender Systems; PySpark; Data Processing; Business Logic; Markov Chains; JAX
SIMILAR VOLUMES
Recommendation systems are at the heart of almost every internet business today, from Facebook to Netflix to Amazon. Providing good recommendations, whether it's friends, movies or groceries, goes a long way in defining user experience and enticing your customers to use and buy from your platform. T
Build industry-standard recommender systems. Only familiarity with Python is required. No need to wade through complicated machine learning theory to use this book. Objectives: Get to grips with the different kinds of recommender systems. Master data-wrangling techniques using the pandas library