๐”– Scriptorium
โœฆ   LIBER   โœฆ

Machine Learning Design Interview: Machine Learning System Design Interview

Scribed by Khang Pham


Publisher: Independently published
Year: 2022
Tongue: English
Leaves: 236
Category: Library

✦ Synopsis


This book provides:

  • End-to-end design of the most popular machine learning systems at big tech companies.
  • Most common Machine Learning Design interview questions at big tech companies (Facebook, Apple, Amazon, Google, Uber, LinkedIn)

Who should read this book?
  • Data scientists, software engineers, and data engineers who have a background in machine learning but have never worked on machine learning at scale will find this book helpful.

✦ Table of Contents


Preface
Who should read this book?
How to read this book?
Machine Learning Primer
Feature Selection and Feature Engineering
One Hot Encoding
Common Problems
Best Practices
One Hot Encoding in Tech Companies
Mean Encoding
Feature Hashing
Benefits
Feature Hashing Example
Feature Hashing in Tech Companies
Cross Feature
Embedding
How to Train Embedding
How Does Instagram Train User Embedding?
How Does DoorDash Train Store Embedding?
How Does YouTube Train Embedding in Retrieval?
How Does LinkedIn Train Embedding?
How Does Pinterest Learn Visual Embedding
Application of Embedding in Tech Companies
How Do We Evaluate the Quality of the Embedding?
How Do We Measure Similarity?
Important Considerations
Numeric Features
Normalization
Standardization
Feature Selection and Feature Engineering Quiz
Summary
Training Pipeline
Data Partitioning
Handle Imbalance Class Distribution
Common Resampling Use Cases
Data Generation Strategy
How LinkedIn Generates Data for Course Recommendation
Member to Skill
How to Split Train/Test Data
Sliding Window
Expanding Window
Retraining Requirements
Four Levels of Retraining
Loss Function and Metrics Evaluation
Regression Loss
Mean Square Error and Mean Absolute Error
Huber Loss
Quantile Loss
How Facebook Uses Normalized Cross Entropy for AdClick Prediction?
Forecast Metrics
Mean Absolute Percentage Error
Symmetric Absolute Percentage Error
Classification Loss
Focal Loss
Hinge Loss
Model Evaluation
Area Under the Curve
Mean Average Recall at K
Mean Average Precision (MAP)
Mean Reciprocal Rank (MRR)
Normalized Discounted Cumulative Gain
Cumulative Gain
Online Metrics
Other Metrics: Click-Through Rate, Time Spent
Common Sampling Techniques
Random Sampling
Rejection Sampling
Weight Sampling
Importance Sampling
Code Example
Stratified Sampling
Reservoir Sampling
Common Deep Learning Model Architecture
Wide and Deep Architecture
Benefits
Architecture
Two-Tower Architecture
Deep Cross Network
Benefits
Multitask Learning
Architecture
Benefits
Facebook Deep Learning Recommendation Model (DLRM)
Requirements and Data
Metrics
Features
Model
A/B Testing Fundamental
Budget-Splitting
Benefits
Common Deployment Patterns
Imbalance Workload
Serving Logics and Multiple Models
Serving Embedding
High-level Architecture
Offline Serving
Nearline Serving
Approximate Nearest Neighbor Search
How Onebar Uses ANN for Their Search Service
Deployment Example
Spotify: one simple mistake took four months to detect
How to make prediction with the wrong data?
Chapter Exercises
Quiz 1: Quiz on Cross Entropy
Quiz 2: Quiz on Cross Entropy
Quiz 3: Quiz on Accuracy
Quiz 4: Quiz on Accuracy
Common Recommendation System Components
Candidate Generation
Content-Based Filtering
Trade-Offs
Collaborative Filtering
Trade-Offs
How Pinterest Does Candidate Generation
Co-occurrences of Items to Generate Candidates
Online Random Walk
Session Co-occurrence
How YouTube Builds Video Recommendation Retrieval Stack
Ranking
How to Build a ML-Based Search Engine
RankNet
Example
Observations
Re-ranking
Freshness
Diversity
Fairness
Position Bias
Why Would This be an Issue in Machine Learning Model Training?
Use Position as feature
Use Position as Feature
Inverse Propensity Score
How LinkedIn Uses Impression Discount in People You May Know (PYMK) Features
Calibration
Definition
Example and solution
Calibration plot
Nonstationary Problem
Exploration vs. Exploitation
Airbnb: Deep Learning is NOT a drop-in replacement
Interview Exercises
Machine Learning Usecases from Top Companies
Airbnb - Room Classification
Requirement and Data
Challenges
Metrics
Features
Model
Model Architecture
Model Training and Serving
Improvements
Instagram: Feed Recommendation from Non-friends
Scope/Requirements
Metrics
Data
Model Training and Serving
Co-occurrence Based Similarity
Cold Start Problem
LinkedIn: Talent Search and Recommendation
Scope/Requirements
Metrics
Data and Features
Other Considerations
Model
Overall System
Retrieval Stack
LinkedIn - People You May Know
Scope/Requirements
Metrics
Link Prediction Features and Data
Link Prediction Model
Value from Connection
LinkedIn - Learning Course Recommendation
Scope/Requirements
Metrics
Data
Model Training
Scoring
Candidate Generation
Data Streaming pipeline
Overall System
High-level Architecture
Uber - Estimate Time Arrival
Scope/Requirements
Metrics
Data
Features
Model
Overall System
Real-time Serving High Level
Training Pipeline
YouTube Video Recommendations
Problem Statement
Metrics Design and Requirements
Metrics
Requirements
Training
Inference
Summary
Multistage Models
Model Training
Candidate Generation Model
Training Data
Feature Engineering
Model
Ranking Model
Training Data
Features Engineering
Model
Question 1
Question 2
Calculation and Estimation
Assumptions
Bandwidth and Scale
System Design
Training
Challenges
Inference
Scale the Design
Interview Exercise
Summary
Question 3
Question 4
LinkedIn Feed Ranking
Problem Statement
Challenges
Metrics Design and Requirements
Metrics
Offline Metrics
Online Metrics
Requirements
Training
Inference
Model
Training Data
Problems
Possible Solutions
Feature Engineering
Model
Evaluation
Model Requirements
Training
Calculation and Estimation
Assumptions
Data Size and Scale
High-level Design
Feed Ranking Flow
Feature Store
Items Store
Scale the Design
Summary
Ad Click Prediction
Problem Statement
Challenges
Metrics Design and Requirements
Offline metrics
Online Metrics
Requirements
Training
Inference
Model
Feature Engineering
Model Architecture
Calculation and Estimation
Assumptions
Data Size
High-level Design
Training
Serving
Scale the Design
Airbnb Rental Search Ranking
Problem Statement
Challenges
Metrics Design and Requirements
Offline Metrics
Online Metrics
Requirements
Training
Inference
Model Training
Training Data
Model Architecture
Feature Engineering
Calculation and Estimation
Assumptions
Data Size
High-level Design
Scale the Design
Open Questions
Summary
Estimate Food Delivery Time
Problem Statement
Metrics Design and Requirements
Metrics
Offline Metrics
Online Metrics
Requirements
Summary
Model
Training Data
Model
Probabilistic Model with Confidence Interval: Quantile Regression
Features Engineering
System Design
Requirements
Inference
Training
Calculation and Estimation
Assumptions
Data Size
High-level Design
Inference
Training
Scale the Design
Summary
Machine Learning Assessment
Practice 1: Machine Learning Knowledge
Regression
Confidence Interval
Forecast Model
Correlation
Coding
Clustering
SQL
Database
Machine Learning Model Diagnosis
Classification
Feature Importance
Confusion Matrix
Classification metrics
Random Forest
Statistics
Random Forest Tuning
Decision Tree Tuning
Deep Learning Diagnosis
Deep Learning
Deep Learning Questions


📜 SIMILAR VOLUMES


Designing Machine Learning Systems
Chip Huyen | Library | 2022 | O'Reilly Media, Inc. | English

Many tutorials show you how to develop ML systems from ideation to deployed models. But with constant changes in tooling, those systems can quickly become outdated. Without an intentional design to hold the components together, these systems will become a technical liability, prone to errors and be

Machine Learning Interviews: Kickstart Y
Susan Shu Chang | Library | 2023 | O'Reilly Media | English

As tech products become more prevalent today, the demand for machine learning professionals continues to grow. But the responsibilities and skill sets required of ML professionals still vary drastically from company to company, making the interview process difficult to predict. In this guide, dat