"Feature engineering plays a vital role in big data analytics. Machine learning and data mining algorithms cannot work without data. Little can be achieved if there are few features to represent the underlying data objects, and the quality of results of those algorithms largely depends on the qualit
The Pragmatic Programmer for Machine Learning: Engineering Analytics and Data Science Solutions
By Marco Scutari, Mauro Malvestio
- Publisher
- CRC Press/Chapman & Hall
- Year
- 2023
- Language
- English
- Pages
- 357
- Series
- Chapman & Hall/CRC Machine Learning & Pattern Recognition
- Category
- Library
Free of charge, no registration required. For personal study only.
Synopsis
Machine learning has redefined the way we work with data and is increasingly becoming an indispensable part of everyday life. The Pragmatic Programmer for Machine Learning: Engineering Analytics and Data Science Solutions discusses how modern software engineering practices are part of this revolution, both conceptually and in practical applications.
Comprising a broad overview of how to design machine learning pipelines as well as the state-of-the-art tools used to build them, this book provides a multi-disciplinary view of how traditional software engineering can be adapted to and integrated with the workflows of domain experts and probabilistic models.
From choosing the right hardware to designing effective pipeline architectures and adopting software development best practices, this guide will appeal to machine learning and data science specialists, whilst also laying out key high-level principles in a way that is approachable for students of computer science and aspiring programmers.
Table of Contents
Cover
Half Title
Series Page
Title Page
Copyright Page
Dedication
Contents
Preface
1. What Is This Book About?
1.1. Machine Learning
1.2. Data Science
1.3. Software Engineering
1.4. How Do They Go Together?
I. Foundations of Scientific Computing
2. Hardware Architectures
2.1. Types of Hardware
2.1.1. Compute
2.1.2. Memory
2.1.3. Connections
2.2. Making Hardware Live Up to Expectations
2.3. Local and Remote Hardware
2.4. Choosing the Right Hardware for the Job
3. Variable Types and Data Structures
3.1. Variable Types
3.1.1. Integers
3.1.2. Floating Point
3.1.3. Strings
3.2. Data Structures
3.2.1. Vectors and Lists
3.2.2. Representing Data with Data Frames
3.2.3. Dense and Sparse Matrices
3.3. Choosing the Right Variable Types for the Job
3.4. Choosing the Right Data Structures for the Job
4. Analysis of Algorithms
4.1. Writing Pseudocode
4.2. Computational Complexity and Big-O Notation
4.3. Big-O Notation and Benchmarking
4.4. Algorithm Analysis for Machine Learning
4.5. Some Examples of Algorithm Analysis
4.5.1. Estimating Linear Regression Models
4.5.2. Sparse Matrices Representation
4.5.3. Uniform Simulations of Directed Acyclic Graphs
4.6. Big-O Notation and Real-World Performance
II. Best Practices for Machine Learning Pipelines
5. Designing and Structuring Pipelines
5.1. Data as Code
5.2. Technical Debt
5.2.1. At the Data Level
5.2.2. At the Model Level
5.2.3. At the Architecture (Design) Level
5.2.4. At the Code Level
5.3. Machine Learning Pipeline
5.3.1. Project Scoping
5.3.2. Producing a Baseline Implementation
5.3.3. Data Ingestion and Preparation
5.3.4. Model Training, Evaluation and Validation
5.3.5. Deployment, Serving and Inference
5.3.6. Monitoring, Logging and Reporting
6. Writing Machine Learning Code
6.1. Choosing Languages and Libraries
6.2. Naming Things
6.3. Coding Styles and Coding Standards
6.4. Filesystem Structure
6.5. Effective Versioning
6.6. Code Review
6.7. Refactoring
6.8. Reworking Academic Code: An Example
7. Packaging and Deploying Pipelines
7.1. Model Packaging
7.1.1. Standalone Packaging
7.1.2. Programming Language Package Managers
7.1.3. Virtual Machines
7.1.4. Containers
7.2. Model Deployment: Strategies
7.3. Model Deployment: Infrastructure
7.4. Model Deployment: Monitoring and Logging
7.5. What Can Possibly Go Wrong?
7.6. Rolling Back
8. Documenting Pipelines
8.1. Comments
8.2. Documenting Public Interfaces
8.3. Documenting Architecture and Design
8.4. Documenting Algorithms and Business Cases
8.5. Illustrating Practical Use Cases
9. Troubleshooting and Testing Pipelines
9.1. Data Are the Problem
9.1.1. Large Data
9.1.2. Heterogeneous Data
9.1.3. Dynamic Data
9.2. Models Are the Problem
9.2.1. Large Models
9.2.2. Black-Box Models
9.2.3. Costly Models
9.2.4. Many Models
9.3. Common Signs That Something Is Up
9.4. Tests Are the Solution
9.4.1. What Do We Want to Achieve?
9.4.2. What Should We Test?
9.4.3. Offline and Online Data
9.4.4. Testing Local and Testing Global
9.4.5. Conceptual and Implementation Errors
9.4.6. Code Coverage and Test Prioritisation
III. Tools and Technologies
10. Tools for Developing Pipelines
10.1. Data Exploration and Experiment Tracking
10.2. Code Development
10.2.1. Code Editors and IDEs
10.2.2. Notebooks
10.2.3. Accessing Data and Documentation
10.3. Build, Test and Documentation Tools
11. Tools to Manage Pipelines in Production
11.1. Infrastructure Management
11.2. Machine Learning Software Management
11.3. Dashboards, Visualisation and Reporting
IV. A Case Study
12. Recommending Recommendations: A Recommender System Using Natural Language Understanding
12.1. The Domain Problem
12.2. The Machine Learning Model
12.3. The Infrastructure
12.4. The Architecture of the Pipeline
12.4.1. Data Ingestion and Data Preparation
12.4.2. Data Tracking and Versioning
12.4.3. Training and Experiment Tracking
12.4.4. Model Packaging
12.4.5. Deployment and Inference
Bibliography
Index
SIMILAR VOLUMES
Self-driving cars, natural language recognition, and online recommendation engines are all possible thanks to machine learning. Now you can create your own genetic algorithms, nature-inspired swarms, Monte Carlo simulations, cellular automata, and clusters. Learn how to test your ML code…
Packt Publishing, 2015. 607 p. ISBN: 1784397180, 9781784397180. The term "data science" has been widely used to define this new profession that is expected to interpret vast datasets and translate them to improved decision-making and performance. Clojure is a powerful language that combines the interactivity of a scripting language with the speed of a compiled language…
Empower Your Data Insights with Java's Top Tools and Frameworks. This book is a comprehensive guide to data analysis using Java. It starts with the fundamentals, covering the purpose of data analysis, different data types and structures, and how to pre-process datasets…
Get a step ahead of your competitors with insights from over 30 Kaggle Masters and Grandmasters. Discover tips, tricks, and best practices for competing effectively on Kaggle and becoming a better data scientist. Purchase of the print or Kindle book includes a free eBook in the PDF format.