Machine Learning Infrastructure and Best Practices for Software Engineers

✍ Scribed by Miroslaw Staron

Publisher: Packt Publishing
Year: 2024
Tongue: English
Leaves: 346
Edition: 1
Category: Library

✦ Synopsis

Efficiently transform your initial designs into big systems by learning the foundations of infrastructure, algorithms, and ethical considerations for modern software products

Key Features

Learn how to scale up your machine learning software to a professional level

Secure the quality of your machine learning pipeline at runtime

Apply your knowledge to natural languages, programming languages, and images

Book Description

Although creating a machine learning pipeline or developing a working prototype of a software system from that pipeline is easy and straightforward nowadays, the journey toward a professional software system is still extensive. This book will help you get to grips with various best practices and recipes that will help software engineers transform prototype pipelines into complete software products.

The book begins by introducing the main concepts of professional software systems that leverage machine learning at their core. As you...

✦ Table of Contents

Machine Learning Infrastructure and Best Practices for Software Engineers
Contributors
About the author
About the reviewers
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Share Your Thoughts
Download a free PDF copy of this book
Part 1: Machine Learning Landscape in Software Engineering
Machine Learning Compared to Traditional Software
Machine learning is not traditional software
Supervised, unsupervised, and reinforcement learning – it is just the beginning
An example of traditional and machine learning software
Probability and software – how well they go together
Testing and evaluation – the same but different
Summary
References
Elements of a Machine Learning System
Elements of a production machine learning system
Data and algorithms
Data collection
Feature extraction
Data validation
Configuration and monitoring
Configuration
Monitoring
Infrastructure and resource management
Data serving infrastructure
Computational infrastructure
How this all comes together – machine learning pipelines
References
Data in Software Systems – Text, Images, Code, and Their Annotations
Raw data and features – what are the differences?
Images
Text
Visualization of output from more advanced text processing
Structured text – source code of programs
Every data has its purpose – annotations and tasks
Annotating text for intent recognition
Where different types of data can be used together – an outlook on multi-modal data models
References
Data Acquisition, Data Quality, and Noise
Sources of data and what we can do with them
Extracting data from software engineering tools – Gerrit and Jira
Extracting data from product databases – GitHub and Git
Data quality
Noise
Summary
References
Quantifying and Improving Data Properties
Feature engineering – the basics
Clean data
Noise in data management
Attribute noise
Splitting data
How ML models handle noise
References
Part 2: Data Acquisition and Management
Processing Data in Machine Learning Systems
Numerical data
Summarizing the data
Diving deeper into correlations
Summarizing individual measures
Reducing the number of measures – PCA
Other types of data – images
Text data
Toward feature engineering
References
Feature Engineering for Numerical and Image Data
Feature engineering
Feature engineering for numerical data
PCA
t-SNE
ICA
Locally linear embedding
Linear discriminant analysis
Autoencoders
Feature engineering for image data
Summary
References
Feature Engineering for Natural Language Data
Natural language data in software engineering and the rise of GitHub Copilot
What a tokenizer is and what it does
Bag-of-words and simple tokenizers
WordPiece tokenizer
BPE
The SentencePiece tokenizer
Word embeddings
FastText
From feature extraction to models
References
Part 3: Design and Development of ML Systems
Types of Machine Learning Systems – Feature-Based and Raw Data-Based (Deep Learning)
Why do we need different types of models?
Classical machine learning models
Convolutional neural networks and image processing
BERT and GPT models
Using language models in software systems
Summary
References
Training and Evaluating Classical Machine Learning Systems and Neural Networks
Training and testing processes
Training classical machine learning models
Understanding the training process
Random forest and opaque models
Training deep learning models
Misleading results – data leaking
Summary
References
Training and Evaluation of Advanced ML Algorithms – GPT and Autoencoders
From classical ML to GenAI
The theory behind advanced models – AEs and transformers
AEs
Transformers
Training and evaluation of a RoBERTa model
Training and evaluation of an AE
Developing safety cages to prevent models from breaking the entire system
Summary
References
Designing Machine Learning Pipelines (MLOps) and Their Testing
What ML pipelines are
ML pipelines
Elements of MLOps
ML pipelines – how to use ML in the system in practice
Deploying models to HuggingFace
Downloading models from HuggingFace
Raw data-based pipelines
Pipelines for NLP-related tasks
Pipelines for images
Feature-based pipelines
Testing of ML pipelines
Monitoring ML systems at runtime
Summary
References
Designing and Implementing Large-Scale, Robust ML Software
ML is not alone
The UI of an ML model
Data storage
Deploying an ML model for numerical data
Deploying a generative ML model for images
Deploying a code completion model as an extension
Summary
References
Part 4: Ethical Aspects of Data Management and ML System Development
Ethics in Data Acquisition and Management
Ethics in computer science and software engineering
Data is all around us, but can we really use it?
Ethics behind data from open source systems
Ethics behind data collected from humans
Contracts and legal obligations
References
Ethics in Machine Learning Systems
Bias and ML – is it possible to have an objective AI?
Measuring and monitoring for bias
Other metrics of bias
Developing mechanisms to prevent ML bias from spreading throughout the system
Summary
References
Integrating ML Systems in Ecosystems
Ecosystems
Creating web services over ML models using Flask
Creating a web service using Flask
Creating a web service that contains a pre-trained ML model
Deploying ML models using Docker
Combining web services into ecosystems
Summary
References
Summary and Where to Go Next
To know where we’re going, we need to know where we’ve been
Best practices
Current developments
My view on the future
Final remarks
References
Index
Why subscribe?
Other Books You May Enjoy
Packt is searching for authors like you
Share Your Thoughts
Download a free PDF copy of this book

📜 SIMILAR VOLUMES

📁 Machine Learning for Edge Computing: Frameworks, Patterns and Best Practices

✍ Amitoj Singh, Vinay Kukreja, Taghi Javdani Gandomani 📂 Library 📅 2022 🏛 CRC Press 🌐 English

This book divides edge intelligence into AI for edge (intelligence-enabled edge computing) and AI on edge (artificial intelligence on edge). It focuses on providing optimal solutions to the key concerns in edge computing through effective AI technologies, and it discusses how to build AI mo

📁 Software Engineering Best Practices

📂 Library 📅 2009 🏛 McGraw-Hill 🌐 English

📁 Machine Learning Applications In Software Engineering (Series on Software Engineering and Knowledge Engineering)

✍ Du Zhang, Jeffrey J. P. Tsai 📂 Library 📅 2005 🏛 World Scientific Pub Co Inc 🌐 English

Machine learning deals with the issue of how to build computer programs that improve their performance at some tasks through experience. Machine learning algorithms have proven to be of great practical value in a variety of application domains. Not surprisingly, the field of software engineering tur

📁 Software Engineering: Basic Principles and Best Practices

✍ Ravi Sethi 📂 Library 📅 2023 🏛 Cambridge University Press 🌐 English

Software engineering is as much about teamwork as it is about technology. This introductory textbook covers both. For courses featuring a team project, it offers tips and templates for aligning classroom concepts with the needs of the students' projects. Students will learn how software is developed
