𝔖 Scriptorium
✦   LIBER   ✦


Machine Learning Infrastructure and Best Practices for Software Engineers

✍ Scribed by Miroslaw Staron


Publisher: Packt Publishing
Year: 2024
Tongue: English
Leaves: 346
Edition: 1
Category: Library


✦ Synopsis


Efficiently transform your initial designs into big systems by learning the foundations of infrastructure, algorithms, and ethical considerations for modern software products

Key Features
  • Learn how to scale up your machine learning software to a professional level
  • Secure the quality of your machine learning pipeline at runtime
  • Apply your knowledge to natural languages, programming languages, and images

Book Description
Although creating a machine learning pipeline or developing a working prototype of a software system from that pipeline is easy and straightforward nowadays, the journey toward a professional software system is still extensive. This book will help you get to grips with various best practices and recipes that will help software engineers transform prototype pipelines into complete software products. The book begins by introducing the main concepts of professional software systems that leverage machine learning at their core. As you...

    ✦ Table of Contents


    Machine Learning Infrastructure and Best Practices for Software Engineers
    Contributors
    About the author
    About the reviewers
    Preface
    Who this book is for
    What this book covers
    To get the most out of this book
    Download the example code files
    Conventions used
    Get in touch
    Share Your Thoughts
    Download a free PDF copy of this book
    Part 1: Machine Learning Landscape in Software Engineering
    Machine Learning Compared to Traditional Software
    Machine learning is not traditional software
    Supervised, unsupervised, and reinforcement learning – it is just the beginning
    An example of traditional and machine learning software
    Probability and software – how well they go together
    Testing and evaluation – the same but different
    Summary
    References
    Elements of a Machine Learning System
    Elements of a production machine learning system
    Data and algorithms
    Data collection
    Feature extraction
    Data validation
    Configuration and monitoring
    Configuration
    Monitoring
    Infrastructure and resource management
    Data serving infrastructure
    Computational infrastructure
    How this all comes together – machine learning pipelines
    References
    Data in Software Systems – Text, Images, Code, and Their Annotations
    Raw data and features – what are the differences?
    Images
    Text
    Visualization of output from more advanced text processing
    Structured text – source code of programs
    Every data has its purpose – annotations and tasks
    Annotating text for intent recognition
    Where different types of data can be used together – an outlook on multi-modal data models
    References
    Data Acquisition, Data Quality, and Noise
    Sources of data and what we can do with them
    Extracting data from software engineering tools – Gerrit and Jira
    Extracting data from product databases – GitHub and Git
    Data quality
    Noise
    Summary
    References
    Quantifying and Improving Data Properties
    Feature engineering – the basics
    Clean data
    Noise in data management
    Attribute noise
    Splitting data
    How ML models handle noise
    References
    Part 2: Data Acquisition and Management
    Processing Data in Machine Learning Systems
    Numerical data
    Summarizing the data
    Diving deeper into correlations
    Summarizing individual measures
    Reducing the number of measures – PCA
    Other types of data – images
    Text data
    Toward feature engineering
    References
    Feature Engineering for Numerical and Image Data
    Feature engineering
    Feature engineering for numerical data
    PCA
    t-SNE
    ICA
    Locally linear embedding
    Linear discriminant analysis
    Autoencoders
    Feature engineering for image data
    Summary
    References
    Feature Engineering for Natural Language Data
    Natural language data in software engineering and the rise of GitHub Copilot
    What a tokenizer is and what it does
    Bag-of-words and simple tokenizers
    WordPiece tokenizer
    BPE
    The SentencePiece tokenizer
    Word embeddings
    FastText
    From feature extraction to models
    References
    Part 3: Design and Development of ML Systems
    Types of Machine Learning Systems – Feature-Based and Raw Data-Based (Deep Learning)
    Why do we need different types of models?
    Classical machine learning models
    Convolutional neural networks and image processing
    BERT and GPT models
    Using language models in software systems
    Summary
    References
    Training and Evaluating Classical Machine Learning Systems and Neural Networks
    Training and testing processes
    Training classical machine learning models
    Understanding the training process
    Random forest and opaque models
    Training deep learning models
    Misleading results – data leaking
    Summary
    References
    Training and Evaluation of Advanced ML Algorithms – GPT and Autoencoders
    From classical ML to GenAI
    The theory behind advanced models – AEs and transformers
    AEs
    Transformers
    Training and evaluation of a RoBERTa model
    Training and evaluation of an AE
    Developing safety cages to prevent models from breaking the entire system
    Summary
    References
    Designing Machine Learning Pipelines (MLOps) and Their Testing
    What ML pipelines are
    ML pipelines
    Elements of MLOps
    ML pipelines – how to use ML in the system in practice
    Deploying models to HuggingFace
    Downloading models from HuggingFace
    Raw data-based pipelines
    Pipelines for NLP-related tasks
    Pipelines for images
    Feature-based pipelines
    Testing of ML pipelines
    Monitoring ML systems at runtime
    Summary
    References
    Designing and Implementing Large-Scale, Robust ML Software
    ML is not alone
    The UI of an ML model
    Data storage
    Deploying an ML model for numerical data
    Deploying a generative ML model for images
    Deploying a code completion model as an extension
    Summary
    References
    Part 4: Ethical Aspects of Data Management and ML System Development
    Ethics in Data Acquisition and Management
    Ethics in computer science and software engineering
    Data is all around us, but can we really use it?
    Ethics behind data from open source systems
    Ethics behind data collected from humans
    Contracts and legal obligations
    References
    Ethics in Machine Learning Systems
    Bias and ML – is it possible to have an objective AI?
    Measuring and monitoring for bias
    Other metrics of bias
    Developing mechanisms to prevent ML bias from spreading throughout the system
    Summary
    References
    Integrating ML Systems in Ecosystems
    Ecosystems
    Creating web services over ML models using Flask
    Creating a web service using Flask
    Creating a web service that contains a pre-trained ML model
    Deploying ML models using Docker
    Combining web services into ecosystems
    Summary
    References
    Summary and Where to Go Next
    To know where we’re going, we need to know where we’ve been
    Best practices
    Current developments
    My view on the future
    Final remarks
    References
    Index
    Why subscribe?
    Other Books You May Enjoy
    Packt is searching for authors like you
    Share Your Thoughts
    Download a free PDF copy of this book


    📜 SIMILAR VOLUMES


    Machine Learning for Edge Computing: Fra
    ✍ Amitoj Singh, Vinay Kukreja, Taghi Javdani Gandomani 📂 Library 📅 2022 🏛 CRC Press 🌐 English

    This book divides edge intelligence into AI for edge (intelligence-enabled edge computing) and AI on edge (artificial intelligence on edge). It focuses on providing optimal solutions to the key concerns in edge computing through effective AI technologies, and it discusses how to build AI mo

    Machine Learning Applications In Softwar
    ✍ Du Zhang, Jeffrey J. P. Tsai 📂 Library 📅 2005 🏛 World Scientific Pub Co Inc 🌐 English

    Machine learning deals with the issue of how to build computer programs that improve their performance at some tasks through experience. Machine learning algorithms have proven to be of great practical value in a variety of application domains. Not surprisingly, the field of software engineering tur

    Software Engineering: Basic Principles a
    ✍ Ravi Sethi 📂 Library 📅 2023 🏛 Cambridge University Press 🌐 English

    Software engineering is as much about teamwork as it is about technology. This introductory textbook covers both. For courses featuring a team project, it offers tips and templates for aligning classroom concepts with the needs of the students' projects. Students will learn how software is developed
