Practical Weak Supervision: Doing More with Less Data
Scribed by Amit Bahree, Wee Hyong Tok, Senja Filipi
- Publisher
- O'Reilly Media, Inc.
- Year
- 2021
- Tongue
- English
- Leaves
- 193
- Category
- Library
Synopsis
Most data scientists and engineers today rely on quality labeled data to train machine learning models. But building a training set manually is time-consuming and expensive, leaving many companies with unfinished ML projects. There's a more practical approach. In this book, Wee Hyong Tok, Amit Bahree, and Senja Filipi show you how to create products using weakly supervised learning models.
You'll learn how to build natural language processing and computer vision projects using weakly labeled datasets from Snorkel, a spin-off from the Stanford AI Lab. Because so many companies have pursued ML projects that never go beyond their labs, this book also provides a guide on how to ship the deep learning models you build.
- Get up to speed on the field of weak supervision, including ways to use it as part of the data science process
- Use Snorkel AI for weak supervision and data programming
- Get code examples for using Snorkel to label text and image datasets
- Use a weakly labeled dataset for text and image classification
- Learn practical considerations for using Snorkel with large datasets and using Spark clusters to scale labeling
Table of Contents
Copyright
Table of Contents
Foreword by Xuedong Huang
Foreword by Alex Ratner
Preface
Who Should Read This Book
Navigating This Book
Conventions Used in This Book
Using Code Examples
O'Reilly Online Learning
How to Contact Us
Acknowledgments
Chapter 1. Introduction to Weak Supervision
What Is Weak Supervision?
Real-World Weak Supervision with Snorkel
Approaches to Weak Supervision
Incomplete Supervision
Inexact Supervision
Inaccurate Supervision
Data Programming
Getting Training Data
How Data Programming Is Helping Accelerate Software 2.0
Summary
Chapter 2. Diving into Data Programming with Snorkel
Snorkel, a Data Programming Framework
Getting Started with Labeling Functions
Applying the Labels to the Datasets
Analyzing the Labeling Performance
Using a Validation Set
Reaching Labeling Consensus with LabelModel
Intuition Behind LabelModel
LabelModel Parameter Estimation
Strategies to Improve the Labeling Functions
Data Augmentation with Snorkel Transformers
Data Augmentation Through Word Removal
Snorkel Preprocessors
Data Augmentation Through GPT-2 Prediction
Data Augmentation Through Translation
Applying the Transformation Functions to the Dataset
Summary
Chapter 3. Labeling in Action
Labeling a Text Dataset: Identifying Fake News
Exploring the Fake News Detection (FakeNewsNet) Dataset
Importing Snorkel and Setting Up Representative Constants
Fact-Checking Sites
Is the Speaker a “Liar”?
Twitter Profile and Botometer Score
Generating Agreements Between Weak Classifiers
Labeling an Images Dataset: Determining Indoor Versus Outdoor Images
Creating a Dataset of Images from Bing
Defining and Training Weak Classifiers in TensorFlow
Training the Various Classifiers
Weak Classifiers out of Image Tags
Deploying the Computer Vision Service
Interacting with the Computer Vision Service
Preparing the DataFrame
Learning a LabelModel
Summary
Chapter 4. Using the Snorkel-Labeled Dataset for Text Classification
Getting Started with Natural Language Processing (NLP)
Transformers
Hard Versus Probabilistic Labels
Using ktrain for Performing Text Classification
Data Preparation
Dealing with an Imbalanced Dataset
Training the Model
Using the Text Classification Model for Prediction
Finding a Good Learning Rate
Using Hugging Face and Transformers
Loading the Relevant Python Packages
Dataset Preparation
Checking Whether GPU Hardware Is Available
Performing Tokenization
Model Training
Testing the Fine-Tuned Model
Summary
Chapter 5. Using the Snorkel-Labeled Dataset for Image Classification
Visual Object Recognition Overview
Representing Image Features
Transfer Learning for Computer Vision
Using PyTorch for Image Classification
Loading the Indoor/Outdoor Dataset
Utility Functions
Visualizing the Training Data
Fine-Tuning the Pretrained Model
Summary
Chapter 6. Scalability and Distributed Training
The Need for Scalability
Distributed Training
Apache Spark: An Introduction
Spark Application Design
Using Azure Databricks to Scale
Cluster Setup for Weak Supervision
Fake News Detection Dataset on Databricks
Labeling Functions for Snorkel
Setting Up Dependencies
Loading the Data
Fact-Checking Sites
Transfer Learning Using the LIAR Dataset
Weak Classifiers: Generating Agreement
Type Conversions Needed for Spark Runtime
Summary
Index
About the Authors
Colophon
SIMILAR VOLUMES
Supervision for Occupational Therapy is a practical text that guides both supervisors and supervisees to make the most out of supervision opportunities. While supervision in occupational therapy is vital as a mechanism for public and professional safety, lea
How can you grow and maintain a reliable, flexible, and cost-efficient network in the face of ever-increasing demands? With this practical guide, network engineers will learn how to program Juniper network devices to perform day-to-day tasks, using the automation features of the Junos OS. Junos supp
The Portal to Lean Production: Principles and Practices for Doing More with Less describes the steps, difficulties, and rewards of implementing lean production. The book moves beyond concepts to address practical matters. The authors provide enough information for you to begin implementing lean prod
This volume contains a collection of papers by economists which examine the various strategies for cutting costs and improving productivity in higher education in the United States. The dramatic increase in the cost of attending most colleges and universities in recent years has led to increasing