𝔖 Scriptorium
✦   LIBER   ✦

Automating Data Quality Monitoring at Scale: Going Deeper than Data Observability (Third Early Release)

✍ Scribed by Jeremy Stanley and Paige Schwartz


Publisher: O'Reilly Media, Inc.
Year: 2023
Tongue: English
Category: Library


✦ Synopsis


The world's businesses ingest a combined 2.5 quintillion bytes of data every day. But how much of this vast amount of data--used to build products, power AI systems, and drive business decisions--is poor quality or just plain bad? This practical book shows you how to ensure that the data your organization relies on contains only high-quality records.

Most data engineers, data analysts, and data scientists genuinely care about data quality, but they often don't have the time, resources, or understanding to create a data quality monitoring solution that succeeds at scale. In this book, Jeremy Stanley and Paige Schwartz from Anomalo explain how you can use automated data quality monitoring to cover all your tables efficiently, proactively alert on every category of issue, and resolve problems immediately.

We'll wrap up by introducing the data quality monitoring strategy we advocate for in this book: a three-pillar approach combining rules, metrics monitoring, and unsupervised machine learning. As we'll show, this approach has multiple benefits. It allows subject matter experts to enforce essential constraints and track KPIs for important tables, while providing a base level of automated monitoring for a large volume of diverse data. It doesn't require massive computing power or legions of analysts to maintain rules and thresholds. With machine learning, it detects "unknown unknowns" in the data and reduces alert fatigue by learning correlations and trends in the data values across columns and even across tables, alerting only when changes are new and significant.

This book will help you:
Learn why data quality is a business imperative
Understand and assess unsupervised learning models for detecting data issues
Implement notifications that reduce alert fatigue and let you triage and resolve issues quickly
Integrate automated data quality monitoring with data catalogs, orchestration layers, and BI and ML systems
Understand the limits of automated data quality monitoring and how to overcome them
Learn how to deploy and manage your monitoring solution at scale
Maintain automated data quality monitoring for the long term


📜 SIMILAR VOLUMES


Automating Data Quality Monitoring: Goin
✍ Jeremy Stanley, Paige Schwartz 📂 Library 📅 2024 🏛 O'Reilly Media 🌐 English

The world's businesses ingest a combined 2.5 quintillion bytes of data every day. But how much of this vast amount of data--used to build products, power AI systems, and drive business decisions--is poor quality or just plain bad? This practical book shows you how to ensure that the data your organi

Automating Data Quality Monitoring at Sc
✍ Jeremy Stanley 📂 Library 📅 2024 🏛 O'Reilly Media 🌐 English

The world's businesses ingest a combined 2.5 quintillion bytes of data every day. But how much of this vast amount of data--used to build products, power AI systems, and drive business decisions--is poor quality or just plain bad? This practical book shows you how to ensure that the data your organ

Fundamentals of Data Observability (5th
✍ Andy Petrella 📂 Library 📅 2023 🏛 O'Reilly Media, Inc. 🌐 English

Quickly detect, troubleshoot, and prevent a wide range of data issues through data observability, a set of best practices that enable data teams to gain greater visibility of data and its usage. If you're a data engineer, data architect, or Machine Learning engineer, or if the quality of your work d

Data Management at Scale: Modern Data Ar
✍ Piethein Strengholt 📂 Library 📅 2023 🏛 O'Reilly Media, Inc. 🌐 English

Data management is an emerging and disruptive subject. Datafication is everywhere. This transformation is happening all around us: in smartphones, TV devices, ereaders, industrial machines, self-driving cars, robots, and so on. It's changing our lives at an accelerating speed. As data management

Data Management at Scale
✍ Piethein Strengholt 📂 Library 📅 2022 🏛 O'Reilly Media, Inc. 🌐 English

As data management continues to evolve rapidly, managing all of your data in a central place, such as a data warehouse, is no longer scalable. Today's world is about quickly turning data into value. This requires a paradigm shift in the way we federate responsibilities, manage data, and make it avai