Fundamentals of Data Observability: Implement Trustworthy End-to-End Data Solutions
Scribed by Andy Petrella
- Publisher: O'Reilly Media
- Year: 2023
- Tongue: English
- Leaves: 267
- Edition: 1
- Category: Library
✦ Synopsis
Quickly detect, troubleshoot, and prevent a wide range of data issues through data observability, a set of best practices that enables data teams to gain greater visibility of data and its usage. If you're a data engineer, data architect, or machine learning engineer who depends on the quality of your data, this book shows you how to focus on the practical aspects of introducing data observability in your everyday work.
Author Andy Petrella helps you build the right habits to identify and solve data issues, such as data drifts and poor quality, so you can stop their propagation in data applications, pipelines, and analytics. You'll learn ways to introduce data observability, including setting up a framework for generating and collecting all the information you need.
- Learn the core principles and benefits of data observability
- Use data observability to detect, troubleshoot, and prevent data issues
- Follow the book's recipes to implement observability in your data projects
- Use data observability to create a trustworthy communication framework with data consumers
- Learn how to educate your peers about the benefits of data observability
✦ Table of Contents
Copyright
Table of Contents
Preface
Overview of the Book
Who Should Read This Book
Conventions Used in This Book
Using Code Examples
O'Reilly Online Learning
How to Contact Us
Acknowledgments
Part I. Introducing Data Observability
Chapter 1. Introducing Data Observability
Scaling Data Teams
Challenges of Scaling Data Teams
Segregated Roles and Responsibilities and Organizational Complexity
Anatomy of Data Issues and Consequences
Impact of Data Issues on Data Team Dynamics
Scaling AI Roadblocks
Challenges with Current Data Management Practices
Effects of Data Governance at Scale
Data Observability to the Rescue
The Areas of Observability
How Data Teams Can Leverage Data Observability Now
Low Latency Data Issues Detection
Efficient Data Issues Troubleshooting
Preventing Data Issues
Decentralized Data Quality Management
Complementing Existing Data Governance Capabilities
The Future and Beyond
Conclusion
Chapter 2. Components of Data Observability
Channels of Data Observability Information
Logs
Traces
Metrics
Observations Model
Physical Space
Server
User
Static Space
Dynamic Space
Expectations
Rules
Automatic Anomaly Detection
Prevent Garbage In, Garbage Out
Conclusion
Chapter 3. Roles of Data Observability in a Data Organization
Data Architecture
Where Does Data Observability Fit in a Data Architecture?
Data Architecture with Data Observability
How Data Observability Helps with Data Engineering Undercurrents
Security
Data Management
Support for Data Mesh's Data as Products
Conclusion
Part II. Implementing Data Observability
Chapter 4. Generate Data Observations
At the Source
Generating Data Observations at the Source
Low-Level API in Python
Description of the Data Pipeline
Definition of the Status of the Data Pipeline
Data Observations for the Data Pipeline
Generate Contextual Data Observations
Generate Data-Related Observations
Generate Lineage-Related Data Observations
Wrap-Up: The Data-Observable Data Pipeline
Using Data Observations to Address Failures of the Data Pipeline
Conclusion
Chapter 5. Automate the Generation of Data Observations
Abstraction Strategies
Event Listeners
Aspect-Oriented Programming
High-Level Applications
No-Code Applications
Low-Code Applications
Differences Among Monitoring Alternatives
Conclusion
Chapter 6. Implementing Expectations
Introducing Expectations
Shift-Left Data Quality
Corner Cases Discovery
Lifting Service Level Indicators
Using Data Profilers
Maintaining Expectations
Overarching Practices
Fail Fast and Fail Safe
Simplify Tests and Extend CI/CD
Conclusion
Part III. Data Observability in Action
Chapter 7. Integrating Data Observability in Your Data Stack
Ingestion Stage
Ingestion Stage Data Observability Recipes
Airbyte Agent
Transformation
Transformation Stage Data Observability Recipes
Apache Spark
dbt Agent
Serving
Recipes
BigQuery in Python
Orchestrated SQL with Airflow
Analytics
Machine Learning Recipes
Business Intelligence Recipes
Conclusion
Chapter 8. Making Opaque Systems Translucent
Data Translucence
Opaque Systems
SaaS
Don't Touch It; It (Kinda) Works
Inherited Systems
Strategies for Data Translucence
Strategies
The Data Observability Connector
Example: Building a dbt Data Observability Connector (SaaS)
Conclusion
Afterword: Future Observations
Unification of Processing
Generative Milestones
Trustable Expanded Creativity
Conclusion
Index
About the Author
Colophon
✦ Subjects
Data Ingestion; Data Architecture; Data Observability
SIMILAR VOLUMES
Are you looking to use data as a strategic asset in your organization, so that more people can make better, data-driven decisions and accelerate time to value? This report explains how. Whether you're working on self-service analytics, data governance, or cloud data migration, authors Fadi Maali, an
Quickly detect, troubleshoot, and prevent a wide range of data issues through data observability, a set of best practices that enable data teams to gain greater visibility of data and its usage. If you're a data engineer, data architect, or machine learning engineer, or if the quality of your work d
Do your product dashboards look funky? Are your quarterly reports stale? Is the data set you're using broken or just plain wrong? These problems affect almost every team, yet they're usually addressed on an ad hoc basis and in a reactive manner. If you answered yes to these questions, this book is f