𝔖 Scriptorium
✦   LIBER   ✦


Understanding ETL: Data Pipelines for Modern Data Architectures by Matt Palmer

✍ Scribed by Matt Palmer


Publisher: O'Reilly Media, Inc.
Year: 2024
Tongue: English
Leaves: 97
Category: Library


✦ Synopsis


Extract, transform, load (ETL) is at the center of every application of data, from business intelligence to AI. This technical guide offers data engineers, engineering managers, and architects an overview of the modern ETL process, along with the challenges you're likely to face and the strategic patterns that will help you overcome them. You will be equipped to make informed decisions when implementing ETL and choose the technology stack that will help you succeed.

✦ Table of Contents


Introduction
The Brave New World of AI
A Changing Data Landscape
What About ELT (and Other Flavors)?
O’Reilly Online Learning
How to Contact Us
Acknowledgments
1. Data Ingestion
Data Ingestion: Now Versus Then
Sources and Targets
The Source
Examining sources
Questions to ask
Source checklist
The Destination
Examining destinations
Staging ingested data
Change data capture
Destination checklist
Ingestion Considerations
Frequency
Batch
Micro-batch
Streaming
Methods
Message services
Stream processing engines
Simplifying stream processing
Payload
Volume
Structure and shape
Unstructured
Semi-structured
Structured
Format
Variety
Choosing a Solution
Declarative Solutions
Cost to build/maintain
Extensibility
Cost to switch
Imperative Solutions
Extensibility
Cost to build/maintain
Cost to switch
Hybrid Solutions
2. Data Transformation
What Is Data Transformation?
Where Are We Now?
How Do We Transform Data?
Environments
Data staging
Languages
Frameworks
Other approaches
Building a Transformation Solution
Data Transformation Patterns
Data Update Patterns
Best Practices
Real-Time Data Transformation
The Future of Data Transformation
3. Data Orchestration
What Is Data Orchestration?
Why Orchestrate?
The DAG
Data Orchestration Tools
Choosing an Orchestrator
Characteristics
Orchestrator options
Orchestrating SQL
Design Patterns and Best Practices
The Future of Data Orchestration
4. Pipeline Issues and Troubleshooting
Maintainability
Monitoring and Benchmarking
Metrics
Methods
Errors
Error Handling
Recovery
Improving Workflows
Start with Relationships
Align Incentives
Improve Outcomes
5. Efficiency and Scalability
Efficiency and Scalability Defined
Understand Your Environment
Frameworks
Resource Allocation
Parallelization and concurrency
Clusters
Spot versus on-demand cluster instances
Pooling
Cluster sharing
Autoscaling
Serverless
Data Processing Techniques
Incremental processing
Column-oriented data stores
Data partitioning
Materialization
Process Efficiency
Data (Engineering) Democratization
Developer Experience
Collaboration
Conclusion
Conclusion


📜 SIMILAR VOLUMES


Understanding ETL: Data Pipelines for Mo
✍ Matt Palmer 📂 Library 📅 2024 🏛 O'Reilly Media, Inc. 🌐 English

Extract, transform, load (ETL) is at the center of every application of data, from business intelligence to AI. This technical guide offers data engineers, engineering managers, and architects an overview of the modern ETL process, along with the challenges you're likely to face and the strategic pa

Understanding ETL: Data Pipelines for Mo
✍ Matt Palmer 📂 Library 📅 2023 🏛 O’Reilly Media, Inc. 🌐 English

"Extract, transform, load" (ETL) is at the center of every application of data, from business intelligence to AI. Recent shifts in the data landscape, including the emergence of lakehouse architectures and the rising importance of high-scale real-time data, mean that today's data practitioners must ap

Modern Data Architectures with Python: A
✍ Brian Lipp 📂 Library 📅 2023 🏛 Packt Publishing Pvt Ltd 🌐 English

Build scalable and reliable data ecosystems using Data Mesh, Databricks Spark, and Kafka. Key Features: Develop modern data skills used in emerging technologies; learn pragmatic design methodologies such as Data Mesh and data lakehouses; gain a deeper understanding of data governance. Purchase of

Modern Data Architectures with Python: A
✍ Brian Lipp 📂 Library 📅 2023 🏛 Packt Publishing 🌐 English

Learn to build scalable and reliable data ecosystems using Data Mesh, Databricks Spark, and Kafka. Key Features: Develop modern data skills in emerging technologies; learn pragmatic design methodologies like Data Mesh and Lake House; grow a deeper understanding of data governance. Book Descript

Understanding compression: data compress
✍ Haecky, Aleks; McAnlis, Colt 📂 Library 📅 2016 🏛 O'Reilly Media 🌐 English

If you want to attract and retain users in the booming mobile services market, you need a quick-loading app that won't churn through their data plans. The key is to compress multimedia and other data into smaller files, but finding the right method is tricky. This witty book helps you understand how