Understanding ETL: Data Pipelines for Modern Data Architectures by Matt Palmer
Scribed by Matt Palmer
- Publisher: O'Reilly Media, Inc.
- Year: 2024
- Tongue: English
- Leaves: 97
- Category: Library
Synopsis
Extract, transform, load (ETL) is at the center of every application of data, from business intelligence to AI. This technical guide offers data engineers, engineering managers, and architects an overview of the modern ETL process, along with the challenges you're likely to face and the strategic patterns that will help you overcome them. You will be equipped to make informed decisions when implementing ETL and choose the technology stack that will help you succeed.
Table of Contents
Introduction
The Brave New World of AI
A Changing Data Landscape
What About ELT (and Other Flavors)?
O'Reilly Online Learning
How to Contact Us
Acknowledgments
1. Data Ingestion
Data Ingestion: Now Versus Then
Sources and Targets
The Source
Examining sources
Questions to ask
Source checklist
The Destination
Examining destinations
Staging ingested data
Change data capture
Destination checklist
Ingestion Considerations
Frequency
Batch
Micro-batch
Streaming
Methods
Message services
Stream processing engines
Simplifying stream processing
Payload
Volume
Structure and shape
Unstructured
Semi-structured
Structured
Format
Variety
Choosing a Solution
Declarative Solutions
Cost to build/maintain
Extensibility
Cost to switch
Imperative Solutions
Extensibility
Cost to build/maintain
Cost to switch
Hybrid Solutions
2. Data Transformation
What Is Data Transformation?
Where Are We Now?
How Do We Transform Data?
Environments
Data staging
Languages
Frameworks
Other approaches
Building a Transformation Solution
Data Transformation Patterns
Data Update Patterns
Best Practices
Real-Time Data Transformation
The Future of Data Transformation
3. Data Orchestration
What Is Data Orchestration?
Why Orchestrate?
The DAG
Data Orchestration Tools
Choosing an Orchestrator
Characteristics
Orchestrator options
Orchestrating SQL
Design Patterns and Best Practices
The Future of Data Orchestration
4. Pipeline Issues and Troubleshooting
Maintainability
Monitoring and Benchmarking
Metrics
Methods
Errors
Error Handling
Recovery
Improving Workflows
Start with Relationships
Align Incentives
Improve Outcomes
5. Efficiency and Scalability
Efficiency and Scalability Defined
Understand Your Environment
Frameworks
Resource Allocation
Parallelization and concurrency
Clusters
Spot versus on-demand cluster instances
Pooling
Cluster sharing
Autoscaling
Serverless
Data Processing Techniques
Incremental processing
Column-oriented data stores
Data partitioning
Materialization
Process Efficiency
Data (Engineering) Democratization
Developer Experience
Collaboration
Conclusion
Conclusion
SIMILAR VOLUMES
"Extract, transform, load" (ETL) is at the center of every application of data, from business intelligence to AI. Recent shifts in the data landscape, including the emergence of lakehouse architectures and the rising importance of high-scale real-time data, mean that today's data practitioners must ap
Build scalable and reliable data ecosystems using Data Mesh, Databricks Spark, and Kafka Key Features Develop modern data skills used in emerging technologies Learn pragmatic design methodologies such as Data Mesh and data lakehouses Gain a deeper understanding of data governance Purchase of
Learn to build scalable and reliable data ecosystems using Data Mesh, Databricks Spark, and Kafka. Key Features Develop modern data skills in emerging technologies Learn pragmatic design methodologies like Data Mesh and Lake House Grow a deeper understanding of data governance Book Descript
If you want to attract and retain users in the booming mobile services market, you need a quick-loading app that won't churn through their data plans. The key is to compress multimedia and other data into smaller files, but finding the right method is tricky. This witty book helps you understand how