Understanding ETL: Data Pipelines for Modern Data Architectures (Early Release)
Scribed by Matt Palmer
- Publisher
- O'Reilly Media, Inc.
- Year
- 2023
- Tongue
- English
- Leaves
- 39
- Category
- Library
Synopsis
"Extract, transform, load" (ETL) is at the center of every application of data, from business intelligence to AI. Recent shifts in the data landscape, including the emergence of lakehouse architectures and the rising importance of high-scale real-time data, mean that today's data practitioners must approach ETL a bit differently.
This technical guide offers data engineers, engineering managers, and architects an overview of the modern ETL process, along with the challenges you're likely to face and the strategic patterns that will help you overcome them. You will be equipped to make informed decisions when implementing ETL and choose the technology stack that will help you succeed.
Table of Contents
Preface
The Bread and Butter of Data Engineering
The Brave New World of AI
A Changing Data Landscape
What About ELT (and Other Flavors)?
1. Data Ingestion
Data Ingestion: Now vs. Then
Sources and Targets
The Source
Examining Sources
Source Checklist
The Destination
Examining Destinations
Staging Ingested Data
Change Data Capture (CDC)
Destination Checklist
Ingestion Considerations
Frequency
Batch
Micro Batch
Streaming
Methods
Message Services
Stream Processing Engines
Simplifying Stream Processing
Payload
Volume
Structure and Shape
Unstructured
Semi-structured
Structured
Format
Variety
Choosing a Solution
Declarative Solutions
Cost to build/maintain
Extensibility
Cost to switch
Imperative Solutions
Extensibility
Cost to build/maintain
Cost to switch
Hybrid Solutions
Data Ingestion Checklist
SIMILAR VOLUMES
If you haven't modernized your data cleaning and reporting processes in Microsoft Excel, you're missing out on big productivity gains. And if you're looking to conduct rigorous data analysis, more can be done in Excel than you think. This practical book serves as an introduction to the modern Excel
Data management is an emerging and disruptive subject. Datafication is everywhere. This transformation is happening all around us: in smartphones, TV devices, ereaders, industrial machines, self-driving cars, robots, and so on. It's changing our lives at an accelerating speed. As data management
Data science happens in code. The ability to write reproducible, robust, scalable code is key to a data science project's success, and is absolutely essential for those working with production code. This practical book bridges the gap between data science and software engineering, clearly explaining
Build scalable and reliable data ecosystems using Data Mesh, Databricks Spark, and Kafka. Key Features: Develop modern data skills used in emerging technologies; Learn pragmatic design methodologies such as Data Mesh and data lakehouses; Gain a deeper understanding of data governance; Purchase of