Architecting a Modern Data Warehouse for Large Enterprises: Build Multi-cloud Modern Distributed Data Warehouses with Azure and AWS

✍ Scribed by Anjani Kumar, Abhishek Mishra, Sanjeev Kumar


Publisher: Apress
Year: 2024
Tongue: English
Leaves: 378
Category: Library

✦ Table of Contents


About the Authors
About the Technical Reviewer
Acknowledgments
Chapter 1: Introduction
Objective
Origin of Data Processing and Storage in the Computer Era
Evolution of Databases and Codd Rules
Transitioning to the World of Data Warehouses
Data Warehouse Concepts
Data Sources (Data Format and Common Sources)
ETL (Extract, Transform, Load)
ETL and ELT
Data Mart
Data Mart Architecture
Advantages of Data Marts
Examples of Data Marts
Data Modeling
Tabular Modeling
Dimensional Modeling
Understanding Dimensional Modeling in Brief
Dimensions
Facts
Measures
Schematics Facts and Dimension Structuring
Cubes and Reporting
OLAP
Online Analytical Processing, Cubes, Reporting, and Data Mining
OLAP and Cubes
Categorization of OLAP
Querying Technique
Reporting Techniques
Data Mining
Metadata
Data Storage Techniques and Options
Evolution of Big Data Technologies and Data Lakes
Transition to the Modern Data Warehouse
Traditional Big Data Technologies
The Emergence of Data Lakes
The Benefits of Data Lakes
Data Lakes as Data Warehouses
Data Lake House and Data Mesh
Transformation and Optimization between New vs. Old (Evolution to Data Lake House)
A Wider Evolving Concept Called Data Mesh
Building an Effective Data Engineering Team
An Enterprise Scenario for Data Warehousing
Summary
Chapter 2: Modern Data Warehouses
Objectives
Introduction to Characteristics of Modern Data Warehouse
Data Velocity
Data Variety
Volume
Data Value
Fault Tolerance
Scalability
Interoperability
Reliability
Modern Data Warehouse Features: Distributed Processing, Storage, Streaming, and Processing Data in the Cloud
Distributed Processing
Flexibility and Speed in Implementation
Flexibility and Speed in Processing
Flexibility and Better Control on Costs
Storage
Storage as a Service
Storage Solutions
In-memory Storage
Streaming and Processing
Autonomous Administration Capabilities
Self-driving
Self-tuning and Configuration
Multi-tenancy and Security
Performance
Storage Efficiency
Scalable Storage
Reliability, Availability, and Serviceability (RAS):
Multiple Parallel Processing (MPP)
Flexibility and Speed in Implementation
Real-time Processing
Big Data
CAP Theorem
What Are NoSQL Databases?
Key–Value Pair Stores
Document Databases
Columnar DBs
Graph Databases
Case Study: Enterprise Scenario for Modern Cloud-based Data Warehouse
Advantages of Modern Data Warehouse over Traditional Data Warehouse
Summary
Chapter 3: Data Lake, Lake House, and Delta Lake
Structure
Objectives
Data Lake, Lake House, and Delta Lake Concepts
Data Lake, Storage, and Data Processing Engines Synergies and Dependencies
Implement Lake House in Azure
Create a Data Lake on Azure and Ingest the Health Data CSV File
Create an Azure Synapse Pipeline to Convert the CSV File to a Parquet File
Attach the Parquet File to the Lake Database
Implement Lake House in AWS
Create an S3 Bucket to Keep the Raw Data
Create an AWS Glue Job to Convert the Raw Data into a Delta Table
Query the Delta Table using the AWS Glue Job
Summary
Chapter 4: Data Mesh
Structure
Objectives
The Modern Data Problem and Data Mesh
Data Mesh Principles
Domain-driven Ownership
Data-as-a-Product
Self-Serve Data Platform
Federated Computational Governance
Design a Data Mesh on Azure
Create Data Products for the Domains
Create Data Product for Human Resources Domain
Create Data Product for Inventory Domain
Create Data Product for Procurement Domain
Create Data Product for Sales Domain
Create Data Product for Finance Domain
Create Self-Serve Data Platform
Data Mesh Experience Plane
Data Product Experience Plane
Infrastructure Plane
Federated Governance
Summary
Chapter 5: Data Orchestration Techniques
Structure
Objective
Data Orchestration Concepts
Modern Data Orchestration in Detail
Evolution of Data Orchestration
Data Orchestration Layers
Data Movement Optimization: OneLake Data and Its Impact on Modern Data Orchestration
A Strong Emphasis on Minimizing Data Duplicity
Data Integration
Middleware and ETL Tools
Enterprise Application Integration (EAI)
Service-Oriented Architecture (SOA)
Data Warehousing
Real-Time and Streaming Data Integration
Cloud-Based Data Integration
Data Integration for Big Data and NoSQL
Self-Service Data Integration
Use Cases
Data Pipelines
Data Processing using Data Pipelines
Batch Processing in Detail
Requirements:
Steps:
Real-time Processing in Detail
Benefits and Advantages of Data Pipelines
Common Use Cases for Data Pipelines
Data Governance Empowered by Data Orchestration: Enhancing Control and Compliance
Achieving Data Governance through Data Orchestration
Tools and Examples
Azure Data Factory
Azure Synapse
SQL and Spark Pools
Data Integration Features
Analytics and Power BI
Governance
Synapse Studio
Synapse Serverless
Azure Synapse and Its ETL Features
Azure Synapse Workspace:
Data Integration:
Data Flow:
Mapping Data Flows:
Wrangling Data Flows:
Data Movement:
Data Lake Integration:
Performance and Scalability:
Monitoring and Management:
AWS Glue
Snowflake and Its ETL Features
About Snowflake
Snowflake Architecture
Virtual Warehouse
Database and Schemas
Storage and Query Processing
Data Protection
Integration
Snowflake Support for ETL
Considerations for Building ETL Workflows on Snowflake
Continuous Data Loading in Snowflake
Snowpipe
Snowflake Connector for Kafka
Example and Use Case
Summary
Chapter 6: Data Democratization, Governance, and Security
Objectives
Introduction to Data Democratization
Factors Driving Data Democratization
Layers of Democratization Architecture
Platform Architecture — Technology Component
Team Architecture — People Component
Shared Architecture — Processes Component
Self-Service
Data Catalog and Data Sharing
Types of Metadata
Classes of Metadata
People
Tools and Technology: Self-Service Tools
Data Governance Tools
Data Discovery and Management
D&A (Data and Analytics) Platform Governance
Analytics Platform Governance
Capabilities Covered by Tools
Introduction to Data Governance
Ten Key Factors that Ensure Successful Data Governance
Data Stewardship
Models of Data Stewardship
Model 1: Data Steward by Subject Area
Model 2: Data Stewardship by Function
Model 3: Data Steward by Business Process
Model 4: Data Steward by System
Model 5: Data Steward by Project
Data Security Management
Security Layers
Human Layer
Physical Perimeter Point/Layer
Network Layer
Endpoint Layer/Protection
Application Layer
Data Layer
Mission-Critical Assets
Data Security Approach
Types of Controls
Major Categories for Security Controls
Data Security in Outsourcing Mode
Guiding Principles
Popular Information Security Frameworks
Major Privacy and Security Regulations
Major Modern Security Management Concepts
Centralized Enterprise Key Management
Data Protection Cloud Gateways
Secure Instant Communications
Data Classification
TLS Encryption and Decryption
Data Security as a Service
Ideal Scenarios
Practical Use Case for Data Governance and Data Democratization
Problem Statement
Motivation
Business Drivers
Technology: Analytics Platform
Efforts/Processes Improvement
High-Level Proposed Solution
Tools
Cost
Summary
Chapter 7: Business Intelligence
Structure
Objectives
Introduction to Business Intelligence
Descriptive Reports
Predictive Reports
Prescriptive Reports
Business Intelligence Tools
Query and Reporting Tools
Online Analytical Processing (OLAP) Tools
Analytical Applications
Performance Management Tools
Predictive Analytics and Data Mining Tools
Advanced Visualization and Discovery Tools
Trends in Business Intelligence (BI)
Business Decision Intelligence Analysis
Self-Service
Advanced BI Analytics
Data Literacy
Analytics Governance
Data Analytics Life Cycle
BI and Data Science Together
Data Strategy
Data and Analytics Approach and Strategy
Core Strategy of Business
Mappings
Step to Create Data Strategy
Summary
Index

