Deciphering Data Architectures: Choosing Between a Modern Data Warehouse, Data Fabric, Data Lakehouse, and Data Mesh

โœ Scribed by James Serra


Publisher: O'Reilly Media
Year: 2023
Tongue: English
Leaves: 225
Category: Library

✦ Synopsis


Data fabric, data lakehouse, and data mesh have recently appeared as viable alternatives to the modern data warehouse. These new architectures have solid benefits, but they're also surrounded by a lot of hyperbole and confusion. This practical book provides a guided tour of each architecture to help data professionals understand its pros and cons.

In the process, James Serra, big data and data warehousing solution architect at Microsoft, examines common data architecture concepts, including how data warehouses have had to evolve to work with data lake features. You'll learn what data lakehouses can help you achieve, and how to distinguish data mesh hype from reality. Best of all, you'll be able to determine the most appropriate data architecture for your needs. By reading this book, you'll:

  • Gain a working understanding of several data architectures
  • Know the pros and cons of each approach
  • Distinguish data architecture theory from the...

✦ Table of Contents


    Foreword
    Preface
    Conventions Used in This Book
    O’Reilly Online Learning
    How to Contact Us
    Acknowledgments
    I. Foundation
    1. Big Data
    What Is Big Data, and How Can It Help You?
    Data Maturity
    Stage 1: Reactive
    Stage 2: Informative
    Stage 3: Predictive
    Stage 4: Transformative
    Self-Service Business Intelligence
    Summary
    2. Types of Data Architectures
    Evolution of Data Architectures
    Relational Data Warehouse
    Data Lake
    Modern Data Warehouse
    Data Fabric
    Data Lakehouse
    Data Mesh
    Summary
    3. The Architecture Design Session
    What Is an ADS?
    Why Hold an ADS?
    Before the ADS
    Preparing
    Inviting Participants
    Conducting the ADS
    Introductions
    Discovery
    Whiteboarding
    After the ADS
    Tips for Conducting an ADS
    Summary
    II. Common Data Architecture Concepts
    4. The Relational Data Warehouse
    What Is a Relational Data Warehouse?
    What a Data Warehouse Is Not
    The Top-Down Approach
    Why Use a Relational Data Warehouse?
    Drawbacks to Using a Relational Data Warehouse
    Populating a Data Warehouse
    How Often to Extract the Data
    Extraction Methods
    How to Determine What Data Has Changed Since the Last Extraction
    The Death of the Relational Data Warehouse Has Been Greatly Exaggerated
    Summary
    5. Data Lake
    What Is a Data Lake?
    Why Use a Data Lake?
    Bottom-Up Approach
    Best Practices for Data Lake Design
    Multiple Data Lakes
    Advantages
    Organizational structure and ownership
    Compliance, governance, and security
    Cloud subscription, service limits, and policies
    Performance, availability, and disaster recovery
    Data retention and environment management
    Disadvantages
    Summary
    6. Data Storage Solutions and Processes
    Data Storage Solutions
    Data Marts
    Operational Data Stores
    Use case
    Data Hubs
    Data Processes
    Master Data Management
    Use case
    Data Virtualization and Data Federation
    Virtualization as a replacement for the data warehouse
    Virtualization as a replacement for ETL or data movement
    Data Catalogs
    Data Marketplaces
    Summary
    7. Approaches to Design
    Online Transaction Processing Versus Online Analytical Processing
    Operational and Analytical Data
    Symmetric Multiprocessing and Massively Parallel Processing
    Lambda Architecture
    Kappa Architecture
    Polyglot Persistence and Polyglot Data Stores
    Summary
    8. Approaches to Data Modeling
    Relational Modeling
    Keys
    Entity–Relationship Diagrams
    Normalization Rules and Forms
    Tracking Changes
    Dimensional Modeling
    Facts, Dimensions, and Keys
    Tracking Changes
    Denormalization
    Common Data Model
    Data Vault
    The Kimball and Inmon Data Warehousing Methodologies
    Inmon’s Top-Down Methodology
    Kimball’s Bottom-Up Methodology
    Choosing a Methodology
    Hybrid Models
    Methodology Myths
    Summary
    9. Approaches to Data Ingestion
    ETL Versus ELT
    Reverse ETL
    Batch Processing Versus Real-Time Processing
    Batch Processing Pros and Cons
    Real-Time Processing Pros and Cons
    Data Governance
    Summary
    III. Data Architectures
    10. The Modern Data Warehouse
    The MDW Architecture
    Pros and Cons of the MDW Architecture
    Combining the RDW and Data Lake
    Data Lake
    Relational Data Warehouse
    Stepping Stones to the MDW
    EDW Augmentation
    How it works
    Benefits
    Challenges
    Migration
    Temporary Data Lake Plus EDW
    How it works
    Benefits
    Challenges
    Migration
    All-in-One
    How it works
    Benefits
    Challenges
    Migration
    Case Study: Wilson & Gunkerk’s Strategic Shift to an MDW
    Challenge
    Solution
    Outcome
    Summary
    11. Data Fabric
    The Data Fabric Architecture
    Data Access Policies
    Metadata Catalog
    Master Data Management
    Data Virtualization
    Real-Time Processing
    APIs
    Services
    Products
    Why Transition from an MDW to a Data Fabric Architecture?
    Potential Drawbacks
    Summary
    12. Data Lakehouse
    Delta Lake Features
    Performance Improvements
    The Data Lakehouse Architecture
    What If You Skip the Relational Data Warehouse?
    Relational Serving Layer
    Summary
    13. Data Mesh Foundation
    A Decentralized Data Architecture
    Data Mesh Hype
    Dehghani’s Four Principles of Data Mesh
    Principle #1: Domain Ownership
    Principle #2: Data as a Product
    Principle #3: Self-Serve Data Infrastructure as a Platform
    Principle #4: Federated Computational Governance
    The “Pure” Data Mesh
    Data Domains
    Data Mesh Logical Architecture
    Different Topologies
    Data Mesh Versus Data Fabric
    Use Cases
    Summary
    14. Should You Adopt Data Mesh? Myths, Concerns, and the Future
    Myths
    Myth: Using Data Mesh Is a Silver Bullet That Solves All Data Challenges Quickly
    Myth: A Data Mesh Will Replace Your Data Lake and Data Warehouse
    Myth: Data Warehouse Projects Are All Failing, and a Data Mesh Will Solve That Problem
    Myth: Building a Data Mesh Means Decentralizing Absolutely Everything
    Myth: You Can Use Data Virtualization to Create a Data Mesh
    Concerns
    Philosophical and Conceptual Matters
    Combining Data in a Decentralized Environment
    Other Issues of Decentralization
    Complexity
    Duplication
    Feasibility
    People
    Domain-Level Barriers
    Organizational Assessment: Should You Adopt a Data Mesh?
    Recommendations for Implementing a Successful Data Mesh
    The Future of Data Mesh
    Zooming Out: Understanding Data Architectures and Their Applications
    Summary
    IV. People, Processes, and Technology
    15. People and Processes
    Team Organization: Roles and Responsibilities
    Roles for MDW, Data Fabric, or Data Lakehouse
    Roles for Data Mesh
    Domain teams
    Self-service data infrastructure platform team
    Federated computational governance platform team
    Why Projects Fail: Pitfalls and Prevention
    Pitfall: Allowing Executives to Think That BI Is “Easy”
    Pitfall: Using the Wrong Technologies
    Pitfall: Gathering Too Many Business Requirements
    Pitfall: Gathering Too Few Business Requirements
    Pitfall: Presenting Reports Without Validating Their Contents First
    Pitfall: Hiring an Inexperienced Consulting Company
    Pitfall: Hiring a Consulting Company That Outsources Development to Offshore Workers
    Pitfall: Passing Project Ownership Off to Consultants
    Pitfall: Neglecting the Need to Transfer Knowledge Back into the Organization
    Pitfall: Slashing the Budget Midway Through the Project
    Pitfall: Starting with an End Date and Working Backward
    Pitfall: Structuring the Data Warehouse to Reflect the Source Data Rather Than the Business’s Needs
    Pitfall: Presenting End Users with a Solution with Slow Response Times or Other Performance Issues
    Pitfall: Overdesigning (or Underdesigning) Your Data Architecture
    Pitfall: Poor Communication Between IT and the Business Domains
    Tips for Success
    Don’t Skimp on Your Investment
    Involve Users, Show Them Results, and Get Them Excited
    Add Value to New Reports and Dashboards
    Ask End Users to Build a Prototype
    Find a Project Champion/Sponsor
    Make a Project Plan That Aims for 80% Efficiency
    Summary
    16. Technologies
    Choosing a Platform
    Open Source Solutions
    On-Premises Solutions
    Cloud Provider Solutions
    Cloud Service Models
    Major Cloud Providers
    Multi-Cloud Solutions
    Software Frameworks
    Hadoop
    Databricks
    Snowflake
    Summary
    Index


    📜 SIMILAR VOLUMES


    Deciphering Data Architectures: Choosing
    โœ James Serra ๐Ÿ“‚ Library ๐Ÿ“… 2024 ๐Ÿ› O'Reilly Media ๐ŸŒ English

    Data fabric, data lakehouse, and data mesh have recently appeared as viable alternatives to the modern data warehouse. These new architectures have solid benefits, but they're also surrounded by a lot of hyperbole and confusion. This practical book provides a guided tour of each architecture to help

    Data Management at Scale: Modern Data Ar
    โœ Piethein Strengholt ๐Ÿ“‚ Library ๐Ÿ“… 2023 ๐Ÿ› O'Reilly Media ๐ŸŒ English

    As data management continues to evolve rapidly, managing all of your data in a central place, such as a data warehouse, is no longer scalable. Today's world is about quickly turning data into value. This requires a paradigm shift in the way we federate responsibilities, manage data, and make it avai

    Dada Data
    โœ Sarah Hegenbart;Mara-Johanna Klmel; ๐Ÿ“‚ Library ๐Ÿ“… 2022 ๐Ÿ› Bloomsbury UK ๐ŸŒ English