Beginning user level
Beginning Azure Synapse Analytics: Transition from Data Warehouse to Data Lakehouse
Scribed by Bhadresh Shiyal
- Publisher: Apress
- Year: 2021
- Tongue: English
- Leaves: 263
- Edition: 1
- Category: Library
No coin nor oath required. For personal study only.
Synopsis
Beginning user level
Table of Contents
About the Author
About the Technical Reviewer
Acknowledgments
Introduction
Chapter 1: Core Data and Analytics Concepts
Core Data Concepts
What Is Data?
Structured Data
Semi-structured Data
Unstructured Data
Data Processing Methods
Batch Data Processing
Streaming or Real-Time Data Processing
Relational Data and Its Characteristics
Non-Relational Data and Its Characteristics
Core Data Analytics Concepts
What Is Data Analytics?
Data Ingestion
Data Exploration
Data Processing
ETL
ELT
ELT / ETL Tools
Data Visualization
Data Analytics Categories
Descriptive Analytics
Diagnostic Analytics
Predictive Analytics
Prescriptive Analytics
Cognitive Analytics
Summary
Chapter 2: Modern Data Warehouses and Data Lakehouses
What Is a Data Warehouse?
Core Data Warehouse Concepts
Data Model
Model Types
Schema Types
Metadata
Why Do We Need a Data Warehouse?
Efficient Decision-Making
Separation of Concerns
Single Version of the Truth
Data Restructuring
Self-Service BI
Historical Data
Security
Data Quality
Data Mining
More Revenues
What Is a Modern Data Warehouse?
Difference Between Traditional & Modern Data Warehouses
Cloud vs. On-Premises
Separation of Compute and Storage Resources
Cost
Scalability
ETL vs. ELT
Disaster Recovery
Overall Architecture
Data Lakehouse
What Is a Data Lake?
What Is Delta Lake?
What Is Apache Spark?
What Is a Data Lakehouse?
Characteristics of a Data Lakehouse
Various Data Types
AI
Decoupled Compute and Storage Resources
Open Source Storage Format
Data Analytics and BI Tools
ACID Properties
Differences Between a Data Warehouse and a Data Lakehouse
Architecture
Access to Raw Data
Open Source vs. Proprietary
Workloads
Query Engines
Data Processing
Real-Time Data
Examples of Data Lakehouses
Azure Synapse Analytics
Databricks
Benefits of Data Lakehouse
Support for All Types of Data
Time to Market
More Cost Effective
AI
Reduction in ETL/ELT Jobs
Usage of Open Source Tools and Technologies
Efficient and Easy Data Governance
Drawbacks of Data Lakehouse
Monolithic Architecture
Technical Infancy
Migration Cost
Lack of Many Products/Options
Scarcity of Skilled Technical Resources
Summary
Chapter 3: Introduction to Azure Synapse Analytics
What Is Azure Synapse Analytics?
Azure Synapse Analytics vs. Azure SQL Data Warehouse
Why Should You Learn Azure Synapse Analytics?
Main Features of Azure Synapse Analytics
Unified Data Analytics Experience
Powerful Data Insights
Unlimited Scale
Security, Privacy, and Compliance
HTAP
Key Service Capabilities of Azure Synapse Analytics
Data Lake Exploration
Multiple Language Support
Deeply Integrated Apache Spark
Serverless Synapse SQL Pool
Hybrid Data Integration
Power BI Integration
AI Integration
Enterprise Data Warehousing
Seamless Streaming Analytics
Workload Management
Advanced Security
Summary
Chapter 4: Architecture and Its Main Components
High-Level Architecture
Main Components of Architecture
Synapse SQL
Compute Layer
Dedicated Synapse SQL Pool
Serverless Synapse SQL Pool
Storage Layer
Synapse Spark or Apache Spark
Synapse Pipelines
Synapse Studio
Synapse Link
Summary
Chapter 5: Synapse SQL
Synapse SQL Architecture Components
Massively Parallel Processing Engine
Distributed Query Processing Engine
Control Node
Compute Nodes
Data Movement Service
Distribution
Hash Distribution
Round-Robin Distribution
Replication-based Distribution
Azure Storage
Dedicated or Provisioned Synapse SQL Pool
Serverless or On-Demand Synapse SQL Pool
Synapse SQL Feature Comparison
Database Object Types
Query Language
Security
Tools
Storage Options
Data Formats
Resource Consumption Model for Synapse SQL
Synapse SQL Best Practices
Best Practices for Serverless Synapse SQL Pool
Best Practices for Dedicated Synapse SQL Pool
How-To's
Create a Dedicated Synapse SQL Pool
Create a Serverless or On-Demand Synapse SQL Pool
Load Data Using COPY Statement in Dedicated Synapse SQL Pool
Ingest Data into Azure Data Lake Storage Gen2
Summary
Chapter 6: Synapse Spark
What Is Apache Spark?
What Is Synapse Spark in Azure Synapse Analytics?
Synapse Spark Features & Capabilities
Speed
Faster Start Time
Ease of Creation
Ease of Use
Security
Automatic Scalability
Separation of Concerns
Multiple Language Support
Integration with IDEs
Pre-loaded Libraries
REST APIs
Delta Lake and Its Importance in Synapse Spark
Synapse Spark Job Optimization
Data Format
Memory Management
Data Serialization
Data Caching
Data Abstraction
Join and Shuffle Optimization
Bucketing
Hyperspace Indexing
Synapse Spark Machine Learning
Data Preparation and Exploration
Build Machine Learning Models
Train Machine Learning Models
Model Deployment and Scoring
How-To's
How to Create a Synapse Spark Pool
How to Create and Submit Apache Spark Job Definition in Synapse Studio Using Python
How to Monitor Synapse Spark Pools Using Synapse Studio
Summary
Chapter 7: Synapse Pipelines
Overview of Azure Data Factory
Overview of Synapse Pipelines
Activities
Pipelines
Linked Services
Dataset
Integration Runtimes (IR)
Azure Integration Runtime (Azure IR)
Self-Hosted Integration Runtimes (SHIR)
Azure SSIS Integration Runtimes (Azure SSIS IR)
Control Flow
Parameters
Data Flow
Data Movement Activities
Category: Azure
Category: Database
Category: NoSQL
Category: File
Category: Generic
Category: Services and Applications
Data Transformation Activities
Control Flow Activities
Copy Pipeline Example
Transformation Pipeline Example
Pipeline Triggers
Summary
Chapter 8: Synapse Workspace and Studio
What Is a Synapse Analytics Workspace?
Synapse Analytics Workspace Components and Features
Azure Data Lake Storage Gen2 Account and File System
Serverless Synapse SQL Pool
Shared Metadata Management
Code Artifacts
What Is Synapse Studio?
Main Features of Synapse Studio
Home Hub
Data Hub
Develop Hub
Integrate Hub
Monitor Hub
Integration
Activities
Manage Hub
Analytics Pools
External Connections
Integration
Security
Synapse Studio Capabilities
Data Preparation
Data Management
Data Exploration
Data Warehousing
Data Visualization
Machine Learning
Power BI in Synapse Studio
How-To's
How to Create or Provision a New Azure Synapse Analytics Workspace Using Azure Portal
How to Launch Azure Synapse Studio
How to Link Power BI with Azure Synapse Studio
Summary
Chapter 9: Synapse Link
OLTP vs. OLAP
What Is HTAP?
Benefits of HTAP
No-ETL Analytics
Instant Insights
Reduced Data Duplication
Simplified Technical Architecture
What Is Azure Synapse Link?
Azure Cosmos DB
Azure Cosmos DB Analytical Store
Columnar Storage
Decoupling of Operational Store
Automatic Data Synchronization
SQL API and MongoDB API
Analytical TTL
Automatic Schema Updates
Cost-Effective Archiving
Scalability
When to Use Azure Synapse Link for Cosmos DB
Azure Synapse Link Limitations
Azure Synapse Link Use Cases
Industrial IOT
Predictive Maintenance Pipeline
Operational Reporting
Real-Time Applications
Real-Time Personalization for E-Commerce Users
How-To's
How to Enable Azure Synapse Link for Azure Cosmos DB
How to Create an Azure Cosmos DB Container with Analytical Store Using Azure Portal
How to Connect to Azure Synapse Link for Azure Cosmos DB Using Azure Portal
Summary
Chapter 10: Azure Synapse Analytics Use Cases and Reference Architecture
Where Should You Use Azure Synapse Analytics?
Large Volume of Data
Disparate Sources of Data
Data Transformation
Batch or Streaming Data
Where Should You Not Use Azure Synapse Analytics?
Use Cases for Azure Synapse Analytics
Financial Services
Manufacturing
Retail
Healthcare
Reference Architectures for Azure Synapse Analytics
Modern Data Warehouse Architecture
Real-Time Analytics on Big Data Architecture
Summary
Index
SIMILAR VOLUMES
Design and implement a modern data lakehouse on the Azure Data Platform using Delta Lake, Apache Spark, Azure Databricks, Azure Synapse Analytics, and Snowflake. This book teaches you the intricate details of the Data Lakehouse Paradigm and how to efficiently design a cloud-based data lakehous
Data fabric, data lakehouse, and data mesh have recently appeared as viable alternatives to the modern data warehouse. These new architectures have solid benefits, but they're also surrounded by a lot of hyperbole and confusion. This practical book provides a guided tour of each architecture to help