Azure Data Engineering Cookbook: Get well versed in various data engineering techniques in Azure using this recipe-based guide, 2nd Edition

✍ Scribed by Nagaraj Venkatesan, Ahmad Osama


Publisher: Packt Publishing
Year: 2022
Tongue: English
Leaves: 608
Edition: 2
Category: Library


✦ Synopsis


Nearly 80 recipes to help you collect and transform data from multiple sources into a single data source, making it much easier to perform analytics on the data

Key Features

  • Build data pipelines from scratch and find solutions to common data engineering problems
  • Learn how to work with Azure Data Factory, Data Lake, Databricks, and Synapse Analytics
  • Monitor and maintain your data engineering pipelines using Log Analytics, Azure Monitor, and Azure Purview

Book Description

The famous quote 'Data is the new oil' seems truer every day, as the key to most organizations' long-term success lies in extracting insights from raw data. One of the major challenges organizations face in extracting value from data is building performant data engineering pipelines for data visualization, ingestion, storage, and processing. This second edition of the immensely successful book by Ahmad Osama brings you several recent enhancements in Azure data engineering and shares approximately 80 useful recipes covering common scenarios in building data engineering pipelines in Microsoft Azure.

You'll explore recipes from Azure Synapse Analytics workspaces Gen 2 and get to grips with Synapse Spark pools, serverless SQL pools, Synapse integration pipelines, and Synapse data flows. You'll also understand Synapse SQL pool optimization techniques in this second edition. Besides Synapse enhancements, you'll discover helpful tips on managing Azure SQL Database and learn about security, high availability, and performance monitoring. Finally, the book takes you through overall data engineering pipeline management, focusing on monitoring using Log Analytics and tracking data lineage using Azure Purview.

By the end of this book, you'll be able to build superior data engineering pipelines and will have an invaluable go-to guide.

What you will learn

  • Process data using Azure Databricks and Azure Synapse Analytics
  • Perform data transformation using Azure Synapse data flows
  • Perform common administrative tasks in Azure SQL Database
  • Build effective Synapse SQL pools that can be consumed by Power BI
  • Monitor Synapse SQL and Spark pools using Log Analytics
  • Track data lineage using Microsoft Purview integration with pipelines

Who this book is for

This book is for data engineers, data architects, database administrators, and data professionals who want to get well versed in the Azure data services for building data pipelines. A basic understanding of cloud and data engineering concepts will help in getting the most out of this book.

Table of Contents

  1. Creating and Managing Data in Azure Data Lake
  2. Securing and Monitoring Data in Azure Data Lake
  3. Building Data Ingestion Pipelines Using Azure Data Factory
  4. Azure Data Factory Integration Runtime
  5. Configuring and Securing Azure SQL Database
  6. Implementing High Availability and Monitoring in Azure SQL Database
  7. Processing Data Using Azure Databricks
  8. Processing Data Using Azure Synapse Analytics
  9. Transforming Data Using Azure Synapse Dataflows
  10. Building the Serving Layer in Azure Synapse SQL Pool
  11. Monitoring Synapse SQL and Spark Pools
  12. Optimizing and Maintaining Synapse SQL and Spark Pools
  13. Monitoring and Maintaining Azure Data Engineering Pipelines

✦ Table of Contents


Cover
Title Page
Copyright
Contributors
Table of Contents
Preface
Chapter 1: Creating and Managing Data in Azure Data Lake
Technical requirements
Provisioning an Azure storage account using the Azure portal
Getting ready
How to do it…
How it works…
Provisioning an Azure storage account using PowerShell
Getting ready
How to do it…
How it works…
Creating containers and uploading files to Azure Blob storage using PowerShell
Getting ready
How to do it…
How it works…
Managing blobs in Azure Storage using PowerShell
Getting ready
How to do it…
How it works…
Configuring blob lifecycle management for blob objects using the Azure portal
Getting ready
How to do it…
How it works…
Chapter 2: Securing and Monitoring Data in Azure Data Lake
Configuring a firewall for an Azure Data Lake account using the Azure portal
Getting ready
How to do it…
How it works…
Configuring virtual networks for an Azure Data Lake account using the Azure portal
Getting ready
How to do it…
How it works…
Configuring private links for an Azure Data Lake account
Getting ready
How to do it…
How it works…
Configuring encryption using Azure Key Vault for Azure Data Lake
Getting ready
How to do it…
How it works…
Accessing Blob storage accounts using managed identities
Getting ready
How to do it…
How it works…
Creating an alert to monitor an Azure storage account
Getting ready
How to do it…
How it works…
Securing an Azure storage account with SAS using PowerShell
Getting ready
How to do it…
How it works…
Chapter 3: Building Data Ingestion Pipelines Using Azure Data Factory
Technical requirements
Provisioning Azure Data Factory
How to do it…
How it works…
Copying files to a database from a data lake using a control flow and copy activity
Getting ready
How to do it…
How it works…
Triggering a pipeline in Azure Data Factory
Getting ready
How to do it…
How it works…
Copying data from a SQL Server virtual machine to a data lake using the Copy data wizard
Getting ready
How to do it…
How it works…
Chapter 4: Azure Data Factory Integration Runtime
Technical requirements
Configuring a self-hosted IR
Getting ready
How to do it…
How it works…
Configuring a shared self-hosted IR
Getting ready
How to do it…
Configuring high availability for a self-hosted IR
Getting ready
How to do it…
How it works…
Patching a self-hosted IR
Getting ready
How to do it…
How it works…
Migrating an SSIS package to Azure Data Factory
Getting ready
How to do it…
How it works…
Chapter 5: Configuring and Securing Azure SQL Database
Technical requirements
Provisioning and connecting to an Azure SQL database using PowerShell
Getting ready
How to do it…
How it works…
Implementing an Azure SQL Database elastic pool using PowerShell
Getting ready
How to do it…
How it works…
Configuring a virtual network and private endpoints for Azure SQL Database
Getting ready
How to do it…
How it works…
Configuring Azure Key Vault for Azure SQL Database
Getting ready
How to do it…
How it works…
Provisioning and configuring a wake-up script for a serverless SQL database
Getting ready
How to do it…
How it works…
Configuring the Hyperscale tier of Azure SQL Database
Getting ready
How to do it…
Chapter 6: Implementing High Availability and Monitoring in Azure SQL Database
Implementing active geo-replication for an Azure SQL database using PowerShell
Getting ready
How to do it…
How it works…
Implementing an auto-failover group for an Azure SQL database using PowerShell
Getting ready
How to do it…
How it works…
Configuring high availability to the Hyperscale tier of Azure SQL Database
Getting ready
How to do it…
How it works…
Implementing vertical scaling for an Azure SQL database using PowerShell
Getting ready
How to do it…
How it works…
Monitoring an Azure SQL database using the Azure portal
Getting ready
How to do it…
Configuring auditing for Azure SQL Database
Getting ready
How to do it…
How it works…
Chapter 7: Processing Data Using Azure Databricks
Technical requirements
Configuring the Azure Databricks environment
Getting ready
How to do it…
Integrating Databricks with Azure Key Vault
Getting ready
How to do it…
How it works…
Mounting an Azure Data Lake container in Databricks
Getting ready
How to do it…
How it works…
Processing data using notebooks
Getting ready
How to do it…
How it works…
Scheduling notebooks using job clusters
Getting ready
How to do it…
How it works…
Working with Delta Lake tables
Getting ready
How to do it…
How it works…
Connecting a Databricks Delta Lake table to Power BI
Getting ready
How to do it…
How it works…
Chapter 8: Processing Data Using Azure Synapse Analytics
Technical requirements
Provisioning an Azure Synapse Analytics workspace
Getting ready
How to do it…
Analyzing data using serverless SQL pool
Getting ready
How to do it…
How it works…
Provisioning and configuring Spark pools
Getting ready
How to do it…
How it works…
Processing data using Spark pools and a lake database
Getting ready
How to do it…
How it works…
Querying the data in a lake database from serverless SQL pool
Getting ready
How to do it…
How it works…
Scheduling notebooks to process data incrementally
Getting ready
How to do it…
How it works…
Visualizing data using Power BI by connecting to serverless SQL pool
Getting ready
How to do it…
How it works…
Chapter 9: Transforming Data Using Azure Synapse Dataflows
Technical requirements
Copying data using a Synapse data flow
Getting ready
How to do it…
How it works…
Performing data transformation using activities such as join, sort, and filter
Getting ready
How to do it…
How it works…
Monitoring data flows and pipelines
Getting ready
How to do it…
How it works…
Configuring partitions to optimize data flows
Getting ready
How to do it…
How it works…
Parameterizing Synapse data flows
Getting ready
How to do it…
How it works…
Handling schema changes dynamically in data flows using schema drift
Getting ready
How to do it…
How it works…
Chapter 10: Building the Serving Layer in Azure Synapse SQL Pool
Technical requirements
Loading data into dedicated SQL pools using PolyBase and T-SQL
Getting ready
How to do it…
How it works…
Loading data into a dedicated SQL pool using COPY INTO
Getting ready
How to do it…
How it works…
Creating distributed tables and modifying table distribution
Getting ready
How to do it…
How it works…
Creating statistics and automating the update of statistics
Getting ready
How to do it…
How it works…
Creating partitions and archiving data using partitioned tables
Getting ready
How to do it…
How it works…
Implementing workload management in an Azure Synapse dedicated SQL pool
Getting ready
How to do it…
How it works…
Creating workload groups for advanced workload management
Getting ready
How to do it…
How it works…
Chapter 11: Monitoring Synapse SQL and Spark Pools
Technical requirements
Configuring a Log Analytics workspace for Synapse SQL pools
Getting ready
How to do it…
How it works…
Configuring a Log Analytics workspace for Synapse Spark pools
Getting ready
How to do it…
How it works…
Using Kusto queries to monitor SQL and Spark pools
Getting ready
How to do it…
How it works…
Creating workbooks in a Log Analytics workspace to visualize monitoring data
Getting ready
How to do it…
How it works…
Monitoring table distribution, data skew, and index health using Synapse DMVs
Getting ready
How to do it…
Building monitoring dashboards for Synapse with Azure Monitor
Getting ready
How to do it…
How it works…
Chapter 12: Optimizing and Maintaining Synapse SQL and Spark Pools
Technical requirements
Analyzing a query plan and fixing table distribution
Getting ready
How to do it…
How it works…
Monitoring and rebuilding a replication table cache
Getting ready
How to do it…
How it works…
Configuring result set caching in Azure Synapse dedicated SQL pool
Getting ready
How to do it…
How it works…
Configuring longer backup retention for a Synapse SQL database
Getting ready
How to do it…
How it works…
Auto pausing Synapse dedicated SQL pool
Getting ready
How to do it…
How it works…
Optimizing Delta tables in a Synapse Spark pool lake database
Getting ready
How to do it…
How it works…
Optimizing query performance in Synapse Spark pools
Getting ready
How to do it…
How it works…
Chapter 13: Monitoring and Maintaining Azure Data Engineering Pipelines
Technical requirements
Monitoring Synapse integration pipelines using Log Analytics and workbooks
Getting ready
How to do it…
How it works…
Tracing SQL queries for dedicated SQL pool to Synapse integration pipelines
Getting ready
How to do it…
How it works…
Provisioning a Microsoft Purview account and creating a data catalog
Getting ready
How to do it…
How it works…
Integrating a Synapse workspace with Microsoft Purview and tracking data lineage
Getting ready
How to do it…
How it works…
Applying Azure tags using PowerShell to multiple Azure resources
Getting ready
How to do it…
How it works…
Index
About Packt
Other Books You May Enjoy


📜 SIMILAR VOLUMES


Azure Data Engineer Associate Certificat
✍ Giacinto Palmieri, Surendra Mettapalli, and Newton Alex 📂 Library 📅 2024 🏛 Packt Publishing Pvt. Ltd. 🌐 English

Achieve Azure Data Engineer Associate certification success with this DP-203 exam guide Purchase of this book unlocks access to web-based exam prep resources including mock exams, flashcards, and exam tips, and the eBook PDF Key Features Prepare for the DP-203 exam with expert insights, rea

Data Engineering on Azure
✍ Vlad Riscutia 📂 Library 📅 2021 🏛 Manning Publications 🌐 English

Build a data platform to the industry-leading standards set by Microsoft's own infrastructure. Summary: In Data Engineering on Azure you will learn how to: Pick the right Azure services for different data scenarios Manage data inventory Implement production q

Azure Data Engineering Cookbook: Design
✍ Ahmad Osama 📂 Library 📅 2021 🏛 Packt Publishing 🌐 English

Over 90 recipes to help data scientists and AI engineers orchestrate modern ETL/ELT workflows and perform analytics using Azure services more easily

Key Features

  • Discover how to work with different SQL and NoSQL data stores in Microsoft Azure
  • Create and execute r