Beginning Apache Spark Using Azure Databricks: Unleashing Large Cluster Analytics in the Cloud

✍ Scribed by Robert Ilijason


Publisher: Apress
Year: 2020
Tongue: English
Leaves: 281
Edition: 1st ed.
Category: Library

✦ Synopsis


Analyze vast amounts of data in record time using Apache Spark with Databricks in the cloud. Learn the fundamentals, and more, of running analytics on large clusters in Azure and AWS, with Databricks running on top of Apache Spark. Discover how to squeeze the most value out of your data at a mere fraction of what classical analytics solutions cost, while getting the results you need incrementally faster.

This book explains how the confluence of these pivotal technologies gives you enormous power, cheaply, when working with huge datasets. You will begin by learning how cloud infrastructure makes it possible to scale your code to a large number of processing units without paying for the machinery in advance. From there you will learn how Apache Spark, an open source framework, puts all those CPUs to work on data analytics. Finally, you will see how services such as Databricks provide the power of Apache Spark without requiring you to know anything about configuring hardware or software. With no need for expensive experts and hardware, your resources can instead go toward finding business value in the data.

This book guides you through some advanced topics such as analytics in the cloud, data lakes, data ingestion, architecture, machine learning, and tools, including Apache Spark, Apache Hadoop, Apache Hive, Python, and SQL. Valuable exercises help reinforce what you have learned.


What You Will Learn

  • Discover the value of big data analytics that leverage the power of the cloud
  • Get started with Databricks using SQL and Python in either Microsoft Azure or AWS
  • Understand the underlying technology, and how the cloud and Apache Spark fit into the bigger picture
  • See how these tools are used in the real world
  • Run basic analytics, including machine learning, on billions of rows at a fraction of the cost, or for free


Who This Book Is For

Data engineers, data scientists, and cloud architects who want or need to run advanced analytics in the cloud. It is assumed that the reader has data experience, but perhaps minimal exposure to Apache Spark and Azure Databricks. The book is also recommended for people who want to get started in the analytics field, as it provides a strong foundation.

✦ Table of Contents


Front Matter
Introduction to Large-Scale Data Analytics (Robert Ilijason)
Spark and Databricks (Robert Ilijason)
Getting Started with Databricks (Robert Ilijason)
Workspaces, Clusters, and Notebooks (Robert Ilijason)
Getting Data into Databricks (Robert Ilijason)
Querying Data Using SQL (Robert Ilijason)
The Power of Python (Robert Ilijason)
ETL and Advanced Data Wrangling (Robert Ilijason)
Connecting to and from Databricks (Robert Ilijason)
Running in Production (Robert Ilijason)
Bits and Pieces (Robert Ilijason)
Back Matter

✦ Subjects


Business and Management; Big Data/Analytics; Microsoft and .NET; Open Source


📜 SIMILAR VOLUMES


Beginning Apache Spark Using Azure Datab
✍ Robert Ilijason 📂 Library 📅 2020 🏛 Apress 🌐 English

Analyze vast amounts of data in record time using Apache Spark with Databricks in the Cloud. Learn the fundamentals, and more, of running analytics on large clusters in Azure and AWS, using Apache Spark with Databricks on top. Discover how to squeeze the most value out of your data at a mere frac

Databricks. Using Apache Spark
📂 Library 🌐 English

A manual from Databricks on using Apache Spark. Contents: Introduction; Log Analysis with Spark; Introduction to Apache Spark; Importing Data; Exporting Data; Log Analyzer Application; Twitter Streaming Language

Azure Databricks Cookbook: Accelerate an
✍ Phani Raj, Vinod Jaiswal 📂 Library 📅 2021 🏛 Packt Publishing 🌐 English

Get to grips with building and productionizing end-to-end big data solutions in Azure and learn best practices for working with large datasets. Key Features: Integrate with Azure Synapse Analytics, Cosmos DB, and Azure HDInsight Kafka Cluster to scale and analyze your proj

Data Engineering with Databricks Cookboo
✍ Pulkit Chadha 📂 Library 📅 2024 🏛 Packt Publishing 🌐 English

Work through 70 recipes for implementing reliable data pipelines with Apache Spark, optimally store and process structured and unstructured data in Delta Lake, and use Databricks to orchestrate and govern your data. Key Features: Learn data ingestion, data transformation, and data management techniqu