PolyBase Revealed: Data Virtualization with SQL Server, Hadoop, Apache Spark, and Beyond
Scribed by Kevin Feasel
- Publisher: Apress
- Year: 2020
- Tongue: English
- Leaves: 320
- Edition: 1st ed.
- Category: Library
✦ Synopsis
Harness the power of PolyBase data virtualization software to make data from a variety of sources easily accessible through SQL queries while using the T-SQL skills you already know and have mastered.
PolyBase Revealed shows you how to use the PolyBase feature of SQL Server 2019 to integrate SQL Server with Azure Blob Storage, Apache Hadoop, other SQL Server instances, Oracle, Cosmos DB, Apache Spark, and more. You will learn how PolyBase can help you reduce storage and other costs by avoiding the need for ETL processes that duplicate data in order to make it accessible from one source. PolyBase makes SQL Server into that one source, and T-SQL is your golden ticket. The book also covers PolyBase scale-out clusters, allowing you to distribute PolyBase queries among several SQL Server instances, thus improving performance.
With great flexibility comes great complexity, and this book shows you where to look when queries fail, complete with coverage of internals, troubleshooting techniques, and where to find more information on obscure cross-platform errors. Data virtualization is a key target for Microsoft with SQL Server 2019. This book will help you keep your skills current, remain relevant, and build new business and career opportunities around Microsoft's product direction.
- Install and configure PolyBase as a stand-alone service, or unlock its capabilities with a scale-out cluster
- Understand how PolyBase interacts with outside data sources while presenting their data as regular SQL Server tables
- Write queries combining data from SQL Server, Apache Hadoop, Oracle, Cosmos DB, Apache Spark, and more
- Troubleshoot PolyBase queries using SQL Server Dynamic Management Views
- Tune PolyBase queries using statistics and execution plans
- Solve common business problems, including "cold storage" of infrequently accessed data and simplifying ETL jobs
Who This Book Is For
SQL Server developers working in multi-platform environments who want one easy way of communicating with, and collecting data from, all of these sources
✦ Table of Contents
Front Matter
Installing and Configuring PolyBase (Kevin Feasel)
Connecting to Azure Blob Storage (Kevin Feasel)
Connecting to Hadoop (Kevin Feasel)
Using Predicate Pushdown to Enhance Query Performance (Kevin Feasel)
Common Hadoop and Blob Storage Integration Errors (Kevin Feasel)
Integrating with SQL Server (Kevin Feasel)
Built-In Integrations: Cosmos DB, Oracle, and More (Kevin Feasel)
Integrating via ODBC (Kevin Feasel)
PolyBase in Azure Synapse Analytics (Kevin Feasel)
Examining PolyBase via Dynamic Management Views (Kevin Feasel)
Query Tuning with Statistics and Execution Plans (Kevin Feasel)
PolyBase in Practice (Kevin Feasel)
Back Matter
✦ Subjects
Computer Science; Database Management; Microsoft and .NET
SIMILAR VOLUMES
Integrate data between Apache Hadoop and SQL Server 2012 and provide business intelligence on the heterogeneous data. Overview: Integrate data from unstructured (Hadoop) and structured (SQL Server 2012) sources; configure and install connectors for a bi-direction
With the explosion of data, open source Apache Hadoop is gaining traction, thanks to the huge ecosystem that has arisen around its core functionalities: the distributed file system (HDFS) and MapReduce. As of today, being able to have SQL Server talking to Hadoop has become increasi
Getting SQL Server talking to Hadoop is a smooth process when you follow this tutorial. Learn all the tools and techniques you need to integrate the data and then extract powerful business insights from the merged result. Overview: Integrate data from unstructured (Hadoop)
"This book is a critically needed resource for the newly released Apache Hadoop 2.0, highlighting YARN as the significant breakthrough that broadens Hadoop beyond the MapReduce paradigm." From the Foreword by Raymie Stata, CEO of Altiscale
"This book is a critically needed resource for the newly released Apache Hadoop 2.0, highlighting YARN as the significant breakthrough that broadens Hadoop beyond the MapReduce paradigm." From the Foreword by Raymie Stata, CEO of Altiscale. The Insider's Guide to Building Distributed, Big Data Appli