Processing Big Data with Azure HDInsight: Building Real-World Big Data Systems on Azure HDInsight Using the Hadoop Ecosystem
Scribed by Yadav, Vinit
- Publisher
- Apress
- Year
- 2017
- Tongue
- English
- Leaves
- 221
- Category
- Library
Synopsis
Get a jump start on using Azure HDInsight and Hadoop Ecosystem components. As most Hadoop and Big Data projects are written in either Java, Scala, or Python, this book minimizes the effort to learn another language and is written from the perspective of a .NET developer. Hadoop components are covered, including Hive, Pig, HBase, Storm, and Spark on Azure HDInsight, and code samples are written in .NET only.
Processing Big Data with Azure HDInsight covers the fundamentals of big data, how businesses are using it to their advantage, and how Azure HDInsight fits into the big data world. This book introduces Hadoop and big data concepts and then dives into creating different solutions with HDInsight and the Hadoop Ecosystem. It covers concepts with real-world scenarios and code examples, making sure you get hands-on experience. The best way to utilize this book is to practice while reading. After reading this book you will be familiar with Azure HDInsight and how it can be utilized to build big data solutions, including batch processing, stream analytics, interactive processing, and storing and retrieving data in an efficient manner.
What You'll Learn
Understand the fundamentals of HDInsight and Hadoop
Work with HDInsight clusters
Query with Apache Hive and Apache Pig
Store and retrieve data with Apache HBase
Stream data processing using Apache Storm
Work with Apache Spark
Who This Book Is For
Software developers, technical architects, data scientists/analysts, and Hadoop administrators who want to develop on Microsoft's managed Hadoop offering, HDInsight
Table of Contents
Contents at a Glance
Contents
About the Author
About the Technical Reviewer
Acknowledgments
Introduction
What Is Big Data?
The Scale-Up and Scale-Out Approaches
A Brief History of Hadoop
MapReduce
YARN
Hadoop Cluster Components
HDInsight
Summary
An Azure Subscription
Creating the First Cluster
Basic Configuration Options
Creating a Cluster Using the Azure Portal
Connecting to a Cluster Using SSH
Creating a Cluster Using PowerShell
Creating a Cluster Using an Azure Command-Line Interface
Creating a Cluster Using .NET SDK
Hadoop on a Virtual Machine
Hadoop on Windows
Installing and Configuring Java JDK
Download and Install HDP for Windows
Summary
Azure Blob Storage
The Benefits of Blob Storage
Using Azure Command-Line Interface
Using Windows PowerShell
Using Microsoft Azure Storage Explorer
Running MapReduce Jobs
Using PowerShell
Using .NET SDK
Hadoop Streaming
Streaming Mapper and Reducer
Binary Encoding
Using Microsoft Avro Library
Summary
Hive Essentials
Hive Architecture
Using Hive View
Using Secure Shell (SSH)
Using Visual Studio
Using .NET SDK
Writing HiveQL
Data Types
Create/Drop/Alter/Use Database
The Hive Table
Internal Tables
Storage Formats
Partitioned Tables
Create Table Options
Data Retrieval
Apache Tez
ODBC and Power BI Configuration
Prepare Data for Analysis
Creating Hive Tables
Analyzing Data Using Power BI
Hive UDFs in C#
User Defined Function (UDF)
User Defined Aggregate Functions (UDAF)
User Defined Tabular Functions (UDTF)
Summary
Chapter 5: Using Pig with HDInsight
Understanding Relations, Bags, Tuples, and Fields
Data Types
Connecting to Pig
Operators and Commands
Summary
Overview
Where to Use HBase?
The Architecture of HBase
HBase HMaster
HRegion and HRegion Server
Read and Write to an HBase Cluster
Creating an HBase Cluster
HBase Shell
Create Tables and Insert Data
HBase Shell Commands
Using .NET SDK to read/write Data
Writing Data
Reading/Querying Data
Summary
Overview
Storm Topology
Stream Groupings
Supervisor Node
Worker, Executor, and Task
Using Azure Resource Manager
Using Azure Web Portal
Storm UI
Stream Computing Platform for .NET (SCP.NET)
ISCPSpout
ISCPBatchBolt
SCP Context
Topology Builder
Using the Acker in Storm
Building Storm Application in C#
Summary
Overview
Spark Architecture
Creating a Spark Cluster
Spark Shell
Spark RDD
RDD Transformations
RDD Actions
Shuffle Operations
Persisting RDD
Spark Applications in .NET
Developing a Word Count Program
Running in Local Mode
Running in HDInsight Spark Cluster
Jupyter Notebook
Spark UI
DataFrames and Datasets
Spark SQL
Summary
Index
SIMILAR VOLUMES
Chapter 1: Overview: Building Data Analytic Systems with Hadoop -- Chapter 2: A Scala and Python Refresher -- Chapter 3: Standard Toolkits for Hadoop and Analytics -- Chapter 4: Relational, noSQL, and Graph Databases -- Chapter 5: Data Pipelines and How to Construct Them -- Chapter 6: Advanced Searc
Learn advanced analytical techniques and leverage existing tool kits to make your analytic applications more powerful, precise, and efficient. This book provides the right combination of architecture, design, and implementation information to create analytical systems that go beyond the basics of cl
Learn advanced analytical techniques and leverage existing toolkits to make your analytic applications more powerful, precise, and efficient. This book provides the right combination of architecture, design, and implementation information to create analytical systems which go beyond the basics
Explore big data concepts, platforms, analytics, and their applications using the power of Hadoop 3 Key Features Learn Hadoop 3 to build effective big data analytics solutions on-premise and on cloud Integrate Hadoop with other big data tools such as R, Python, Apache Spark, and Apache Flink E