This book is the basic guide for developers, architects, engineers, and anyone who wants to start leveraging the open-source software Hadoop and Hive to build distributed, scalable, concurrent big data applications. Hive will be used for reading, writing, and managing large data set files. The boo
Big Data Using Hadoop and Hive
Scribed by Nitin Kumar
- Publisher
- Mercury Learning and Information
- Year
- 2021
- Tongue
- English
- Leaves
- 207
- Category
- Library
Table of Contents
Cover
Half-Title
Title
Copyright
Dedication
Contents
Preface
Chapter 1: Big Data
Big Data Challenges for Organizations
How We Are Using Big Data
Big Data: An Opportunity
Hadoop: A Big Data Solution
Big Data in the Real World
Chapter 2: What is Apache Hadoop?
Hadoop History
Hadoop Benefits
Hadoop's Ecosystem: Components
Hadoop Core Component Architecture
Summary
Chapter 3: The Hadoop Distributed Filesystem
HDFS Core Components
HDFS Architecture
Data Replication
Data Locality
Data Storage
Failure Handling on the HDFS
Erasure Coding (EC)
HDFS Disk Balancer
HDFS Federation
HDFS Architecture and Its Challenges
Hadoop Federation: A Rescue
Benefits of the HDFS Federation
HDFS Processes: Read and Write
Failure Handling During Read and Write
Chapter 4: Getting Started with Hadoop
Hadoop Configuration
Command-Line Interface
Generic Filesystem CLI Command
Distributed Copy (distcp)
Hadoop's Other User Commands
HDFS Permissions
HDFS Quotas Guide
HDFS Short-Circuit Local Reads
Offline Edits Viewer Guide
Offline Image Viewer Guide
Chapter 5: Interfaces to Access HDFS Files
WebHDFS REST API
FileSystem URIs
Error Responses
Authentication
Java FileSystem API
URI and Path
FSDataInputStream
FSDataOutputStream
FileStatus
Directories
Delete Files
C API libhdfs
Chapter 6: Yet Another Resource Negotiator
YARN Architecture
YARN Process Flow
YARN Failures
YARN High Availability
YARN Schedulers
The Fair Scheduler
The Capacity Scheduler
The YARN Timeline Server
Application Timeline Server (ATS)
ATS Data Model Structure
ATS V2
YARN Federation
Chapter 7: MapReduce
MapReduce Process
Key Features
Different Phases in the MapReduce Process
MapReduce Architecture
MapReduce Sample Program
MapReduce Composite Key Operation
Mapper Program
MapReduce Configuration
Chapter 8: Hive
Hive History
Hive Query
Data Storage
Data Model
Complex Data Types
Hive DDL (Data Definition Language)
Tables
View
Partition
Bucketing
Hive Architecture
Serialization/Deserialization (SerDe)
Metastore
Query Compiler
HiveServer2
Chapter 9: Getting Started with Hive
Hive Set-up
Hive Configuration Settings
Loading and Inserting Data into Tables
Insert from a Select Query
Load Table Data into File
Create and Load Data into a Table
Hive Transactions
Enable Transactions
Insert Values
Update
Delete
Merge
Locks
Hive Select Query
Select Basic Query
Hive QL File
Hive Select on Complex Datatypes
Order By and Sort By
Distribute By and Cluster By
Group By and Having
Built-in Aggregate Functions
Enhanced Aggregation
Table-Generating Functions
Built-In Utility Functions
Collection Functions
Date Functions
Conditional Functions
String Functions
Hive Query Language-Join
Chapter 10: File Format
File Format Characteristics
Columnar Format
Schema Evolution
Splittable
Compression
File Formats
RC (Row-Columnar) File Input Format
Optimized Row Columnar (ORC) File Format
Parquet
File Format Comparisons
ORC vs. Parquet
Chapter 11: Data Compression
Data Compression Benefits
Data Compression in Hadoop
Splitting
Compression Codec
Data Compressions
References
Index
SIMILAR VOLUMES
Chapter 1: Overview: Building Data Analytic Systems with Hadoop -- Chapter 2: A Scala and Python Refresher -- Chapter 3: Standard Toolkits for Hadoop and Analytics -- Chapter 4: Relational, noSQL, and Graph Databases -- Chapter 5: Data Pipelines and How to Construct Them -- Chapter 6: Advanced Searc
Learn advanced analytical techniques and leverage existing tool kits to make your analytic applications more powerful, precise, and efficient. This book provides the right combination of architecture, design, and implementation information to create analytical systems that go beyond the basics of cl
Learn advanced analytical techniques and leverage existing toolkits to make your analytic applications more powerful, precise, and efficient. This book provides the right combination of architecture, design, and implementation information to create analytical systems which go beyond the basics
As data grows exponentially day-by-day, extracting information becomes a tedious activity in itself. Technologies like Hadoop are trying to address some of the concerns, while Solr provides high-speed faceted search. Bringing these two technologies together is helping organizations resolve the probl