Practical Real-time Data Processing and Analytics: Distributed Computing and Event Processing using Apache Spark, Flink, Storm, and Kafka

✍ Scribed by Shilpi Saxena, Saurabh Gupta

Publisher: Packt Publishing
Year: 2017
Tongue: English
Leaves: 422
Category: Library

✦ Synopsis

A practical guide to help you tackle different real-time data processing and analytics problems using the best tools for each scenario

About This Book

Learn about the various challenges in real-time data processing and use the right tools to overcome them
This book covers popular tools and frameworks such as Spark, Flink, and Apache Storm to solve all your distributed processing problems
A practical guide filled with examples, tips, and tricks to help you perform efficient Big Data processing in real-time

Who This Book Is For

If you are a Java developer who would like to be equipped with all the tools required to devise an end-to-end practical solution on real-time data streaming, then this book is for you. Basic knowledge of real-time processing would be helpful, and knowing the fundamentals of Maven, Shell, and Eclipse would be great.

What You Will Learn

Get an introduction to the established real-time stack
Understand the key integration of all the components
Get a thorough understanding of the basic building blocks for real-time solution designing
Add search and visualization capabilities to your real-time solution
Get conceptually and practically acquainted with real-time analytics
Be well equipped to apply the knowledge and create your own solutions

In Detail

With the rise of Big Data, there is an increasing need to process large amounts of data continuously, with a shorter turnaround time. Real-time data processing involves continuous input, processing and output of data, with the condition that the time required for processing is as short as possible.

This book covers the majority of the existing and evolving open source technology stack for real-time processing and analytics. You will get to know about all the real-time solution aspects, from the source to the presentation to persistence. Through this practical book, you'll be equipped with a clear understanding of how to solve challenges on your own.

We'll cover topics such as how to set up components, basic executions, integrations, advanced use cases, alerts, and monitoring. You'll be exposed to the popular tools used in real-time processing today such as Apache Spark, Apache Flink, and Storm. Finally, you will put your knowledge to practical use by implementing all of the techniques in the form of a practical, real-world use case.

By the end of this book, you will have a solid understanding of all the aspects of real-time data processing and analytics, and will know how to deploy the solutions in production environments in the best possible manner.

Style and Approach

In this practical guide to real-time analytics, each chapter begins with a high-level introduction to the topic, followed by a hands-on implementation of each concept, so you can see how it works and executes. The book is written in a DIY style, with plenty of practical use cases, well-explained code examples, and relevant screenshots and diagrams.

✦ Subjects

Data Modeling & Design; Data Mining; Data Processing; Databases & Big Data; Computers & Technology

📜 SIMILAR VOLUMES

📁 Apache Spark 2: Data Processing and Real-Time Analytics: Master complex big data processing, stream analytics, and machine learning with Apache Spark

✍ Romeo Kienzler, Md. Rezaul Karim, Sridhar Alla, Siamak Amirghodsi, Meenakshi Raj 📂 Library 📅 2018 🏛 Packt Publishing 🌐 English

Build efficient data flow and machine learning programs with this flexible, multi-functional open-source cluster-computing framework. Key Features: Master the art of real-time big data processing and machine learning. Explore a wide range of use-cases to analyze

📁 Storm Blueprints: Patterns for Distributed Real-time Computation: Use Storm design patterns to perform distributed, real-time big data processing, and analytics for real-world use cases

✍ P. Taylor Goetz, Brian 📂 Library 📅 2014 🏛 Packt Publishing 🌐 English

Storm is the most popular framework for real-time stream processing. Storm provides the fundamental primitives and guarantees required for fault-tolerant distributed computing in high-volume, mission critical applications. It is both an integration technology as well as a data flow and control mecha

📁 Real-Time Streaming with Apache Kafka, Spark, and Storm: Create Platforms that Can Quickly Crunch Data and Deliver Real-Time Analytics to Users

✍ Brindha Priyadarshini Jeyaraman 📂 Library 📅 2022 🏛 BPB Publications 🌐 English

📁 Fast Data Processing with Spark, 2nd Edition: Perform real-time analytics using Spark in a fast, distributed, and scalable way

✍ Krishna Sankar, Holden Karau 📂 Library 📅 2015 🏛 Packt Publishing 🌐 English

Spark is a framework used for writing fast, distributed programs. Spark solves problems similar to those addressed by Hadoop MapReduce, but with a fast in-memory approach and a clean functional-style API. With its ability to integrate with Hadoop and built-in tools for interactive query analysis (Spark SQL), larg

📁 Apache Kafka Quick Start Guide: Leverage Apache Kafka 2.0 to simplify real-time data processing for distributed applications

✍ Raul Estrada 📂 Library 📅 2018 🏛 Packt Publishing 🌐 English

Process large volumes of data in real time while building a high-performance, robust data stream processing pipeline using the latest Apache Kafka 2.0. Key Features: Solve practical large data and processing challenges with Kafka. Tackle data processing challenges like late

📁 Kafka: Real-Time Data and Stream Processing at Scale

✍ Neha Narkhede, Todd Palino, Gwen Shapira 📂 Library 📅 2017 🏛 O'Reilly Media, Incorporated 🌐 English

Table of Contents; Foreword; Preface; Who Should Read This Book; Conventions Used in This Book; Using Code Examples; O'Reilly Safari; How to Contact Us; Acknowledgments; Chapter 1. Meet Kafka; Publish/Subscribe Messaging; How It Starts; Individual Queue Systems; Enter Kafka; Messages and Batches; Sc