Practical Real-time Data Processing and Analytics: Distributed Computing and Event Processing using Apache Spark, Flink, Storm, and Kafka
Scribed by Shilpi Saxena, Saurabh Gupta
- Publisher
- Packt Publishing
- Year
- 2017
- Language
- English
- Pages
- 422
- Category
- Library
Synopsis
A practical guide to help you tackle different real-time data processing and analytics problems using the best tools for each scenario
About This Book
- Learn about the various challenges in real-time data processing and use the right tools to overcome them
- This book covers popular tools and frameworks such as Spark, Flink, and Apache Storm to solve all your distributed processing problems
- A practical guide filled with examples, tips, and tricks to help you perform efficient Big Data processing in real-time
Who This Book Is For
If you are a Java developer who would like to be equipped with all the tools required to devise an end-to-end practical solution on real-time data streaming, then this book is for you. Basic knowledge of real-time processing would be helpful, and knowing the fundamentals of Maven, Shell, and Eclipse would be great.
What You Will Learn
- Get an introduction to the established real-time stack
- Understand the key integration of all the components
- Get a thorough understanding of the basic building blocks for real-time solution designing
- Enhance the search and visualization aspects of your real-time solution
- Get conceptually and practically acquainted with real-time analytics
- Be well equipped to apply the knowledge and create your own solutions
In Detail
With the rise of Big Data, there is an increasing need to process large amounts of data continuously, with a shorter turnaround time. Real-time data processing involves continuous input, processing and output of data, with the condition that the time required for processing is as short as possible.
This book covers the majority of the existing and evolving open source technology stack for real-time processing and analytics. You will get to know about all the real-time solution aspects, from the source to the presentation to persistence. Through this practical book, you'll be equipped with a clear understanding of how to solve challenges on your own.
We'll cover topics such as how to set up components, basic executions, integrations, advanced use cases, alerts, and monitoring. You'll be exposed to the popular tools used in real-time processing today such as Apache Spark, Apache Flink, and Storm. Finally, you will put your knowledge to practical use by implementing all of the techniques in the form of a practical, real-world use case.
By the end of this book, you will have a solid understanding of all the aspects of real-time data processing and analytics, and will know how to deploy the solutions in production environments in the best possible manner.
Style and Approach
In this practical guide to real-time analytics, each chapter begins with a basic high-level concept of the topic, followed by a practical, hands-on implementation of each concept, where you can see the working and execution of it. The book is written in a DIY style, with plenty of practical use cases, well-explained code examples, and relevant screenshots and diagrams.
Subjects
Data Modeling & Design; Data Mining; Data Processing; Databases & Big Data; Computers & Technology
SIMILAR VOLUMES
Storm is the most popular framework for real-time stream processing. Storm provides the fundamental primitives and guarantees required for fault-tolerant distributed computing in high-volume, mission critical applications. It is both an integration technology as well as a data flow and control mecha
Spark is a framework used for writing fast, distributed programs. Spark solves similar problems as Hadoop MapReduce does, but with a fast in-memory approach and a clean functional style API. With its ability to integrate with Hadoop and built-in tools for interactive query analysis (Spark SQL), larg
Process large volumes of data in real-time while building high performance and robust data stream processing pipeline using the latest Apache Kafka 2.0. Key Features: Solve practical large data and processing challenges with Kafka; Tackle data processing challenges like late
Table of Contents; Foreword; Preface; Who Should Read This Book; Conventions Used in This Book; Using Code Examples; O'Reilly Safari; How to Contact Us; Acknowledgments; Chapter 1. Meet Kafka; Publish/Subscribe Messaging; How It Starts; Individual Queue Systems; Enter Kafka; Messages and Batches; Sc