

Pro Hadoop Data Analytics: Designing and Building Big Data Systems using the Hadoop Ecosystem

✍ Scribed by Kerry Koitzsch (auth.)


Publisher: Apress
Year: 2017
Tongue: English
Leaves: 304
Edition: 1
Category: Library


✦ Synopsis


Learn advanced analytical techniques and leverage existing toolkits to make your analytic applications more powerful, precise, and efficient. This book provides the right combination of architecture, design, and implementation information to create analytical systems that go beyond the basics of classification, clustering, and recommendation.

In Pro Hadoop Data Analytics, best practices are emphasized to ensure coherent, efficient development. A complete example system will be developed using standard third-party components: toolkits, libraries, visualization and reporting code, and the support glue needed to provide a working, extensible end-to-end system.

The book emphasizes four important topics:

  • The importance of end-to-end, flexible, configurable, high-performance data pipeline systems with analytical components as well as appropriate visualization results.
  • Best practices and structured design principles. This includes strategic topics as well as the how-to example portions.
  • The importance of mix-and-match or hybrid systems, using different analytical components in one application to accomplish application goals. The hybrid approach will be prominent in the examples.
  • Use of existing third-party libraries is key to effective development. Deep dive examples of the functionality of some of these toolkits will be showcased as you develop the example system.

What You'll Learn

  • The what, why, and how of building big data analytic systems with the Hadoop ecosystem
  • Libraries, toolkits, and algorithms to make development easier and more effective
  • Best practices to use when building analytic systems with Hadoop, and metrics to measure performance and efficiency of components and systems
  • How to connect to standard relational databases, NoSQL data sources, and more
  • Useful case studies and example components that help you create your own systems

Who This Book Is For
Software engineers, architects, and data scientists with an interest in the design and implementation of big data analytical systems using Hadoop, the Hadoop ecosystem, and other associated technologies.

✦ Table of Contents


Front Matter....Pages i-xxi
Front Matter....Pages 1-1
Overview: Building Data Analytic Systems with Hadoop....Pages 3-27
A Scala and Python Refresher....Pages 29-42
Standard Toolkits for Hadoop and Analytics....Pages 43-62
Relational, NoSQL, and Graph Databases....Pages 63-76
Data Pipelines and How to Construct Them....Pages 77-90
Advanced Search Techniques with Hadoop, Lucene, and Solr....Pages 91-136
Front Matter....Pages 137-137
An Overview of Analytical Techniques and Algorithms....Pages 139-150
Rule Engines, System Control, and System Orchestration....Pages 151-164
Putting It All Together: Designing a Complete Analytical System....Pages 165-175
Front Matter....Pages 177-177
Data Visualizers: Seeing and Interacting with the Analysis....Pages 179-200
Front Matter....Pages 201-201
A Case Study in Bioinformatics: Analyzing Microscope Slide Data....Pages 203-214
A Bayesian Analysis Component: Identifying Credit Card Fraud....Pages 215-221
Searching for Oil: Geographical Data Analysis with Apache Mahout....Pages 223-233
“Image As Big Data” Systems: Some Case Studies....Pages 235-255
Building a General Purpose Data Pipeline....Pages 257-262
Conclusions and the Future of Big Data Analysis....Pages 263-273
Back Matter....Pages 275-298

✦ Subjects


Big Data;Programming Techniques;Programming Languages, Compilers, Interpreters;Data Mining and Knowledge Discovery


📜 SIMILAR VOLUMES


Pro Hadoop Data Analytics Designing and
✍ Koitzsch, Kerry 📂 Library 📅 2017 🏛 Apress 🌐 English

Chapter 1: Overview: Building Data Analytic Systems with Hadoop -- Chapter 2: A Scala and Python Refresher -- Chapter 3: Standard Toolkits for Hadoop and Analytics -- Chapter 4: Relational, noSQL, and Graph Databases -- Chapter 5: Data Pipelines and How to Construct Them -- Chapter 6: Advanced Searc

Pro Hadoop Data Analytics Designing and
✍ Apress L.P.; Koitzsch, Kerry 📂 Library 📅 2016;2017 🏛 Apress 🌐 English

Learn advanced analytical techniques and leverage existing tool kits to make your analytic applications more powerful, precise, and efficient. This book provides the right combination of architecture, design, and implementation information to create analytical systems that go beyond the basics of cl

Processing big data with Azure HDInsight
✍ Yadav, Vinit 📂 Library 📅 2017 🏛 Apress 🌐 English

Get a jump start on using Azure HDInsight and Hadoop Ecosystem components. As most Hadoop and Big Data projects are written in either Java, Scala, or Python, this book minimizes the effort to learn another language and is written from the perspective of a .NET developer. Hadoop components are covere

Processing Big Data with Azure HDInsight
✍ Vinit Yadav (auth.) 📂 Library 📅 2017 🏛 Apress 🌐 English

Get a jump start on using Azure HDInsight and Hadoop Ecosystem components. As most Hadoop and Big Data projects are written in either Java, Scala, or Python, this book minimizes the effort to learn another language and is written from the perspective of a .NET developer. Hadoop components are cov

Big Data Using Hadoop and Hive
✍ Nitin Kumar 📂 Library 📅 2021 🏛 Mercury Learning and Information 🌐 English

This book is the basic guide for developers, architects, engineers, and anyone who wants to start leveraging the open-source software Hadoop and Hive to build distributed, scalable, concurrent big data applications. Hive will be used for reading, writing, and managing the large data set files. The boo