Pro Hadoop Data Analytics: Designing and Building Big Data Systems using the Hadoop Ecosystem

โœ Scribed by Koitzsch, Kerry


Publisher: Apress
Year: 2017
Tongue: English
Leaves: 298
Category: Library


✦ Synopsis


Learn advanced analytical techniques and leverage existing toolkits to make your analytic applications more powerful, precise, and efficient. This book provides the right combination of architecture, design, and implementation information to create analytical systems which go beyond the basics of classification, clustering, and recommendation. In Pro Hadoop Data Analytics, best practices are emphasized to ensure coherent, efficient development. A complete example system will be developed using standard third-party components, consisting of the toolkits, libraries, visualization and reporting code, and the support glue needed to provide a working and extensible end-to-end system.

The book emphasizes four important topics:

The importance of end-to-end, flexible, configurable, high-performance data pipeline systems with analytical components as well as appropriate visualization results. Deep-dive topics will include Spark, H2O, Vowpal Wabbit (NLP), Stanford NLP, and other appropriate toolkits and plugins.

Best practices and structured design principles. This will include strategic topics as well as the how-to example portions.

The importance of mix-and-match or hybrid systems, using different analytical components in one application to accomplish application goals. The hybrid approach will be prominent in the examples.

Use of existing third-party libraries is key to effective development. Deep-dive examples of the functionality of some of these toolkits will be showcased as you develop the example system.

✦ Table of Contents


Chapter 1: Overview: Building Data Analytic Systems with Hadoop
Chapter 2: A Scala and Python Refresher
Chapter 3: Standard Toolkits for Hadoop and Analytics
Chapter 4: Relational, noSQL, and Graph Databases
Chapter 5: Data Pipelines and How to Construct Them
Chapter 6: Advanced Search Techniques with Hadoop, Lucene, and Solr
Chapter 7: An Overview of Analytical Techniques and Algorithms
Chapter 8: Rule Engines, System Control, and System Orchestration
Chapter 9: Putting it All Together: Designing a Complete Analytical System
Chapter 10: Data Visualizers: Seeing and Interacting with the Analysis
Chapter 11: A Case Study in Bioinformatics: Analyzing Microscope Slide Data
Chapter 12: A Bayesian Analysis Software Component: Identifying Credit Card Fraud
Chapter 13: Searching for Oil: Geological Data Analysis with Mahout
Chapter 14: 'Image as Big Data' Systems: Some Case Studies
Chapter 15: A Generic Data Pipeline Analytical System
Chapter 16: Conclusions and The Future of Big Data Analysis

✦ Subjects


Big data; Computer science; Data mining; Data Mining and Knowledge Discovery; Programming Languages, Compilers, Interpreters; Programming Techniques


📜 SIMILAR VOLUMES


Pro Hadoop Data Analytics Designing and
โœ Apress L.P.; Koitzsch, Kerry ๐Ÿ“‚ Library ๐Ÿ“… 2016;2017 ๐Ÿ› Apress ๐ŸŒ English

Learn advanced analytical techniques and leverage existing tool kits to make your analytic applications more powerful, precise, and efficient. This book provides the right combination of architecture, design, and implementation information to create analytical systems that go beyond the basics of cl

Pro Hadoop Data Analytics : Designing an
โœ Kerry Koitzsch (auth.) ๐Ÿ“‚ Library ๐Ÿ“… 2017 ๐Ÿ› Apress ๐ŸŒ English

Learn advanced analytical techniques and leverage existing toolkits to make your analytic applications more powerful, precise, and efficient. This book provides the right combination of architecture, design, and implementation information to create analytical systems which go beyond the basics

Processing big data with Azure HDInsight
โœ Yadav, Vinit ๐Ÿ“‚ Library ๐Ÿ“… 2017 ๐Ÿ› Apress ๐ŸŒ English

Get a jump start on using Azure HDInsight and Hadoop Ecosystem components. As most Hadoop and Big Data projects are written in either Java, Scala, or Python, this book minimizes the effort to learn another language and is written from the perspective of a .NET developer. Hadoop components are covere

Processing Big Data with Azure HDInsight
โœ Vinit Yadav (auth.) ๐Ÿ“‚ Library ๐Ÿ“… 2017 ๐Ÿ› Apress ๐ŸŒ English

Get a jump start on using Azure HDInsight and Hadoop Ecosystem components. As most Hadoop and Big Data projects are written in either Java, Scala, or Python, this book minimizes the effort to learn another language and is written from the perspective of a .NET developer. Hadoop components are cov

Big Data Using Hadoop and Hive
โœ Nitin Kumar ๐Ÿ“‚ Library ๐Ÿ“… 2021 ๐Ÿ› Mercury Learning and Information ๐ŸŒ English

This book is a basic guide for developers, architects, engineers, and anyone who wants to start leveraging the open-source software Hadoop and Hive to build distributed, scalable, concurrent big data applications. Hive will be used for reading, writing, and managing the large data set files. The boo