Enterprise Data Workflows with Cascading

✍ Scribed by Paco Nathan


Publisher: O'Reilly Media
Year: 2013
Tongue: English
Leaves: 169
Edition: 1
Category: Library

✦ Synopsis


There is an easier way to build Hadoop applications. With this hands-on book, you’ll learn how to use Cascading, the open source abstraction framework for Hadoop that lets you easily create and manage powerful enterprise-grade data processing applications—without having to learn the intricacies of MapReduce.

Working with sample apps based on Java and other JVM languages, you’ll quickly learn Cascading’s streamlined approach to data processing, data filtering, and workflow optimization. This book demonstrates how this framework can help your business extract meaningful information from large amounts of distributed data.

  • Start working on Cascading example projects right away
  • Model and analyze unstructured data in any format, from any source
  • Build and test applications with familiar constructs and reusable components
  • Work with the Scalding and Cascalog Domain-Specific Languages
  • Easily deploy applications to Hadoop, regardless of cluster location or data size
  • Build workflows that integrate several big data frameworks and processes
  • Explore common use cases for Cascading, including features and tools that support them
  • Examine a case study that uses a dataset from the Open Data Initiative

✦ Table of Contents


Copyright
Table of Contents
Enterprise Data Workflows
Complexity, More So Than Bigness
Origins of the Cascading API
Using Code Examples
How to Contact Us
Kudos
Programming Environment Setup
Example 1: Simplest Possible App in Cascading
Build and Run
Cascading Taxonomy
Example 2: The Ubiquitous Word Count
Flow Diagrams
Predictability at Scale
Example 3: Customized Operations
Scrubbing Tokens
Example 4: Replicated Joins
Stop Words and Replicated Joins
Comparing with Apache Pig
Comparing with Apache Hive
Example 5: TF-IDF Implementation
Example 6: TF-IDF with Testing
A Word or Two About Testing
Why Use Scalding?
Getting Started with Scalding
Example 3 in Scalding: Word Count with Customized Operations
A Word or Two about Functional Programming
Example 4 in Scalding: Replicated Joins
Build Scalding Apps with Gradle
Running on Amazon AWS
Why Use Cascalog?
Getting Started with Cascalog
Example 1 in Cascalog: Simplest Possible App
Example 4 in Cascalog: Replicated Joins
Example 6 in Cascalog: TF-IDF with Testing
Cascalog Technology and Uses
Applications and Organizations
Lingual, a DSL for ANSI SQL
Using the SQL Command Shell
Using the JDBC Driver
Integrating with Desktop Tools
Pattern, a DSL for Predictive Model Markup Language
Getting Started with Pattern
Predefined App for PMML
Integrating Pattern into Cascading Apps
Customer Experiments
Technology Roadmap for Pattern
Key Insights
Pattern Language
Literate Programming
Separation of Concerns
Functional Relational Programming
Enterprise vs. Start-Ups
City of Palo Alto
Moving from Raw Sources to Data Products
Calibrating Metrics for the Recommender
Spatial Indexing
Personalization
Recommendations
Build and Run
Key Points of the Recommender Workflow
Build and Runtime Problems
Workflow Bottlenecks
Other Resources
Index
About the Author


📜 SIMILAR VOLUMES


Enterprise Data Workflows with Cascading
✍ Paco Nathan 📂 Library 📅 2013 🏛 O'Reilly Media 🌐 English

There is an easier way to build Hadoop applications. With this hands-on book, you’ll learn how to use Cascading, the open source abstraction framework for Hadoop that lets you easily create and manage powerful enterprise-grade data processing applications—without having to learn the i

Enterprise Data Workflows with Cascading
✍ Nathan, Paco 📂 Library 📅 2013 🏛 O'Reilly Media 🌐 English

Despite its growing use in the enterprise, building applications for Hadoop is notoriously difficult. But there is a solution. This hands-on book introduces you to Cascading, the framework that enables you to build powerful data processing applications on Hadoop without having to spend months lea

Enterprise Data Workflows with Cascading
✍ Paco Nathan 📂 Library 📅 2013 🏛 O'Reilly Media 🌐 English

There is an easier way to build Hadoop applications. With this hands-on book, you'll learn how to use Cascading, the open source abstraction framework for Hadoop that lets you easily create and manage powerful enterprise-grade data processing applications - without having to learn the intricacies of

Groupware, Workflow and Intranets : Reen
✍ Dave Chaffey 📂 Library 📅 1998 🏛 Digital Press 🌐 English

This comprehensive guide for system developers, IT managers and consultants focuses on how intranets, groupware and work flow technologies can be used to improve the efficiency of their organizations. The focus is on how to use these tools to support organization transformation through business proc