Our world is being revolutionized by data-driven methods: access to large amounts of data has generated new insights and opened exciting new opportunities in commerce, science, and computing applications. Processing the enormous quantities of data necessary for these advances requires large clusters
Data-intensive text processing with MapReduce
Authors: Jimmy Lin, Chris Dyer
- Publisher: Morgan & Claypool
- Year: 2010
- Language: English
- Pages: 177
- Series: Synthesis Lectures on Human Language Technologies #7
- Category: Library
Table of Contents
Acknowledgments
Introduction
Computing in the Clouds
Big Ideas
Why Is This Different?
What This Book Is Not
MapReduce Basics
Functional Programming Roots
Mappers and Reducers
The Execution Framework
Partitioners and Combiners
The Distributed File System
Hadoop Cluster Architecture
Summary
MapReduce Algorithm Design
Local Aggregation
Combiners and In-Mapper Combining
Algorithmic Correctness with Local Aggregation
Pairs and Stripes
Computing Relative Frequencies
Secondary Sorting
Relational Joins
Reduce-Side Join
Map-Side Join
Memory-Backed Join
Summary
Inverted Indexing for Text Retrieval
Web Crawling
Inverted Indexes
Inverted Indexing: Baseline Implementation
Inverted Indexing: Revised Implementation
Index Compression
Byte-Aligned and Word-Aligned Codes
Bit-Aligned Codes
Postings Compression
What About Retrieval?
Summary and Additional Readings
Graph Algorithms
Graph Representations
Parallel Breadth-First Search
PageRank
Issues with Graph Processing
Summary and Additional Readings
EM Algorithms for Text Processing
Expectation Maximization
Maximum Likelihood Estimation
A Latent Variable Marble Game
MLE with Latent Variables
Expectation Maximization
An EM Example
Hidden Markov Models
Three Questions for Hidden Markov Models
The Forward Algorithm
The Viterbi Algorithm
Parameter Estimation for HMMs
Forward-Backward Training: Summary
EM in MapReduce
HMM Training in MapReduce
Case Study: Word Alignment for Statistical Machine Translation
Statistical Phrase-Based Translation
Brief Digression: Language Modeling with MapReduce
Word Alignment
Experiments
EM-Like Algorithms
Gradient-Based Optimization and Log-Linear Models
Summary and Additional Readings
Closing Remarks
Limitations of MapReduce
Alternative Computing Paradigms
MapReduce and Beyond
Bibliography
Authors' Biographies