Algorithms and Data Structures for Massive Datasets

Year: 2021
Tongue: English
Edition: Early Access
Category: Library

No coin nor oath required. For personal study only.

✦ Table of Contents

Algorithms and Data Structures for Massive Datasets MEAP V03
Copyright
Welcome
Brief contents
1: Introduction
1.1 An example
1.1.1 An example: how to solve it
1.1.2 An example: how to solve it, take two
1.2 The structure of this book
1.3 What makes this book different and whom it is for
1.4 Why is massive data so challenging for today’s systems?
1.4.1 The CPU-memory performance gap
1.4.2 Memory hierarchy
1.4.3 What about distributed systems?
1.5 Summary
2: Review of Hash Tables and Modern Hashing
2.1 Ubiquitous hashing
2.2 A crash course on data structures
2.3 Usage scenarios in modern systems
2.3.1 Deduplication in backup/storage solutions
2.3.2 Plagiarism detection with MOSS and Rabin-Karp fingerprinting
2.4 O(1) --- what’s the big deal?
2.5 Collision Resolution: theory vs. practice
2.6 Usage scenario: How Python’s dict does it
2.7 MurmurHash
2.8 Hash Tables for Distributed Systems: Consistent Hashing
2.8.1 A typical hashing problem?
2.8.2 Hashring
2.8.3 Lookup
2.8.4 Adding a new node/resource
2.8.5 Removing a node
2.8.6 Consistent hashing scenario: Chord
2.9 Summary
3: Approximate Membership and Bloom Filter
3.1 How It Works
3.1.1 Insert
3.1.2 Lookup
3.2 Use Cases
3.2.1 Bloom Filters in Networks: Squid
3.2.2 Bitcoin mobile app
3.3 Configuring a Bloom filter for your application
3.3.1 Examples
3.4 A bit of theory
3.4.1 Can we do better?
3.5 Further reading: Bloom filter adaptations and alternatives
3.6 Quotient filter
3.6.1 Quotienting
3.6.2 Resizing
3.7 Summary
4: Frequency Estimation and Count-Min Sketch
4.1 Streaming data
4.2 Count-min sketch: how it works
4.2.1 Update
4.2.2 Estimate
4.2.3 Space and error in count-min sketch
4.3 Use cases
4.3.1 Top-k restless sleepers
4.3.2 Scaling distributional similarity of words
4.4 Range queries with count-min sketch
4.5 Approximate heavy hitters
4.5.1 Majority element
4.5.2 General heavy hitters
4.6 Summary
5: Cardinality Estimation and HyperLogLog
5.1 Counting distinct items in databases
5.2 HyperLogLog incremental design
5.2.1 The first cut --- probabilistic counting
5.2.2 Stochastic averaging or, when life gives you lemons…
5.2.3 LogLog
5.2.4 HyperLogLog --- Stochastic averaging with harmonic mean
5.3 Use case: catching worms with HLL
5.4 But how does it actually work? A mini experiment
5.4.1 The effect of the number of buckets ()
5.5 Use case: Aggregation using HyperLogLog
5.6 Summary
6: Streaming Data: Bringing Everything Together
6.1 Streaming Data System – a meta-example
6.1.1 Bloom-join
6.1.2 De-duplication
6.1.3 Load balancing and tracking the network traffic
6.2 The Future is coming: in discrete batches or as a continuous stream ?
6.3 Practical constraints and concepts in data streams
6.3.1 Time
6.3.2 Small time and small space
6.3.3 Concept shifts and concept drifts
6.3.4 Sliding window model
6.4 Summary

📜 SIMILAR VOLUMES

Algorithms and Data Structures for Massi

📁 Algorithms and Data Structures for Massive Datasets

✍ Dzejla Medjedovic, Emin Tahirovic 📂 Library 📅 2022 🏛 Manning Publications 🌐 English

Massive modern datasets make traditional data structures and algorithms grind to a halt. This fun and practical guide introduces cutting-edge techniques that can reliably handle even the largest distributed datasets. In Algorithms and Data Structures for Massive Datasets you will learn: • Probab

Algorithms and Data Structures for Massi

📁 Algorithms and Data Structures for Massive Datasets

✍ Dzejla Medjedovic, Emin Tahirovic 📂 Library 📅 2022 🏛 Manning 🌐 English

In Algorithms and Data Structures for Massive Datasets, you'll discover methods for reducing and sketching data so it fits in small memory without losing accuracy, and unlock the algorithms and data structures that form the backbone of a big data system.

Algorithms and Data Structures for Massi

📁 Algorithms and Data Structures for Massive Datasets

✍ Dzejla Medjedovic, Emin Tahirovic 📂 Library 📅 2022 🏛 Manning Publications Co. 🌐 English

Algorithms and Data Structures for Massi

📁 Algorithms and Data Structures for Massive Datasets

✍ Dzejla Medjedovic, Emin Tahirovic 📂 Library 📅 2021 🏛 Manning Publications 🌐 English

The unprecedented growth of data in recent years is putting the spotlight on the data structures and algorithms that can efficiently handle large datasets. In this book, we present you with a basic suite of data structures and algorithms designed to index, query, and analyze massive data. What pr

Introduction to Data Structures and Algo

📁 Introduction to Data Structures and Algorithms for Beginners : Data Structures and Algorithms Made Easy

✍ Engr. Michael David 📂 Library 📅 2021 🌐 English

Data Structures are the programmatic way of storing data so that data can be used efficiently. Almost every enterprise application uses various types of data structures in one or the other way. This tutorial will give you a great understanding on Data Structures needed to understand the com

Algorithms: Advanced Data Structures for

📁 Algorithms: Advanced Data Structures for Algorithms

✍ Andy Vickler 📂 Library 🌐 English

Are you studying data science and want to take your learning further ? Data structures are an integral part of data science, machine learning, and algorithms, all aimed at solving programming challenges that might se