Massive modern datasets make traditional data structures and algorithms grind to a halt. This fun and practical guide introduces cutting-edge techniques that can reliably handle even the largest distributed datasets. In Algorithms and Data Structures for Massive Datasets you will learn: • Probab
Algorithms and Data Structures for Massive Datasets
- Year: 2021
- Language: English
- Edition: Early Access
Table of Contents
Algorithms and Data Structures for Massive Datasets MEAP V03
Copyright
Welcome
Brief contents
1: Introduction
1.1 An example
1.1.1 An example: how to solve it
1.1.2 An example: how to solve it, take two
1.2 The structure of this book
1.3 What makes this book different and whom it is for
1.4 Why is massive data so challenging for today's systems?
1.4.1 The CPU-memory performance gap
1.4.2 Memory hierarchy
1.4.3 What about distributed systems?
1.5 Summary
2: Review of Hash Tables and Modern Hashing
2.1 Ubiquitous hashing
2.2 A crash course on data structures
2.3 Usage scenarios in modern systems
2.3.1 Deduplication in backup/storage solutions
2.3.2 Plagiarism detection with MOSS and Rabin-Karp fingerprinting
2.4 O(1) --- what's the big deal?
2.5 Collision Resolution: theory vs. practice
2.6 Usage scenario: How Python's dict does it
2.7 MurmurHash
2.8 Hash Tables for Distributed Systems: Consistent Hashing
2.8.1 A typical hashing problem?
2.8.2 Hashring
2.8.3 Lookup
2.8.4 Adding a new node/resource
2.8.5 Removing a node
2.8.6 Consistent hashing scenario: Chord
2.9 Summary
3: Approximate Membership and Bloom Filter
3.1 How It Works
3.1.1 Insert
3.1.2 Lookup
3.2 Use Cases
3.2.1 Bloom Filters in Networks: Squid
3.2.2 Bitcoin mobile app
3.3 Configuring a Bloom filter for your application
3.3.1 Examples
3.4 A bit of theory
3.4.1 Can we do better?
3.5 Further reading: Bloom filter adaptations and alternatives
3.6 Quotient filter
3.6.1 Quotienting
3.6.2 Resizing
3.7 Summary
4: Frequency Estimation and Count-Min Sketch
4.1 Streaming data
4.2 Count-min sketch: how it works
4.2.1 Update
4.2.2 Estimate
4.2.3 Space and error in count-min sketch
4.3 Use cases
4.3.1 Top-k restless sleepers
4.3.2 Scaling distributional similarity of words
4.4 Range queries with count-min sketch
4.5 Approximate heavy hitters
4.5.1 Majority element
4.5.2 General heavy hitters
4.6 Summary
5: Cardinality Estimation and HyperLogLog
5.1 Counting distinct items in databases
5.2 HyperLogLog incremental design
5.2.1 The first cut --- probabilistic counting
5.2.2 Stochastic averaging or, when life gives you lemons…
5.2.3 LogLog
5.2.4 HyperLogLog --- Stochastic averaging with harmonic mean
5.3 Use case: catching worms with HLL
5.4 But how does it actually work? A mini experiment
5.4.1 The effect of the number of buckets
5.5 Use case: Aggregation using HyperLogLog
5.6 Summary
6: Streaming Data: Bringing Everything Together
6.1 Streaming Data System – a meta-example
6.1.1 Bloom-join
6.1.2 De-duplication
6.1.3 Load balancing and tracking the network traffic
6.2 The Future is coming: in discrete batches or as a continuous stream?
6.3 Practical constraints and concepts in data streams
6.3.1 Time
6.3.2 Small time and small space
6.3.3 Concept shifts and concept drifts
6.3.4 Sliding window model
6.4 Summary