A friendly, framework-agnostic tutorial that will help you grok how streaming systems work, and how to build your own! In Grokking Streaming Systems you will learn how to: • Implement and troubleshoot streaming systems • Design streaming systems for complex functionalities • Assess parallelizat
Grokking Streaming Systems. Real-time event processing
Scribed by Josh Fischer, Ning Wang
- Publisher: Manning Publications
- Year: 2022
- Tongue: English
- Leaves: 313
- Category: Library
Table of Contents
Streaming Systems
brief contents
contents
preface
acknowledgments
about this book
about the authors
Part 1: Getting started with streaming
1 Welcome to Grokking Streaming Systems
What is stream processing?
Streaming system examples
Streaming systems and real time
How a streaming system works
Applications
Backend services
Inside a backend service
Batch processing systems
Inside a batch processing system
Stream processing systems
Inside a stream processing system
The advantages of multi-stage architecture
The multi-stage architecture in batch and stream processing systems
Compare the systems
A model stream processing system
2 Hello, streaming systems!
The chief needs a fancy toll booth
It started as HTTP requests, and it failed
AJ and Miranda take time to reflect
AJ ponders about streaming systems
Comparing backend service and streaming
How a streaming system could fit
Queues: A foundational concept
Data transfer via queues
Our streaming framework (the start of it)
The Streamwork framework overview
Zooming in on the Streamwork engine
Core streaming concepts
More details of the concepts
The streaming job execution flow
Your first streaming job
Executing the job
Inspecting the job execution
Look inside the engine
Keep events moving
The life of a data element
Reviewing streaming concepts
3 Parallelization and data grouping
The sensor is emitting more events
Even in streaming, real time is hard
New concepts: Parallelism is important
New concepts: Data parallelism
New concepts: Data execution independence
New concepts: Task parallelism
Data parallelism vs. task parallelism
Parallelism and concurrency
Parallelizing the job
Parallelizing components
Parallelizing sources
Viewing job output
Parallelizing operators
Viewing job output
Events and instances
Event ordering
Event grouping
Shuffle grouping
Shuffle grouping: Under the hood
Fields grouping
Fields grouping: Under the hood
Event grouping execution
Look inside the engine: Event dispatcher
Applying fields grouping in your job
Event ordering
Comparing grouping behaviors
4 Stream graph
A credit card fraud detection system
More about the credit card fraud detection system
The fraud detection business
Streaming isn't always a straight line
Zoom into the system
The fraud detection job in detail
New concepts
Upstream and downstream components
Stream fan-out and fan-in
Graph, directed graph, and DAG
DAG in stream processing systems
All new concepts in one page
Stream fan-out to the analyzers
Look inside the engine
There is a problem: Efficiency
Stream fan-out with different streams
Look inside the engine again
Communication between the components via channels
Multiple channels
Stream fan-in to the score aggregator
Stream fan-in in the engine
A brief introduction to another stream fan-in: Join
Look at the whole system
Graph and streaming jobs
The example systems
5 Delivery semantics
The latency requirement of the fraud detection system
Revisit the fraud detection job
About accuracy
Partial result
A new streaming job to monitor system usage
The new system usage job
The requirements of the new system usage job
New concepts: (The number of) times delivered and times processed
New concept: Delivery semantics
Choosing the right semantics
At-most-once
The fraud detection job
At-least-once
At-least-once with acknowledging
Track events
Handle event processing failures
Track early out events
Acknowledging code in components
New concept: Checkpointing
New concept: State
Checkpointing in the system usage job for the at-least-once semantic
Checkpointing and state manipulation functions
State handling code in the transaction source component
Exactly-once or effectively-once?
Bonus concept: Idempotent operation
Exactly-once, finally
State handling code in the system usage analyzer component
Comparing the delivery semantics again
Up next . . .
6 Streaming systems review and a glimpse ahead
Streaming system pieces
Parallelization and event grouping
DAGs and streaming jobs
Delivery semantics (guarantees)
Delivery semantics used in the credit card fraud detection system
Which way to go from here
Windowed computations
Joining data in real time
Backpressure
Stateless and stateful computations
Part 2: Stepping up
7 Windowed computations
Slicing up real-time data
Breaking down the problem in detail
Breaking down the problem in detail (continued)
Two different contexts
Windowing in the fraud detection job
What exactly are windows?
Looking closer into the window
New concept: Windowing strategy
Fixed windows
Fixed windows in the windowed proximity analyzer
Detecting fraud with a fixed time window
Fixed windows: Time vs. count
Sliding windows
Sliding windows: Windowed proximity analyzer
Detecting fraud with a sliding window
Session windows
Session windows (continued)
Detecting fraud with session windows
Summary of windowing strategies
Slicing an event stream into data sets
Windowing: Concept or implementation
Another look
Key-value store 101
Implement the windowed proximity analyzer
Event time and other times for events
Windowing watermark
Late events
8 Join operations
Joining emission data on the fly
The emissions job version 1
The emission resolver
Accuracy becomes an issue
The enhanced emissions job
Focusing on the join
What is a join again?
How the stream join works
Stream join is a different kind of fan-in
Vehicle events vs. temperature events
Table: A materialized view of streaming
Vehicle events are less efficient to be materialized
Data integrity quickly became an issue
What's the problem with this join operator?
Inner join
Outer join
The inner join vs. outer join
Different types of joins
Outer joins in streaming systems
A new issue: Weak connection
Windowed joins
Joining two tables instead of joining a stream and table
Revisiting the materialized view
9 Backpressure
Reliability is critical
Review the system
Streamlining streaming jobs
New concepts: Capacity, utilization, and headroom
More about utilization and headroom
New concept: Backpressure
Measure capacity utilization
Backpressure in the Streamwork engine
Backpressure in the Streamwork engine: Propagation
Our streaming job during a backpressure
Backpressure in distributed systems
New concept: Backpressure watermarks
Another approach to handle lagging instances: Dropping events
Why do we want to drop events?
Backpressure could be a symptom when the underlying issue is permanent
Stopping and resuming may lead to thrashing if the issue is permanent
Handle thrashing
10 Stateful computation
The migration of the streaming jobs
Stateful components in the system usage job
Revisit: State
The states in different components
State data vs. temporary data
Stateful vs. stateless components: The code
The stateful source and operator in the system usage job
States and checkpoints
Checkpoint creation: Timing is hard
Event-based timing
Creating checkpoints with checkpoint events
A checkpoint event is handled by instance executors
A checkpoint event flowing through a job
Creating checkpoints with checkpoint events at the instance level
Checkpoint event synchronization
Checkpoint loading and backward compatibility
Checkpoint storage
Stateful vs. stateless components
Manually managed instance states
Lambda architecture
11 Wrap-up: Advanced concepts in streaming systems
Is this really the end?
Windowed computations
The major window types
Joining data in real time
SQL vs. stream joins
Inner joins vs. outer joins
Unexpected things can happen in streaming systems
Backpressure: Slow down sources or upstream components
Another approach to handle lagging instances: Dropping events
Backpressure can be a symptom when the underlying issue is permanent
Stateful components with checkpoints
Event-based timing
Stateful vs. stateless components
You did it!
Key concepts covered in this book
index
SIMILAR VOLUMES
Event Streams in Action is a foundational book introducing the ULP paradigm and presenting techniques to use it effectively in data-rich environments. About the Technology Many high-profile applications, like LinkedIn and Netflix, deliver nimble, responsive performance by reacting to user and sy
Streaming events to Kafka is common in today's information technology world, as data flows in and out of systems in industries such as banking, healthcare, CRM, and sales. Key factors in information technology include data analytics, data cleansing, and real-time data monitoring. This bo