
Is Parallel Programming Hard, And, If So, What Can You Do About It?

✍ Scribed by Paul E. McKenney


Year: 2017
Tongue: English
Leaves: 694
Edition: 2017-01
Category: Library

⬇  Acquire This Volume

No coin nor oath required. For personal study only.

✦ Synopsis


https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html

✦ Table of Contents


Cover......Page 1
Title......Page 2
Contents......Page 4
1.1 Roadmap......Page 12
1.2 Quick Quizzes......Page 13
1.3 Alternatives to This Book......Page 14
1.5 Whose Book Is This?......Page 15
2.1 Historic Parallel Programming Difficulties......Page 18
2.2.1 Performance......Page 20
2.2.2 Productivity......Page 22
2.2.3 Generality......Page 23
2.3 Alternatives to Parallel Programming......Page 25
2.3.3 Performance Optimization......Page 26
2.4 What Makes Parallel Programming Hard?......Page 27
2.4.1 Work Partitioning......Page 28
2.4.3 Resource Partitioning and Replication......Page 29
2.4.6 How Do Languages and Environments Assist With These Tasks?......Page 30
2.5 Discussion......Page 31
3.1.1 Pipelined CPUs......Page 32
3.1.2 Memory References......Page 34
3.1.3 Atomic Operations......Page 35
3.1.4 Memory Barriers......Page 36
3.1.6 I/O Operations......Page 37
3.2.1 Hardware System Architecture......Page 38
3.2.2 Costs of Operations......Page 39
3.3 Hardware Free Lunch?......Page 41
3.3.2 Novel Materials and Processes......Page 42
3.3.4 Special-Purpose Accelerators......Page 43
3.4 Software Design Implications......Page 44
4.1 Scripting Languages......Page 46
4.2.1 POSIX Process Creation and Destruction......Page 47
4.2.2 POSIX Thread Creation and Destruction......Page 49
4.2.3 POSIX Locking......Page 50
4.2.4 POSIX Reader-Writer Locking......Page 53
4.2.5 Atomic Operations (gcc Classic)......Page 56
4.2.6 Atomic Operations (C11)......Page 57
4.3.2 Thread Creation, Destruction, and Control......Page 58
4.3.3 Locking......Page 60
4.3.5 Per-CPU Variables......Page 62
4.4 The Right Tool for the Job: How to Choose?......Page 64
5 Counting......Page 66
5.1 Why Isn't Concurrent Counting Trivial?......Page 67
5.2.1 Design......Page 69
5.2.2 Array-Based Implementation......Page 70
5.2.3 Eventually Consistent Implementation......Page 71
5.2.4 Per-Thread-Variable-Based Implementation......Page 74
5.3.1 Design......Page 75
5.3.2 Simple Limit Counter Implementation......Page 76
5.3.3 Simple Limit Counter Discussion......Page 82
5.4 Exact Limit Counters......Page 83
5.4.1 Atomic Limit Counter Implementation......Page 84
5.4.3 Signal-Theft Limit Counter Design......Page 88
5.4.4 Signal-Theft Limit Counter Implementation......Page 90
5.5 Applying Specialized Parallel Counters......Page 96
5.6.1 Parallel Counting Performance......Page 97
5.6.2 Parallel Counting Specializations......Page 98
5.6.3 Parallel Counting Lessons......Page 99
6.1.1 Dining Philosophers Problem......Page 102
6.1.2 Double-Ended Queue......Page 106
6.2 Design Criteria......Page 114
6.3.1 Sequential Program......Page 117
6.3.2 Code Locking......Page 118
6.3.3 Data Locking......Page 120
6.3.4 Data Ownership......Page 122
6.3.5 Locking Granularity and Performance......Page 123
6.4 Parallel Fastpath......Page 126
6.4.2 Hierarchical Locking......Page 127
6.4.3 Resource Allocator Caches......Page 128
6.5.1 Work-Queue Parallel Maze Solver......Page 134
6.5.2 Alternative Parallel Maze Solver......Page 137
6.5.3 Performance Comparison I......Page 138
6.5.4 Alternative Sequential Maze Solver......Page 141
6.5.5 Performance Comparison II......Page 142
6.5.6 Future Directions and Conclusions......Page 143
6.6 Partitioning, Parallelism, and Optimization......Page 144
7 Locking......Page 146
7.1 Staying Alive......Page 147
7.1.1 Deadlock......Page 148
7.1.2 Livelock and Starvation......Page 156
7.1.3 Unfairness......Page 157
7.2 Types of Locks......Page 158
7.2.3 Beyond Reader-Writer Locks......Page 159
7.2.4 Scoped Locking......Page 161
7.3.1 Sample Exclusive-Locking Implementation Based on Atomic Exchange......Page 163
7.3.2 Other Exclusive-Locking Implementations......Page 164
7.4 Lock-Based Existence Guarantees......Page 166
7.5.2 Locking For Parallel Libraries: Just Another Tool......Page 168
7.5.3 Locking For Parallelizing Sequential Libraries: Villain!......Page 172
7.6 Summary......Page 174
8.1 Multiple Processes......Page 176
8.3 Function Shipping......Page 177
8.6 Other Uses of Data Ownership......Page 178
9.1 Running Example......Page 180
9.2 Reference Counting......Page 181
9.3 Hazard Pointers......Page 186
9.4 Sequence Locks......Page 189
9.5 Read-Copy Update (RCU)......Page 194
9.5.1 Introduction to RCU......Page 195
9.5.2 RCU Fundamentals......Page 198
9.5.3 RCU Usage......Page 208
9.5.4 RCU Linux-Kernel API......Page 223
9.5.5 "Toy" RCU Implementations......Page 229
9.5.6 RCU Exercises......Page 247
9.6 Which to Choose?......Page 248
9.7 What About Updates?......Page 249
10.1 Motivating Application......Page 252
10.2.2 Hash-Table Implementation......Page 253
10.2.3 Hash-Table Performance......Page 257
10.3 Read-Mostly Data Structures......Page 258
10.3.1 RCU-Protected Hash Table Implementation......Page 259
10.3.2 RCU-Protected Hash Table Performance......Page 260
10.3.3 RCU-Protected Hash Table Discussion......Page 263
10.4.1 Resizable Hash Table Design......Page 264
10.4.2 Resizable Hash Table Implementation......Page 266
10.4.3 Resizable Hash Table Discussion......Page 272
10.4.4 Other Resizable Hash Tables......Page 274
10.5 Other Data Structures......Page 277
10.6.2 Bits and Bytes......Page 278
10.6.3 Hardware Considerations......Page 279
10.7 Summary......Page 281
11.1 Introduction......Page 282
11.1.1 Where Do Bugs Come From?......Page 283
11.1.2 Required Mindset......Page 284
11.1.3 When Should Validation Start?......Page 285
11.1.4 The Open Source Way......Page 287
11.2 Tracing......Page 288
11.4 Static Analysis......Page 289
11.5.2 Walkthroughs......Page 290
11.5.3 Self-Inspection......Page 291
11.6 Probability and Heisenbugs......Page 293
11.6.1 Statistics for Discrete Testing......Page 294
11.6.3 Statistics for Continuous Testing......Page 296
11.6.4 Hunting Heisenbugs......Page 298
11.7 Performance Estimation......Page 301
11.7.2 Profiling......Page 302
11.7.4 Microbenchmarking......Page 303
11.7.5 Isolation......Page 304
11.7.6 Detecting Interference......Page 305
11.8 Summary......Page 308
12.1.1 Promela and Spin......Page 312
12.1.2 How to Use Promela......Page 316
12.1.3 Promela Example: Locking......Page 319
12.1.4 Promela Example: QRCU......Page 321
12.1.5 Promela Parable: dynticks and Preemptible RCU......Page 328
12.1.6 Validating Preemptible RCU and dynticks......Page 333
12.2 Special-Purpose State-Space Search......Page 353
12.2.1 Anatomy of a Litmus Test......Page 354
12.2.2 What Does This Litmus Test Mean?......Page 355
12.2.3 Running a Litmus Test......Page 356
12.2.4 PPCMEM Discussion......Page 357
12.3 Axiomatic Approaches......Page 358
12.5 Summary......Page 360
13.1.2 Counting Lookups......Page 364
13.2 Refurbish Reference Counting......Page 365
13.2.1 Implementation of Reference-Counting Categories......Page 366
13.2.2 Linux Primitives Supporting Reference Counting......Page 371
13.3 RCU Rescues......Page 372
13.3.1 RCU and Per-Thread-Variable-Based Statistical Counters......Page 373
13.3.2 RCU and Counters for Removable I/O Devices......Page 375
13.3.3 Array and Length......Page 376
13.3.4 Correlated Fields......Page 377
13.4.1 Correlated Data Elements......Page 378
13.4.2 Update-Friendly Hash-Table Traversal......Page 379
14.1 Avoiding Locks......Page 380
14.2.1 Memory Ordering and Memory Barriers......Page 381
14.2.2 If B Follows A, and C Follows B, Why Doesn't C Follow A?......Page 382
14.2.3 Variables Can Have More Than One Value......Page 384
14.2.4 What Can You Trust?......Page 385
14.2.6 A Few Simple Rules......Page 393
14.2.7 Abstract Memory Access Model......Page 394
14.2.8 Device Operations......Page 395
14.2.9 Guarantees......Page 396
14.2.10 What Are Memory Barriers?......Page 397
14.2.11 Locking Constraints......Page 410
14.2.12 Memory-Barrier Examples......Page 411
14.2.13 The Effects of the CPU Cache......Page 413
14.3 Non-Blocking Synchronization......Page 415
14.3.1 Simple NBS......Page 417
14.3.2 NBS Discussion......Page 418
15.1.1 Soft Real Time......Page 420
15.1.2 Hard Real Time......Page 421
15.1.3 Real-World Real Time......Page 422
15.2 Who Needs Real-Time Computing?......Page 426
15.3 Who Needs Parallel Real-Time Computing?......Page 427
15.4 Implementing Parallel Real-Time Systems......Page 428
15.4.1 Implementing Parallel Real-Time Operating Systems......Page 429
15.4.2 Implementing Parallel Real-Time Applications......Page 442
15.4.3 The Role of RCU......Page 445
15.5 Real Time vs. Real Fast: How to Choose?......Page 446
16.2 Rusty Scale for API Design......Page 448
16.3 Shaving the Mandelbrot Set......Page 450
17.1.1 Uniprocessor Über Alles......Page 452
17.1.2 Multithreaded Mania......Page 453
17.1.3 More of the Same......Page 454
17.1.4 Crash Dummies Slamming into the Memory Wall......Page 455
17.2.1 Outside World......Page 458
17.2.2 Process Modification......Page 462
17.2.3 Synchronization......Page 467
17.2.4 Discussion......Page 471
17.3 Hardware Transactional Memory......Page 473
17.3.1 HTM Benefits WRT to Locking......Page 474
17.3.2 HTM Weaknesses WRT Locking......Page 476
17.3.3 HTM Weaknesses WRT to Locking When Augmented......Page 482
17.3.4 Where Does HTM Best Fit In?......Page 485
17.3.5 Potential Game Changers......Page 486
17.3.6 Conclusions......Page 488
17.4 Functional Programming for Parallelism......Page 489
A.1 What Does "After" Mean?......Page 492
A.2 What is the Difference Between "Concurrent" and "Parallel"?......Page 495
A.3 What Time Is It?......Page 496
B.1 Cache Structure......Page 498
B.2 Cache-Coherence Protocols......Page 500
B.2.2 MESI Protocol Messages......Page 501
B.2.3 MESI State Diagram......Page 502
B.2.4 MESI Protocol Example......Page 504
B.3.1 Store Buffers......Page 505
B.3.2 Store Forwarding......Page 506
B.3.3 Store Buffers and Memory Barriers......Page 507
B.4.1 Invalidate Queues......Page 510
B.4.3 Invalidate Queues and Memory Barriers......Page 511
B.5 Read and Write Memory Barriers......Page 514
B.6.1 Ordering-Hostile Architecture......Page 515
B.6.3 Example 2......Page 516
B.6.4 Example 3......Page 517
B.7 Memory-Barrier Instructions For Specific CPUs......Page 518
B.7.1 Alpha......Page 520
B.7.3 ARMv7-A/R......Page 522
B.7.4 IA64......Page 523
B.7.5 MIPS......Page 524
B.7.7 POWER / PowerPC......Page 525
B.7.8 SPARC RMO, PSO, and TSO......Page 526
B.7.9 x86......Page 527
B.8 Are Memory Barriers Forever?......Page 528
B.9 Advice to Hardware Designers......Page 529
C.1 How To Use This Book......Page 532
C.2 Introduction......Page 533
C.3 Hardware and its Habits......Page 539
C.4 Tools of the Trade......Page 543
C.5 Counting......Page 551
C.6 Partitioning and Synchronization Design......Page 571
C.7 Locking......Page 577
C.8 Data Ownership......Page 587
C.9 Deferred Processing......Page 590
C.10 Data Structures......Page 614
C.11 Validation......Page 618
C.12 Formal Verification......Page 625
C.13 Putting It All Together......Page 632
C.14 Advanced Synchronization......Page 636
C.15 Parallel Real-Time Computing......Page 641
C.16 Ease of Use......Page 643
C.17 Conflicting Visions of the Future......Page 644
C.18 Important Questions......Page 648
C.19 Why Memory Barriers?......Page 649
D Glossary and Bibliography......Page 654
E.2 Reviewers......Page 690
E.5 Figure Credits......Page 691
E.6 Other Support......Page 693
Back......Page 694


📜 SIMILAR VOLUMES


Is Parallel Programming Hard, And, If So, What Can You Do About It?
✍ Paul E. McKenney 📂 Library 📅 2023 🏛 Self Published 🌐 English

This book examines what makes parallel programming hard, and describes design techniques that can help you avoid many parallel-programming pitfalls. It is primarily intended for low-level C/C++ code, but offers valuable lessons for other environments as well.