Algorithms and Parallel Computing (Wiley Series on Parallel and Distributed Computing)
by Fayez Gebali
- Publisher: Wiley
- Year: 2011
- Language: English
- Pages: 365
- Edition: 1
- Category: Library
No payment or registration required. For personal study only.
Synopsis
There is a software gap between the hardware's potential and the performance that can be attained using today's parallel software development tools. These tools require manual intervention by the programmer to parallelize the code. Programming a parallel computer requires closely studying the target algorithm or application, more so than in the traditional sequential programming we have all learned. The programmer must be aware of the communication and data dependencies of the algorithm or application. This book provides the techniques to explore the possible ways to program a parallel computer for a given application.
Table of Contents
Algorithms and Parallel Computing......Page 5
Contents......Page 9
Preface......Page 15
List of Acronyms......Page 21
1.1 INTRODUCTION......Page 25
1.2 TOWARD AUTOMATING PARALLEL PROGRAMMING......Page 26
1.3 ALGORITHMS......Page 28
1.4 PARALLEL COMPUTING DESIGN CONSIDERATIONS......Page 36
1.5 PARALLEL ALGORITHMS AND PARALLEL ARCHITECTURES......Page 37
1.7 IMPLEMENTATION OF ALGORITHMS: A TWO-SIDED PROBLEM......Page 38
1.8 MEASURING BENEFITS OF PARALLEL COMPUTING......Page 39
1.9 AMDAHL'S LAW FOR MULTIPROCESSOR SYSTEMS......Page 43
1.10 GUSTAFSON-BARSIS'S LAW......Page 45
1.11 APPLICATIONS OF PARALLEL COMPUTING......Page 46
2.1 INTRODUCTION......Page 53
2.3 PARALLELIZING ALU STRUCTURE......Page 54
2.4 USING MEMORY HIERARCHY......Page 57
2.5 PIPELINING......Page 63
2.6 VERY LONG INSTRUCTION WORD (VLIW) PROCESSORS......Page 68
2.7 INSTRUCTION-LEVEL PARALLELISM (ILP) AND SUPERSCALAR PROCESSORS......Page 69
2.8 MULTITHREADED PROCESSOR......Page 73
3.2 PARALLEL COMPUTING......Page 77
3.3 SHARED-MEMORY MULTIPROCESSORS (UNIFORM MEMORY ACCESS [UMA])......Page 78
3.4 DISTRIBUTED-MEMORY MULTIPROCESSOR (NONUNIFORM MEMORY ACCESS [NUMA])......Page 80
3.6 SYSTOLIC PROCESSORS......Page 81
3.8 GRID (CLOUD) COMPUTING......Page 84
3.9 MULTICORE SYSTEMS......Page 85
3.10 SM......Page 86
3.11 COMMUNICATION BETWEEN PARALLEL PROCESSORS......Page 88
3.12 SUMMARY OF PARALLEL ARCHITECTURES......Page 91
4.1 INTRODUCTION......Page 93
4.2 CACHE COHERENCE AND MEMORY CONSISTENCY......Page 94
4.3 SYNCHRONIZATION AND MUTUAL EXCLUSION......Page 100
5.1 INTRODUCTION......Page 107
5.2 CLASSIFICATION OF INTERCONNECTION NETWORKS BY LOGICAL TOPOLOGIES......Page 108
5.3 INTERCONNECTION NETWORK SWITCH ARCHITECTURE......Page 115
6.2 CONCURRENCY PLATFORMS......Page 129
6.3 CILK++......Page 130
6.4 OpenMP......Page 136
6.5 COMPUTE UNIFIED DEVICE ARCHITECTURE (CUDA)......Page 146
7.1 INTRODUCTION......Page 155
7.3 INDEPENDENT LOOP SCHEDULING......Page 157
7.4 DEPENDENT LOOPS......Page 158
7.6 LOOP UNROLLING......Page 159
7.7 PROBLEM PARTITIONING......Page 160
7.8 DIVIDE-AND-CONQUER (RECURSIVE PARTITIONING) STRATEGIES......Page 161
7.9 PIPELINING......Page 163
8.2 COMPARING DAG AND DCG ALGORITHMS......Page 167
8.3 PARALLELIZING NSPA ALGORITHMS REPRESENTED BY A DAG......Page 169
8.4 FORMAL TECHNIQUE FOR ANALYZING NSPAs......Page 171
8.5 DETECTING CYCLES IN THE ALGORITHM......Page 174
8.6 EXTRACTING SERIAL AND PARALLEL ALGORITHM PERFORMANCE PARAMETERS......Page 175
8.7 USEFUL THEOREMS......Page 177
8.8 PERFORMANCE OF SERIAL AND PARALLEL ALGORITHMS ON PARALLEL COMPUTERS......Page 180
9.2 DEFINITION OF z-TRANSFORM......Page 183
9.3 THE 1-D FIR DIGITAL FILTER ALGORITHM......Page 184
9.4 SOFTWARE AND HARDWARE IMPLEMENTATIONS OF THE z-TRANSFORM......Page 185
9.5 DESIGN 1: USING HORNERβS RULE FOR BROADCAST INPUT AND PIPELINED OUTPUT......Page 186
9.6 DESIGN 2: PIPELINED INPUT AND BROADCAST OUTPUT......Page 187
9.7 DESIGN 3: PIPELINED INPUT AND OUTPUT......Page 188
10.2 THE 1-D FIR DIGITAL FILTER ALGORITHM......Page 191
10.3 THE DEPENDENCE GRAPH OF AN ALGORITHM......Page 192
10.4 DERIVING THE DEPENDENCE GRAPH FOR AN ALGORITHM......Page 193
10.5 THE SCHEDULING FUNCTION FOR THE 1-D FIR FILTER......Page 195
10.6 NODE PROJECTION OPERATION......Page 201
10.7 NONLINEAR PROJECTION OPERATION......Page 203
10.8 SOFTWARE AND HARDWARE IMPLEMENTATIONS OF THE DAG TECHNIQUE......Page 204
11.2 MATRIX MULTIPLICATION ALGORITHM......Page 209
11.3 THE 3-D DEPENDENCE GRAPH AND COMPUTATION DOMAIN D......Page 210
11.5 THE DEPENDENCE MATRICES OF THE ALGORITHM VARIABLES......Page 212
11.6 NULLSPACE OF DEPENDENCE MATRIX: THE BROADCAST SUBDOMAIN B......Page 213
11.7 DESIGN SPACE EXPLORATION: CHOICE OF BROADCASTING VERSUS PIPELINING VARIABLES......Page 216
11.8 DATA SCHEDULING......Page 219
11.9 PROJECTION OPERATION USING THE LINEAR PROJECTION OPERATOR......Page 224
11.10 EFFECT OF PROJECTION OPERATION ON DATA......Page 229
11.11 THE RESULTING MULTITHREADED/MULTIPROCESSOR ARCHITECTURE......Page 230
11.12 SUMMARY OF WORK DONE IN THIS CHAPTER......Page 231
12.3 THE IIR FILTER DEPENDENCE GRAPH......Page 233
12.4 z-DOMAIN ANALYSIS OF 1-D IIR DIGITAL FILTER ALGORITHM......Page 240
13.2 LINE AND FRAME WRAPAROUND PROBLEMS......Page 243
13.3 2-D RECURSIVE FILTERS......Page 245
13.4 3-D DIGITAL FILTERS......Page 247
14.2 DECIMATOR STRUCTURES......Page 251
14.3 DECIMATOR DEPENDENCE GRAPH......Page 252
14.4 DECIMATOR SCHEDULING......Page 254
14.5 DECIMATOR DAG FOR s1 = [1 0]......Page 255
14.6 DECIMATOR DAG FOR s2 = [1 -1]......Page 257
14.8 POLYPHASE DECIMATOR IMPLEMENTATIONS......Page 259
14.9 INTERPOLATOR STRUCTURES......Page 260
14.10 INTERPOLATOR DEPENDENCE GRAPH......Page 261
14.11 INTERPOLATOR SCHEDULING......Page 262
14.12 INTERPOLATOR DAG FOR s1 = [1 0]......Page 263
14.13 INTERPOLATOR DAG FOR s2 = [1 -1]......Page 265
14.15 POLYPHASE INTERPOLATOR IMPLEMENTATIONS......Page 267
15.2 EXPRESSING THE ALGORITHM AS A REGULAR ITERATIVE ALGORITHM (RIA)......Page 269
15.3 OBTAINING THE ALGORITHM DEPENDENCE GRAPH......Page 270
15.4 DATA SCHEDULING......Page 271
15.5 DAG NODE PROJECTION......Page 272
15.6 DESIGN 1: DESIGN SPACE EXPLORATION WHEN s = [1 1]t......Page 273
15.7 DESIGN 2: DESIGN SPACE EXPLORATION WHEN s = [1 -1]t......Page 276
15.8 DESIGN 3: DESIGN SPACE EXPLORATION WHEN s = [1 0]t......Page 277
16.1 INTRODUCTION......Page 279
16.2 FBMAS......Page 280
16.3 DATA BUFFERING REQUIREMENTS......Page 281
16.4 FORMULATION OF THE FBMA......Page 282
16.5 HIERARCHICAL FORMULATION OF MOTION ESTIMATION......Page 283
16.6 HARDWARE DESIGN OF THE HIERARCHY BLOCKS......Page 285
17.1 INTRODUCTION......Page 291
17.2 THE MULTIPLICATION ALGORITHM IN GF(2^m)......Page 292
17.4 FIELD MULTIPLICATION DEPENDENCE GRAPH......Page 294
17.5 DATA SCHEDULING......Page 295
17.6 DAG NODE PROJECTION......Page 297
17.8 DESIGN 2: USING d2 = [1 1]t......Page 299
17.10 APPLICATIONS OF FINITE FIELD MULTIPLIERS......Page 301
18.2 THE POLYNOMIAL DIVISION ALGORITHM......Page 303
18.3 THE LFSR DEPENDENCE GRAPH......Page 305
18.4 DATA SCHEDULING......Page 306
18.5 DAG NODE PROJECTION......Page 307
18.6 DESIGN 1: DESIGN SPACE EXPLORATION WHEN s1 = [1 -1]......Page 308
18.7 DESIGN 2: DESIGN SPACE EXPLORATION WHEN s2 = [1 0]......Page 310
18.8 DESIGN 3: DESIGN SPACE EXPLORATION WHEN s3 = [1 -0.5]......Page 313
18.9 COMPARING THE THREE DESIGNS......Page 315
19.1 INTRODUCTION......Page 317
19.2 DECIMATION-IN-TIME FFT......Page 319
19.3 PIPELINE RADIX-2 DECIMATION-IN-TIME FFT PROCESSOR......Page 322
19.4 DECIMATION-IN-FREQUENCY FFT......Page 323
19.5 PIPELINE RADIX-2 DECIMATION-IN-FREQUENCY FFT PROCESSOR......Page 327
20.2 SPECIAL MATRIX STRUCTURES......Page 329
20.3 FORWARD SUBSTITUTION (DIRECT TECHNIQUE)......Page 333
20.5 MATRIX TRIANGULARIZATION ALGORITHM......Page 336
20.6 SUCCESSIVE OVER RELAXATION (SOR) (ITERATIVE TECHNIQUE)......Page 341
20.7 PROBLEMS......Page 345
21.1 INTRODUCTION......Page 347
21.2 FDM FOR 1-D SYSTEMS......Page 348
References......Page 355
Index......Page 361
SIMILAR VOLUMES
This text is based on a simple and fully reactive computational model that allows for intuitive comprehension and logical designs. The principles and techniques presented can be applied to any distributed computing environment (e.g., distributed systems, communication networks, data networks, grid n
This text provides an excellent balance of theory and application that enables you to deploy powerful algorithms, frameworks, and methodologies to solve complex optimization problems in a diverse range of industries. Each chapter is written by leading experts in the fields of parallel and distribute
A unique investigation of the state of the art in design, architectures, and implementations of advanced computational infrastructures and the applications they support. Emerging large-scale adaptive scientific and engineering applications are requiring an increasing amount of computing