Optimizations and Cost Models for multi-core architectures: an approach based on parallel paradigms

✍ Scribed by Daniele Buono


Publisher
Università di Pisa
Year
2014
Tongue
English
Leaves
313
Category
Library

✦ Table of Contents


I Introduction
Introduction
Structured parallel programming
Parallel patterns and their optimizations
Multiple memory interfaces
Automatic Cache Coherence
Introducing a performance model
Towards a parallel programming environment
List of Contributions of the Thesis
Outline of the Thesis
Current publications by the author
Background
Chip MultiProcessor architectures
Processor architecture
Interconnection network
Memory bandwidth and organization
Atomic operations and synchronizations
Cache coherence
Number of cores
Parallel programming on Chip MultiProcessors
Programming Languages
Libraries
Our vision of parallel programming
Performance model for multiprocessors
Algorithm oriented performance models for multiprocessors
Hardware-oriented performance cost models
Summary
Structured parallel programming for multi-core
The need for high level parallel programming
Structured parallel programming
Parallel Paradigms
Stream Parallelism
Task-Farm
Pipeline
Data Parallelism
Map
Reduce
Map + Reduce, a notable composition
Data-Parallel with Stencil
Stencil Transformations
Expressing Parallel Paradigms
Skeletons
ASSIST: Beyond the classical skeleton approach
The Virtual Processors approach
Parallel patterns and their (many) implementations
Mastering the possibilities, one piece at a time
Towards a novel parallel programming environment
Target architectures
II Cost Models
A hardware-dependent model based on QNs
A general approach to parallel performance prediction
The case of single-element streams
Performance prediction of a parallel module
An example: cost model for a trivial task-farm implementation
Sequential code analysis
Latency Model
Service Time Model
Evaluating the model parameters
Evaluating the sequential time
Modeling communications latencies
The final model for the task-farm example
Performance degradation on shared memory architectures
Extensions to the original queueing network
Modeling caches
Bus interconnections
Multiple Requests per processor
Complex interconnection networks
Cache coherency
Adapting the model to a concrete parallel architecture
Summary
A Queueing Network Model for Tilera TILEPro64™
EQNSim: a testing environment for queueing network models
Architecture overview of Tilera TILEPro64™
Processors
Cache Hierarchy and Coherency
Hash-for-Home
Single-Home
No-Home
Restriction on the model
Interconnection Network
Under Load Latency
Memory Subsystem
Memory Read Service Time
Memory Write Service Time
Working with Caches
Model Validation
Evaluation of Rq for store_linear
Evaluation of Rq for store_linear with a different store rate
Considerations on the accuracy of the model
Summary
III Optimizations
Exploiting Multiple Memory Controllers
Programming multi-cores
Memory allocation models
SMP-like memory allocation
NUMA-like memory allocation
Process allocation
Evaluation by means of synthetic benchmarks
Experimental results on the target architectures
Concluding Remarks
Farm parallelization of the Sobel Operator
Experimental results on the target architectures
Concluding Remarks
Farm parallelization of the Vector Addition
Experimental results on the target architectures
Data-Parallel parallelization of the FFT
Parallel FFT
Experimental results on the target architectures
Concluding Remarks
Modeling policies in the architectural model
Summary
Software-based Cache Coherence
The cost of automatic cache coherence
Optimizing cache coherence for the farm pattern
Automatic cache coherence with hashed home node
Automatic cache coherence with fixed home node
Disabling automatic cache coherence
Experimental Results
Optimizing cache coherence for a data-parallel pattern
Automatic cache coherence with hashed home node
Automatic cache coherence with fixed home node
Disabling local caches
Disabling automatic cache coherence
Experimental Results
Summary
IV Wrapping Up
Wrapping up: compiling a parallel module on TilePro64
Example module and its application
Parallel pattern and its implementations
Parallel Patterns
Farm Implementations
Study of the message passing implementation
Architecture Model Parameters
Predicted Service Times
Study of the message passing impl. with copy on receive
Architecture Model Parameters
Predicted Service Times
Study of the pointer passing implementation
Architecture Model Parameters
Predicted Service Times
Selection of the best implementation
Impact of a multi-chip configuration
A multi-chip TilePro64 configuration
Network Latencies
Core reservation and placement on the mesh
Implementations and model parameters
Performance study
Summary
Conclusions
Bibliography


📜 SIMILAR VOLUMES


Task Scheduling for Multi-core and Paral
✍ Quan Chen, Minyi Guo (auth.) 📂 Library 📅 2017 🏛 Springer Singapore 🌐 English

This book presents task-scheduling techniques for emerging complex parallel architectures including heterogeneous multi-core architectures, warehouse-scale datacenters, and distributed big data processing systems. The demand for high computational capacity has led to the growing popularity of

Multi-Paradigm Modelling Approaches for
✍ Bedir Tekinerdogan (editor), Dominique Blouin (editor), Hans Vangheluwe (editor) 📂 Library 📅 2020 🏛 Academic Press 🌐 English

Multi-Paradigm Modelling for Cyber-Physical Systems explores modeling and analysis as crucial activities in the development of Cyber-Physical Systems, which are inherently cross-disciplinary in nature and require distinct modeling techniques related to different disciplines, as well as a comm

Designing Public Policies: An Approach B
✍ Francisco J. André, M. Alejandro Cardenete, Carlos Romero (auth.) 📂 Library 📅 2010 🏛 Springer-Verlag Berlin Heidelberg 🌐 English

This book presents a methodological approach for the joint design of economic and environmental policies. The starting point is the observation that, in practice, policy makers do not usually have a well-defined objective, but they are typically concerned about a number of economic and environmen

Optimizing Compilers for Modern Architec
✍ Randy Allen 📂 Library 📅 2001 🏛 Morgan Kaufmann 🌐 English

Modern computer architectures designed with high-performance microprocessors offer tremendous potential gains in performance over previous designs. Yet their very complexity makes it increasingly difficult to produce efficient code and to realize their full potential. This landmark text from two