Euro-Par 2022: Parallel Processing: 28th International Conference on Parallel and Distributed Computing, Glasgow, UK, August 22–26, 2022, Proceedings (Lecture Notes in Computer Science)

✍ Scribed by José Cano (editor), Phil Trinder (editor)

Publisher: Springer
Year: 2022
Tongue: English
Leaves: 443
Category: Library

✦ Synopsis

This book constitutes the proceedings of the 28th International Conference on Parallel and Distributed Computing, Euro-Par 2022, held in Glasgow, UK, in August 2022.

The 25 full papers presented in this volume were carefully reviewed and selected from 102 submissions. The conference Euro-Par 2022 covers all aspects of parallel and distributed computing, ranging from theory to practice, scaling from the smallest to the largest parallel and distributed systems, from fundamental computational problems and models to full-fledged applications, from architecture and interface design and implementation to tools, infrastructures and applications.

✦ Table of Contents

Preface
Organization
Euro-Par 2022 Invited Talks
Living in a Heterogenous World: How Scientific Workflows Help Automate Science and What We Can Do Better?
Effective Congestion Management for Large-Scale Datacenters
Programming Big Data Analysis: Towards Data-Centric Exascale Computing
Euro-Par 2022 Topic Overviews
Topic 1: Compilers, Tools, and Environments
Topic 2: Performance and Power Modeling, Prediction and Evaluation
Topic 3: Scheduling and Load Balancing
Topic 4: Data Management, Analytics, and Machine Learning
Topic 5: Cluster and Cloud Computing
Topic 6: Theory and Algorithms for Parallel and Distributed Processing
Topic 7: Parallel and Distributed Programming, Interfaces, and Languages
Topic 8: Multicore and Manycore Parallelism
Topic 9: Parallel Numerical Methods and Applications
Contents
Compilers, Tools and Environments
CrossDBT: An LLVM-Based User-Level Dynamic Binary Translation Emulator
1 Introduction
2 Related Work
3 DBT Emulator
3.1 Overview
3.2 Offline Stage
3.3 Online Stage
3.4 SIMD and Floating Point
3.5 System Call
4 Experiment Results
4.1 Setup
4.2 Results
5 Conclusion
References
MARTINI: The Little Match and Replace Tool for Automatic Application Rewriting with Code Examples
1 Introduction
2 Design and Implementation
3 Case Study: Instrumentation
4 Case Study: HIPIFY
5 Evaluation
5.1 Performance
5.2 Usability
6 Related Work
7 Future Work
8 Conclusion
References
Accurate Fork-Join Profiling on the Java Virtual Machine
1 Introduction
2 Background
3 Profiling Technique
3.1 Profiling Model
3.2 Implementation
4 Evaluation
4.1 Experimental Setup
4.2 Accuracy and Overhead Evaluation
5 Related Work
6 Conclusions
References
Performance and Power Modeling, Prediction and Evaluation
Characterization of Different User Behaviors for Demand Response in Data Centers
1 Introduction
2 Background and Related Works
3 Model
3.1 Data Center
3.2 Scheduler
3.3 Users
4 Experimental Setup
4.1 Software Used for Simulation
4.2 Workload
4.3 Platform
4.4 Experimental Campaign
5 Results
5.1 Energy Metrics
5.2 Perceived Impact on the Scheduling
6 Discussion
6.1 The Fluid-Residual Ratio: An Explanation of the Results
6.2 Pros and Cons of Each Behavior
6.3 Interactions with Scheduling Systems
6.4 Limitations
7 Conclusion and Future Works
References
On-the-Fly Calculation of Model Factors for Multi-paradigm Applications
1 Introduction
2 Model Factors
3 Critical Path
4 On-the-Fly Critical Path Analysis
4.1 Measuring Time
4.2 Critical Path in OpenMP
4.3 Critical Path in MPI
4.4 Implementation Challenges
5 Hybrid Model Factors
5.1 Definition of Separated Model Factors
5.2 Properties of Separated Model Factors
6 Evaluation
6.1 Experiment Setup
6.2 Real-World Application JuKKR
6.3 Distributed Block Cholesky Factorization
6.4 Synthetic Benchmark
7 Future Work
7.1 Region-based Analysis
7.2 Accelerator Support
8 Conclusions
References
Relative Performance Projection on Arm Architectures
1 Introduction
2 Related Work
3 Workflow Presentation and Implementation
3.1 Hardware and Software Characterization
3.2 Performance Projection
3.3 Methodology for Design-Space Exploration
3.4 Implementation
4 Experimental Environment
4.1 Architectures
4.2 Applications
5 Model Validation
5.1 LULESH
5.2 MiniFE
5.3 Quicksilver
6 Exploration on Different Parameters
6.1 Exploration on SVE vector sizes
6.2 Exploration on the Introduction of HBM2 on DDR4 Machines
6.3 Comparison of Projections from N1 and TX2 with SVE 512 and HBM2 to A64FX
6.4 Vector Sizes Exploration on A64FX with Different Software Stacks
7 Conclusion and Future Work
References
Scheduling and Load Balancing
Exploring Scheduling Algorithms for Parallel Task Graphs: A Modern Game Engine Case Study
1 Introduction
2 Background and Related Work
2.1 Background on Video Games and Game Engines
2.2 Scheduling Problems and Algorithms
3 Scheduling in Game Engines
3.1 Task Model
3.2 Scheduling Problem for a Single Frame
3.3 Scheduling Problem for Multiple Frames
4 Exploring List Scheduling Algorithms
5 Experimental Evaluation
5.1 Details Regarding the Simulation and Statistical Evaluation
5.2 Scenario I - Employing Scheduling Algorithms
5.3 Scenario II - Subtask Scheduling
5.4 Scenario III - Subtask Splitting
6 Conclusion and Future Work
References
Decentralized Online Scheduling of Malleable NP-hard Jobs
1 Introduction
2 Preliminaries
2.1 Malleable Job Scheduling
2.2 Scalable SAT Solving
2.3 Problem Statement
3 Approach
3.1 Calculation of Fair Volumes
3.2 Assignment of Jobs to PEs
3.3 Reuse of Suspended Workers
4 The Mallob System
4.1 Implementation of Algorithms
4.2 Engineering
5 Evaluation
5.1 Uniform Jobs
5.2 Impact of Priorities
5.3 Realistic Job Arrivals
6 Conclusion
References
A Bi-Criteria FPTAS for Scheduling with Memory Constraints on Graphs with Bounded Tree-Width
1 Introduction
2 Definitions
3 An Exact Algorithm Using Dynamic Programming
3.1 The Dynamic Programming Algorithm
3.2 Algorithm Correctness
3.3 Algorithm Complexity
4 Application of a Trimming Technique
5 Conclusion
References
Data Management, Analytics and Machine Learning
mCAP: Memory-Centric Partitioning for Large-Scale Pipeline-Parallel DNN Training
1 Introduction
2 Background and Related Work
3 Method
3.1 Profiling
3.2 Peak Memory Usage for Intra-batch Pipelining with Activation Recomputation
3.3 Prediction
3.4 Recommendation
3.5 Implementation
4 Experiments
4.1 Experimental Setup
4.2 VGG11
4.3 AmoebaNet-D
5 Conclusion and Future Work
References
Analysing Supercomputer Nodes Behaviour with the Latent Representation of Deep Learning Models
1 Introduction
2 Related Work
3 Methodology
3.1 Probabilistic Background
3.2 General Overview of the Approach
3.3 Autoencoder Models
3.4 Feature Extraction
3.5 Clustering
3.6 Evaluating Clustering
3.7 Random Sampling Baseline
4 Results
4.1 Experimental Setting
4.2 Trained Autoencoder
4.3 Cluster Analysis: Normal Operation Percentage
5 Conclusions
References
Accelerating Parallel Operation for Compacting Selected Elements on GPUs
1 Introduction
2 Preliminaries
2.1 Compaction Primitive
2.2 Bit Mask Characteristics
2.3 NVIDIA GPU Architecture and Execution Model
3 SPACE - Accelerating Compaction on GPUs
3.1 Phase Variants for Parallel Implementation
3.2 Overview of SPACE Variants
4 Evaluation
4.1 Implementation
4.2 Experimental Setup
4.3 Experimental Methodology
4.4 Results
5 Related Work
6 Conclusion and Summary
References
Cluster and Cloud Computing
A Methodology to Scale Containerized HPC Infrastructures in the Cloud
1 Introduction
2 Related Works
3 Methodology
3.1 Required Information at a User Level
3.2 Configuration Services
4 Experimentations
4.1 Outline
4.2 Micro Description of the Methodology
4.3 Experimental Results
4.4 Impact on Pending Jobs
4.5 Impact on Running Jobs
4.6 Short-term Upcoming Perspectives
5 Conclusion and Long-Term Perspectives
References
Cucumber: Renewable-Aware Admission Control for Delay-Tolerant Cloud and Edge Workloads
1 Introduction
2 Related Work
3 Admission Control
3.1 Forecasting Load, Power Consumption, and Power Production
3.2 Deriving the freep Capacity Forecast
3.3 Admission Control Policy
3.4 Limiting Power Consumption at Runtime
4 Evaluation
4.1 Experimental Setup
4.2 Results
5 Conclusion
References
Multi-objective Hybrid Autoscaling of Microservices in Kubernetes Clusters
1 Introduction
2 Related Work
3 Multi-Objective Hybrid Autoscaling
3.1 Automatic Generation of Dataset
3.2 Model Training
3.3 Autoscaling Loop
4 Evaluation
4.1 Experimental Setup
4.2 The Benchmark Setup
4.3 Experimental Results
5 Conclusion
References
Theory and Algorithms for Parallel and Distributed Processing
Two-Agent Scheduling with Resource Augmentation on Multiple Machines
1 Introduction
1.1 Problem Description
1.2 Contributions and Organization
2 Related Work
3 Problem Formulation and Notations
4 A Lower Bound
5 An Algorithm for the Two-Agent Problem
6 Analysis by Dual Fitting
6.1 Algorithm's Flow-Time
6.2 Linear Programming Formulation
6.3 Dual Variables
6.4 Competitive Analysis
7 Concluding Remarks
References
IP.LSH.DBSCAN: Integrated Parallel Density-Based Clustering Through Locality-Sensitive Hashing
1 Introduction
2 Preliminaries
3 The Proposed IP.LSH.DBSCAN Method
4 Analysis
5 Evaluation
6 Other Related Work
7 Conclusions
References
GraphGuess: Approximate Graph Processing System with Adaptive Correction
1 Introduction
2 Graph Processing Systems
2.1 Think Like a Vertex
2.2 Preprocessing the Graph
2.3 Approximate Analysis
2.4 When Graph Approximation Fails
3 GraphGuess
3.1 Programming Model
3.2 Tracking the Edge Influence
3.3 Runtime Modes
3.4 Adaptive Correction
4 Applications and Error Criteria
4.1 Applications and Datasets
4.2 Error Metrics
5 Experimental Evaluations
5.1 Sensitivity to Control Parameters
5.2 Evaluation of Performance and Accuracy
6 Concluding Remarks
References
Deterministic Parallel Hypergraph Partitioning
1 Introduction
2 Preliminaries
3 Deterministic Parallel Multilevel Partitioning
3.1 Preprocessing
3.2 Coarsening
3.3 Initial Partitioning
3.4 Refinement
3.5 Differences to BiPart
4 Experiments
4.1 Parameter Tuning
4.2 Speedups
4.3 Comparison with Other Algorithms
4.4 The Cost of Determinism
5 Conclusion and Future Work
References
Parallel and Distributed Programming, Interfaces, and Languages
OmpSs-2@Cluster: Distributed Memory Execution of Nested OpenMP-style Tasks
1 Introduction
2 Background
3 OmpSs-2@Cluster Programming Model
4 Nanos6 Runtime Implementation
4.1 Building the Distributed Dependency Graph
4.2 Tracking Dependencies Among Tasks
4.3 Scheduling Ready Tasks for Execution
4.4 Data Transfers
5 Evaluation Methodology
5.1 Hardware and Software Platform
5.2 Benchmarks
6 Results
7 Related Work
8 Conclusions
References
Generating Work Efficient Scan Implementations for GPUs the Functional Way
1 Introduction
2 Background
2.1 Parallel Scan
2.2 Data Parallel Functional Code Generators
3 Functional Formulation of Work Efficient Parallel Scan
3.1 Outer Scan
3.2 Inner Scan
4 Modeling the Optimization Space with Rewrite Rules
4.1 Optimization via Rewrite Rules
4.2 Algorithmic Optimization
5 Optimization Space Exploration
5.1 Expressing Scan Variants
5.2 Exploring Scan Variants
6 Evaluation
6.1 Performance of Scan Block Variants
6.2 End-to-End Comparison
7 Related Work
8 Conclusion
References
Automatic Parallelization of Python Programs for Distributed Heterogeneous Computing
1 Introduction
2 Background
2.1 Intrepydd Compiler
2.2 Ray Runtime
3 Overview of Our Approach
4 Optimizations
4.1 Program Multi-versioning for Specialized Code Optimizations
4.2 Polyhedral Optimizations
4.3 NumPy-to-CuPy Conversion and Parallelized Code Generation
4.4 Important Packages Used in AutoMPHC Tool Chain
5 Experimental Results
5.1 Experimental Setup
5.2 Single-node Results (PolyBench)
5.3 Multi-node Results (STAP)
6 Related Work
7 Conclusions
References
Multicore and Manycore Parallelism
A Hybrid Piece-Wise Slowdown Model for Concurrent Kernel Execution on GPU
1 Introduction
2 Motivation
3 Slowdown Model for SMK
3.1 Hybrid Slowdown Model
3.2 Piece-wise Model for Compute-Bound Kernels
4 Experimental Results
4.1 Simulated System
4.2 Workloads
4.3 SMT Vs SMK Comparison
4.4 Fairness Based Policy
5 Related Works
6 Conclusion and Future Work
References
Parallel Numerical Methods and Applications
Accelerating Brain Simulations with the Fast Multipole Method
1 Introduction
2 Related Work
3 Background
3.1 The Model of Structural Plasticity
3.2 A Distributed Octree
3.3 Mathematical Formulation of the Fast Multipole Method
4 Algorithm Description
4.1 Complexity
5 Evaluation
6 Conclusion
References
High-Performance Spatial Data Compression for Scientific Applications
1 Introduction
2 Background
3 Related Work
4 Hierarchical Low-Rank Data Compression
4.1 Problem Definition and Adaptive Procedure
4.2 Error Bounds
4.3 Runtime Complexity
4.4 Implementation Details
5 Performance Results
5.1 Logarithmic Kernel
5.2 The Wave Equation
5.3 Turbulent Combustion
6 Conclusion and Future Work
References
Author Index

📜 SIMILAR VOLUMES

Euro-Par 2022: Parallel Processing: 28th

📁 Euro-Par 2022: Parallel Processing: 28th International Conference on Parallel and Distributed Computing, Glasgow, UK, August 22–26, 2022, Proceedings (Lecture Notes in Computer Science, 13440)

✍ José Cano (editor), Phil Trinder (editor) 📂 Library 📅 2022 🏛 Springer 🌐 English

This book constitutes the proceedings of the 28th International Conference on Parallel and Distributed Computing, Euro-Par 2022, held in Glasgow, UK, in August 2022. The 25 full papers presented in this volume were carefully reviewed and selected from 102 submissions. The co

Euro-Par 2020: Parallel Processing: 26th

📁 Euro-Par 2020: Parallel Processing: 26th International Conference on Parallel and Distributed Computing, Warsaw, Poland, August 24–28, 2020, Proceedings

✍ Maciej Malawski, Krzysztof Rzadca 📂 Library 📅 2020 🏛 Springer International Publishing;Springer 🌐 English

This book constitutes the proceedings of the 26th International Conference on Parallel and Distributed Computing, Euro-Par 2020, held in Warsaw, Poland, in August 2020. The conference was held virtually due to the coronavirus pandemic. The 39 full papers presented in this volume were car

Euro-Par 2023: Parallel Processing: 29th

📁 Euro-Par 2023: Parallel Processing: 29th International Conference on Parallel and Distributed Computing, Limassol, Cyprus, August 28 – September 1, ... (Lecture Notes in Computer Science, 14100)

✍ José Cano (editor), Marios D. Dikaiakos (editor), George A. Papadopoulos (editor 📂 Library 📅 2023 🏛 Springer 🌐 English

This book constitutes the proceedings of the 29th International Conference on Parallel and Distributed Computing, Euro-Par 2023, held in Limassol, Cyprus, in August/September 2023. The 49 full papers presented in this volume were carefully reviewed and selected from 164 s

Euro-Par 2002. Parallel Processing: 8th

📁 Euro-Par 2002. Parallel Processing: 8th International Euro-Par Conference Paderborn, Germany, August 27-30, 2002 Proceedings (Lecture Notes in Computer Science, 2400)

✍ Burkhard Monien (editor), Rainer Feldmann (editor) 📂 Library 📅 2002 🏛 Springer 🌐 English

Euro-Par – the European Conference on Parallel Computing – is an international conference series dedicated to the promotion and advancement of all aspects of parallel computing. The major themes can be divided into the broad categories of hardware, software, algorithms, and applications for pa

Euro-Par 2001 Parallel Processing: 7th I

📁 Euro-Par 2001 Parallel Processing: 7th International Euro-Par Conference Manchester, UK August 28-31, 2001 Proceedings (Lecture Notes in Computer Science, 2150)

✍ Rizos Sakellariou (editor), John Keane (editor), John Gurd (editor), Len Freeman 📂 Library 📅 2001 🏛 Springer 🌐 English

Euro-Par 2016: Parallel Processing: 22nd

📁 Euro-Par 2016: Parallel Processing: 22nd International Conference on Parallel and Distributed Computing, Grenoble, France, August 24-26, 2016, Proceedings

✍ Pierre-François Dutot, Denis Trystram (eds.) 📂 Library 📅 2016 🏛 Springer International Publishing 🌐 English

This book constitutes the refereed proceedings of the 22nd International Conference on Parallel and Distributed Computing, Euro-Par 2016, held in Grenoble, France, in August 2016. The 47 revised full papers presented together with 2 invited papers and one industrial paper were carefully revie