
Job Scheduling Strategies for Parallel Processing: 26th Workshop, JSSPP 2023, St. Petersburg, FL, USA, May 19, 2023, Revised Selected Papers (Lecture Notes in Computer Science)

✍ Scribed by Dalibor Klusáček (editor), Julita Corbalán (editor), Gonzalo P. Rodrigo (editor)


Publisher
Springer
Year
2023
Tongue
English
Leaves
200
Category
Library


✦ Synopsis


This book constitutes the thoroughly refereed post-conference proceedings of the 26th International Workshop on Job Scheduling Strategies for Parallel Processing, JSSPP 2023, held in St. Petersburg, FL, USA, on May 19, 2023.

The eight full papers and one keynote paper included in this book were carefully reviewed and selected from 14 submissions. The volume is organized in two sections: keynote and technical papers.

✦ Table of Contents


Preface
Organization
Contents
Keynote
Architecture of the Slurm Workload Manager
1 Introduction
2 Slurm Entities
3 Daemons
4 Plugin Infrastructure
5 Configuration
6 Communications
7 Job Priority
8 Typical Configurations
9 Scheduling Algorithm
10 License Scheduling
11 Application Layout
12 Job Profiling
13 Compute Node Management
14 Conclusion
References
Technical Papers
Asynchronous Execution of Heterogeneous Tasks in ML-Driven HPC Workflows
1 Introduction
2 Related Work
3 Motivation
4 Design and Implementation
5 Workload-Level Asynchronicity
5.1 Condition I: Inter-Task Dependencies
5.2 Condition II: Resource Availability
5.3 Benefits of Workflow-Level Asynchronicity
6 Experiments
6.1 DeepDriveMD
6.2 Abstract-DG
7 Performance Characterization
7.1 DeepDriveMD
7.2 C-DG1
7.3 C-DG2
8 Conclusions
References
Memory-Aware Latency Prediction Model for Concurrent Kernels in Partitionable GPUs: Simulations and Experiments
1 Introduction
2 Background
2.1 GPU Architecture
2.2 Programming Model and Scheduling
2.3 Cycle Accurate Simulation Through GPGPU-Sim
3 Overview
4 Simulation Settings and Our GPGPU-Sim Extension
5 Memory Aware Performance Estimation
5.1 Kernels Memory Bandwidth Analysis
5.2 Kernels Completion Latency Analysis
6 Predicting Latencies Depending on Assigned SMs
7 Modeling Memory Interference
7.1 Experiments and Analysis
7.2 Comparison with a Worst-Interference Method
7.3 Latency and Interference Prediction Evaluations
8 Bandwidth Prediction
8.1 Bandwidth Prediction Results
9 Related Work
10 Conclusion
References
Stragglers in Distributed Matrix Multiplication
1 Introduction
1.1 Our Contribution
1.2 Related Work
1.3 Paper Organization
2 Preliminaries
2.1 Models and Architecture
2.2 Matrix Multiplication
2.3 Collective Communication Operations
3 Synchronized Load Balancing
3.1 Task Exchange Phase
3.2 Adaptive Task Exchange Procedure
4 Comparison
4.1 Simulation
5 Discussion
A Existing Solutions
A.1 Dynamic Load Balancing
A.2 Redundancy
References
Optimization Metrics for the Evaluation of Batch Schedulers in HPC
1 Introduction
2 Evaluating the Quality of a Schedule
2.1 Mean (bounded) Slowdown
2.2 Utilization
2.3 Response Time (and Wait Time)
2.4 Additional Comments
3 Use-Case: The Impact of Runtime Estimates
3.1 Evaluation Methodology
3.2 Experimental Evaluation
4 Related Work
5 Conclusion
References
An Experimental Analysis of Regression-Obtained HPC Scheduling Heuristics
1 Introduction
2 Related Work
3 Background
3.1 Online Parallel Job Scheduling
3.2 Scheduling Policies and Backfilling
3.3 Scheduling Performance Metric
4 Experimental Procedure
4.1 Simulation Strategy
4.2 Creating Regression-Based Scheduling Heuristics
5 Results and Discussion
5.1 Simulation-Based Approach for Extracting Scheduling Knowledge
5.2 Does the Effectiveness of Regression-Based Scheduling Heuristics Increases as a Function of Polynomial Size?
5.3 How Regression-Obtained Scheduling Heuristics Behave in Long Term?
6 Conclusions and Future Work
References
An Efficient Approach Based on Graph Neural Networks for Predicting Wait Time in Job Schedulers
1 Introduction
2 Related Work
3 Datasets
3.1 Prediction Class Definition
3.2 Input Variables
4 Proposed DL Model
5 Results and Discussion
5.1 Comparison with Other Methods
5.2 Time Dependency
5.3 Importance of Input Variables
5.4 Visualization of Attention Weights
6 Conclusions
References
Evaluating the Potential of Coscheduling on High-Performance Computing Systems
1 Introduction
2 Background and Assumptions
2.1 Traditional HPC Job Scheduling
2.2 Evaluating Scheduling Policies
2.3 Backfilling
2.4 Job Configuration Assumptions
3 Implementation
3.1 Backfilling
3.2 Coscheduling
4 Experimental Setup
5 Results
5.1 Turnaround Time Results
5.2 Impact on Individual Job Execution Times
5.3 Differing Numbers of Nodes
5.4 Restricting Coscheduling
6 Related Work
7 Summary
References
Scaling Optimal Allocation of Cloud Resources Using Lagrange Relaxation
1 Introduction
2 Background: Cloud Computing Model
3 Problem Formulation
3.1 Unit Commitment Problem
3.2 Integer Linear Program Formulation
4 Lagrange Relaxation
4.1 Boundary Analysis
4.2 Normalized Average Cost Analysis
5 Decomposition of Forecasted Demand
5.1 Decomposition Based Approximation Algorithm
6 An Illustrative Case Study
6.1 Decomposition of the Forecasted Demand
7 Experimental Results and Discussion
8 Related Work
9 Summary and Future Work
References
Author Index


📜 SIMILAR VOLUMES


Job Scheduling Strategies for Parallel Processing
✍ Dalibor Klusáček (editor), Julita Corbalán (editor), Gonzalo P. Rodrigo (editor) 📂 Library 📅 2023 🏛 Springer 🌐 English

This book constitutes the thoroughly refereed post-conference proceedings of the 25th International Workshop on Job Scheduling Strategies for Parallel Processing, JSSPP 2022, held as a virtual event in June 2022 (due to the Covid-19 pandemic). The 12 revised full papers presented…

Job Scheduling Strategies for Parallel Processing
✍ Dror Feitelson (editor), Larry Rudolph (editor), Uwe Schwiegelshohn (editor) 📂 Library 📅 2003 🏛 Springer 🌐 English

This volume contains the papers presented at the 9th Workshop on Job Scheduling Strategies for Parallel Processing, which was held in conjunction with HPDC12 and GGF8 in Seattle, Washington, on June 24, 2003. The papers went through a complete review process, with the full version being read and…

Job Scheduling Strategies for Parallel Processing
✍ Narayan Desai (editor), Walfredo Cirne (editor) 📂 Library 📅 2014 🏛 Springer-Verlag Berlin Heidelberg 🌐 English

This book constitutes the thoroughly refereed post-conference proceedings of the 17th International Workshop on Job Scheduling Strategies for Parallel Processing, JSSPP 2013, held in Boston, MA, USA, in May 2013. The 10 revised papers presented were carefully reviewed and selected from 20 submissions…

Job Scheduling Strategies for Parallel Processing
✍ Dror Feitelson (editor), Eitan Frachtenberg (editor), Larry Rudolph (editor), Uwe Schwiegelshohn (editor) 📂 Library 📅 2005 🏛 Springer 🌐 English

This volume contains the papers presented at the 11th Workshop on Job Scheduling Strategies for Parallel Processing. The workshop was held in Boston, MA, on June 19, 2005, in conjunction with the 19th ACM International Conference on Supercomputing (ICS05). The papers went through a complete review process…

Job Scheduling Strategies for Parallel Processing
✍ Dalibor Klusáček (editor), Walfredo Cirne (editor), Gonzalo P. Rodrigo (editor) 📂 Library 📅 2021 🏛 Springer 🌐 English

This book constitutes the thoroughly refereed post-conference proceedings of the 24th International Workshop on Job Scheduling Strategies for Parallel Processing, JSSPP 2021, held as a virtual event in May 2021 (due to the Covid-19 pandemic). The 10 revised full papers presented were carefully…

Job Scheduling Strategies for Parallel Processing
✍ Dalibor Klusáček (editor), Walfredo Cirne (editor), Gonzalo P. Rodrigo (editor) 📂 Library 📅 2021 🏛 Springer Nature 🌐 English

This book constitutes the thoroughly refereed post-conference proceedings of the 24th International Workshop on Job Scheduling Strategies for Parallel Processing, JSSPP 2021, held as a virtual event in May 2021 (due to the Covid-19 pandemic). The 10 revised full papers presented were carefully reviewed…