High Performance Computing: 38th International Conference, ISC High Performance 2023, Hamburg, Germany, May 21–25, 2023, Proceedings (Lecture Notes in Computer Science)
✍ Scribed by Abhinav Bhatele (editor), Jeff Hammond (editor), Marc Baboulin (editor), Carola Kruse (editor)
- Publisher: Springer
- Year: 2023
- Language: English
- Pages: 432
- Category: Library
✦ Synopsis
This book constitutes the proceedings of the 38th International Conference on High Performance Computing, ISC High Performance 2023, which took place in Hamburg, Germany, in May 2023.
The 21 papers presented in this volume were carefully reviewed and selected from 78 submissions. They were organized in topical sections as follows: Architecture, Networks, and Storage; HPC Algorithms & Applications; Machine Learning, AI, & Quantum Computing; Performance Modeling, Evaluation, & Analysis; and Programming Environments & Systems Software.
✦ Table of Contents
Preface
Organization
Contents
Architecture, Networks, and Storage
CPU Architecture Modelling and Co-design
1 Introduction
2 Approach to Modelling
3 Methodology
4 Model Tuning and Validation
5 Applications
5.1 GROMACS
5.2 GPAW
6 Results
6.1 GROMACS
6.2 GPAW
7 Related Work
8 Summary and Conclusions
References
Illuminating the I/O Optimization Path of Scientific Applications
1 Introduction
2 Related Work
3 Visualization, Diagnosis, and Recommendations
3.1 Extracting I/O Behavior from Metrics
3.2 Exploring I/O Behavior Interactively
3.3 Automatic Detection of I/O Bottlenecks
3.4 Exploring I/O Phases and Bottlenecks
3.5 Towards Exploring File System Usage
4 Results
4.1 I/O Systems in NERSC and OLCF
4.2 I/O Bottlenecks in OpenPMD
4.3 Improving AMReX with Asynchronous I/O
5 Conclusion
References
Efficient Large Scale DLRM Implementation on Heterogeneous Memory Systems
1 Introduction
2 Related Work
3 Implementing Embedding Tables in Heterogeneous Memory Systems
4 Cached Embeddings
4.1 CachedEmbeddings Performance
5 DLRM Implementation Methodology
6 End-to-End DLRM Performance
7 Conclusions and Future Work
References
HPC Algorithms and Applications
Efficient GPU Offloading with OpenMP for a Hyperbolic Finite Volume Solver on Dynamically Adaptive Meshes
1 Introduction
2 Science Case and Code Architecture
3 A Realisation of GPU Offloads with target map
4 User-Managed Memory Management
4.1 Data Pre-allocation on the GPU
4.2 Pre-allocation on the CPU with Unified Memory
5 Results
6 Discussion and Conclusions
7 Summary and Outlook
References
Shallow Water DG Simulations on FPGAs: Design and Comparison of a Novel Code Generation Pipeline
1 Introduction
2 Background
2.1 Mathematical Model and Numerical Scheme
2.2 Simulation Scenario: Radial Dam Break
2.3 FPGAs
3 Proposed Code Generation Pipeline (CGP)
3.1 GHODDESS
3.2 pystencils
3.3 StencilStream
3.4 Integration
4 Existing Dataflow Design
5 FPGA Designs, Experiments and Evaluation
5.1 Performance of the CPU Reference and Validation
6 Analysis
7 Related Work
8 Conclusion and Outlook
References
Massively Parallel Genetic Optimization Through Asynchronous Propagation of Populations
1 Introduction
2 Related Work
3 Propulate Algorithm and Implementation
4 Experimental Evaluation
4.1 Experimental Environment
4.2 Benchmark Functions
4.3 Meta-optimizing the Optimizer
4.4 Benchmark Function Optimization
4.5 HP Optimization for Remote Sensing Classification
4.6 Scaling
5 Conclusion
References
Steering Customized AI Architectures for HPC Scientific Applications
1 Introduction
2 Related Work and Research Contributions
3 Batching/Compression or Why Matricization Matters?
4 The Graphcore IPU Hardware Technology
4.1 Architecture Principles and Hardware Details
4.2 Programming Model and Poplar Development Kit
5 HPC Scientific Applications
5.1 Adaptive Optics in Computational Astronomy
5.2 Seismic Processing and Imaging
5.3 Climate/Weather Prediction Applications
5.4 Wireless Communications
6 Implementation Details
7 Performance Results
8 Limitations and Perspectives
9 Conclusion and Future Work
References
GPU-Based Low-Precision Detection Approach for Massive MIMO Systems
1 Introduction
2 Brief Background
2.1 Modulation
2.2 Signal to Noise Ratio (SNR)
2.3 Error Rate and Time Complexity
3 Related Work
4 System Model
4.1 Tree-Based Representation
5 Multi-level Approach
6 GPU-Based Multi-level Approaches
6.1 GPU Multi-level
6.2 Multi-GPU Version
7 Results and Discussions
8 Conclusion and Perspectives
References
A Mixed Precision Randomized Preconditioner for the LSQR Solver on GPUs
1 Introduction
2 Background
2.1 Related Work
3 Design and Implementation of the Mixed Precision Preconditioner
4 Numerical Experiments
4.1 Experiment Setup
4.2 Discussion
5 Conclusion
References
Ready for the Frontier: Preparing Applications for the World's First Exascale System
1 Introduction and Background
2 Systems Overview
2.1 Summit
2.2 Frontier
3 Applications
3.1 CoMet
3.2 Cholla: Computational Hydrodynamics on Parallel Architecture
3.3 GESTS: GPUs for Extreme-Scale Turbulence Simulations
3.4 LBPM: Lattice Boltzmann Methods for Porous Media
3.5 LSMS
3.6 NUCCOR/NTCL
3.7 NAMD
3.8 PIConGPU
4 Lessons Learned
5 Conclusions
References
End-to-End Differentiable Reactive Molecular Dynamics Simulations Using JAX
1 Introduction
1.1 Related Work
1.2 Our Contribution
2 Background
2.1 ReaxFF Overview
2.2 JAX and JAX-MD Overview
3 Design and Implementation
3.1 Memory Management
3.2 Generation of Interaction Lists
3.3 Force Field Training
4 Experimental Results
4.1 Software and Hardware Setup
4.2 Validation of MD Capabilities
4.3 Performance and Scalability
4.4 Training
5 Conclusion
References
Machine Learning, AI, and Quantum Computing
Allegro-Legato: Scalable, Fast, and Robust Neural-Network Quantum Molecular Dynamics via Sharpness-Aware Minimization
1 Introduction
2 Method Innovation
2.1 Summary of Neural-Network Quantum Molecular Dynamics
2.2 Summary of Sharpness-Aware Minimization
2.3 Key Innovation: Allegro-Legato: SAM-Enhanced Allegro
2.4 RXMD-NN: Scalable Parallel Implementation of Allegro-Legato NNQMD
3 Results
3.1 Experimental Platform
3.2 Fidelity-Scaling Results
3.3 Computational-Scaling Results
4 Discussions
4.1 Simulation Time
4.2 Training Time
4.3 Model Accuracy
4.4 Implicit Sharpness Regularization in Allegro
4.5 Training Details
5 Applications
6 Related Work
7 Conclusion
References
Quantum Annealing vs. QAOA: 127 Qubit Higher-Order Ising Problems on NISQ Computers
1 Introduction
2 Methods
2.1 Ising Model Problem Instances
2.2 Quantum Alternating Operator Ansatz
2.3 Quantum Annealing
2.4 Simulated Annealing Implementation
3 Results
4 Discussion
References
Quantum Circuit Simulation by SGEMM Emulation on Tensor Cores and Automatic Precision Selection
1 Introduction
2 Background
2.1 NVIDIA Tensor Core and SGEMM Emulation
2.2 Quantum Circuit Simulation and Tensor Network Contraction
3 SGEMM Emulation Library on Tensor Cores
4 Automatic Precision Selection
4.1 Exponent Statistics and Computing Mode Selection Rule
4.2 Dynamic Kernel Selection
4.3 The Overhead of the Exponent Statistics
5 Experiment
5.1 Preparation
5.2 Exploratory Experiment
5.3 Random Quantum Circuit Simulation
6 Conclusion
References
Performance Modeling, Evaluation, and Analysis
A Study on the Performance Implications of AArch64 Atomics
1 Introduction
2 The Problem
2.1 RAJAPerf and the PI_ATOMIC kernel
2.2 Performance Results
2.3 A Closer Look at OpenMP Floating-Point Atomics
3 Benchmarking CAS Operations
3.1 Compare-and-Swap Operations
3.2 Benchmark Description
3.3 Assembly Kernels
4 Experiments and Observations
4.1 Evaluating the Performance of CAS
4.2 A Closer Look at A64FX
4.3 Testing LL-SC Implementations
4.4 Summary and Recommendations
5 Related Work
6 Conclusions
References
Analyzing Resource Utilization in an HPC System: A Case Study of NERSC's Perlmutter
1 Introduction
2 Related Work
3 Background
3.1 System Overview
3.2 Data Collection
3.3 Analysis Methods
4 Results
4.1 Workloads Overview
4.2 Resource Utilization
4.3 Temporal Characteristics
4.4 Spatial Characteristics
4.5 Correlations
5 Discussion and Conclusion
References
Overcoming Weak Scaling Challenges in Tree-Based Nearest Neighbor Time Series Mining
1 Introduction
2 Matrix Profile Background and Performance-Accuracy Trade-offs
2.1 Related Work
2.2 Potentials of Tree-based Methods
3 Current Parallel Tree-Based Approach and Its Shortcomings
4 Overcoming the Scalability Challenges
4.1 Pipelining Mechanism
4.2 Forest of Trees on Ensembles of Resources
5 Modeling the Impact of Optimizations on Complexity
6 Experimental Setup
7 Evaluations
7.1 Region of Benefit
7.2 Performance on Real-World Datasets
7.3 Single-Node Performance
7.4 Scaling Overheads
7.5 Effects of Pipelining and Forest Mechanisms
7.6 Scaling Results
7.7 Billion Scale Experiment
8 Conclusions
References
Porting Numerical Integration Codes from CUDA to oneAPI: A Case Study
1 Introduction
2 Background
2.1 oneAPI and SYCL
2.2 CUDA-Backend for SYCL
2.3 Related Work
3 Numerical Integration Use Case
3.1 PAGANI
3.2 m-Cubes
4 Porting Process
4.1 Challenges
5 Experimental Results
5.1 Offloading Mathematical Computations to Kernels
5.2 Benchmark Integrands Performance Comparison
5.3 Simple Integrands Performance Comparison
5.4 Factors Limiting Performance
6 Conclusion
References
Performance Evaluation of a Next-Generation SX-Aurora TSUBASA Vector Supercomputer
1 Introduction
2 Overview of SX-Aurora TSUBASA VE30
2.1 The SX-Aurora TSUBASA Product Family
2.2 Basic Architecture of the VE30 Processor
2.3 Architectural Improvements from the VE20 Processor
3 Performance Evaluation
3.1 Evaluation Environment
3.2 Basic Benchmarks
3.3 Evaluation of Architectural Improvements
3.4 Real-World Workloads
4 Performance Tuning for VE30
4.1 Selective L3 Caching
4.2 Partitioning Mode
5 Conclusions
References
Programming Environments and Systems Software
Expression Isolation of Compiler-Induced Numerical Inconsistencies in Heterogeneous Code
1 Introduction
2 Examples of Compiler-Induced Inconsistencies
3 Technical Approach
3.1 Hierarchy Extraction
3.2 Hierarchical Code Isolation
3.3 Source-to-Source Precision Enhancement
4 Experimental Evaluation
4.1 RQ1: Numerical Inconsistencies in Heterogeneous Programs
4.2 RQ2: Comparison with the State of the Art
4.3 Threats to Validity
5 Related Work
6 Conclusion
References
SAI: AI-Enabled Speech Assistant Interface for Science Gateways in HPC
1 Introduction and Motivation
1.1 Motivation
1.2 Challenges in Enabling Conversational Interface for HPC
1.3 Contributions
2 Background
2.1 Conversational User Interface
2.2 Open OnDemand
2.3 Ontology and Knowledge Graphs
2.4 Spack
3 Terminologies
4 Proposed SAI Framework
4.1 Generating HPC Datasets for Speech and Text
4.2 Fine-Tuning Speech Recognition Model for HPC Terminologies
4.3 Designing an Entity Detection and Classification Model for SAI
4.4 Creating the HPC Ontology and Knowledge Graphs
4.5 Knowledge Graph Selection and Inference
4.6 Software Installer Check and Interfacing with Spack
4.7 Integration with Open OnDemand
5 Insights into SAI Usage and Explainable Flow
6 Experimental Evaluation
6.1 Evaluation Platform
6.2 Evaluation Methodology
6.3 Evaluating ASR Model
6.4 Evaluating NLU Model
6.5 Performance Evaluation of Combined ASR and NLU Models
6.6 Overhead Analysis of SAI
6.7 Overhead Analysis of Scaling Passenger App Users
6.8 Analysis of SAI Interactive App on Different Architectures
7 Discussion
7.1 Security and Authentication
7.2 Handling Ambiguous Queries in SAI
7.3 Trade-offs for Converting Speech to Entities
7.4 Portability for New Software and Systems
8 Related Work
9 Future Work
10 Conclusion
References
Author Index
📜 SIMILAR VOLUMES
This volume constitutes the papers of several workshops which were held in conjunction with the 38th International Conference on High Performance Computing, ISC High Performance 2023, held in Hamburg, Germany, during May 21–25, 2023. The 49 revised full papers presented in this book were c
This book constitutes the refereed proceedings of the 35th International Conference on High Performance Computing, ISC High Performance 2020, held in Frankfurt/Main, Germany, in June 2020. The 27 revised full papers presented were carefully reviewed and selected from 87 submissions. The papers
This book constitutes the refereed proceedings of the 36th International Conference on High Performance Computing, ISC High Performance 2021, held virtually in June/July 2021. The 24 full papers presented were carefully reviewed and selected from 74 submissions. The papers cover
This book constitutes the refereed post-conference proceedings of 9 workshops held at the 35th International ISC High Performance 2021 Conference, in Frankfurt, Germany, in June-July 2021: Second International Workshop on the Application of Machine Learning Techniques to
This book constitutes the refereed conference proceedings of the workshops held at the 37th International ISC High Performance 2022 Conference, in Hamburg, Germany, on June 2, 2022. The 27 full papers included in this book were carefully reviewed and selected from 43 submissions. I