Supercomputing: 8th Russian Supercomputing Days, RuSCDays 2022, Moscow, Russia, September 26–27, 2022, Revised Selected Papers (Lecture Notes in Computer Science)
✍ Scribed by Vladimir Voevodin (editor), Sergey Sobolev (editor), Mikhail Yakobovskiy (editor), Rashit Shagaliev (editor)
- Publisher
- Springer
- Year
- 2022
- Tongue
- English
- Leaves
- 713
- Category
- Library
✦ Synopsis
This book constitutes the refereed proceedings of the 8th Russian Supercomputing Days, RuSCDays 2022, held in Moscow, Russia, in September 2022.
The 49 full papers and 1 short paper presented in this volume were carefully reviewed and selected from 94 submissions. The papers are organized in the following topical sections: Supercomputer Simulation; HPC, BigData, AI: Architectures, Technologies, Tools; Distributed and Cloud Computing.
✦ Table of Contents
Preface
Organization
Contents
Supercomputer Simulation
A Time-Parallel Ordinary Differential Equation Solver with an Adaptive Step Size: Performance Assessment
1 Introduction
2 Theory
2.1 Classical Parareal Algorithm
2.2 Theoretical Performance of the Parareal Algorithm
2.3 Modifications of the Parareal Algorithm
2.4 Implementation Details
2.5 Illustration of the Influence of the Adaptive Step on the Convergence Rate
3 Numerical Results
3.1 Rotational Dynamics
3.2 FitzHugh-Nagumo Model
3.3 Rossler System
3.4 Matrix Riccati Equation
3.5 Performance Issues
4 Summary
References
Analysis and Elimination of Bottlenecks in Parallel Algorithm for Solving Global Optimization Problems
1 Introduction
2 Global Optimization Problem Statement
3 Parallel Global Search Algorithm
3.1 Computational Scheme of the Algorithm
3.2 Theoretical Estimates of the Speedup of the Parallel Algorithm
4 Local Optimization Methods
4.1 BFGS Method
4.2 Hooke-Jeeves Method
4.3 Parallelization of the Local Optimization Methods
5 Results of Numerical Experiments
References
Analysis of Parallel Algorithm Efficiency for Numerical Solution of Mass Transfer Problem in Fractured-Porous Reservoir
1 Introduction
2 Problem Statement
3 Difference Scheme for Solving the Filtering Problem
4 Parallel Realization
5 Calculation Results
6 Conclusion
References
Black-Scholes Option Pricing on Intel CPUs and GPUs: Implementation on SYCL and Optimization Techniques
1 Introduction
2 Black-Scholes Option Pricing
3 Test Infrastructure
4 C++ Implementation and Optimization
4.1 Baseline
4.2 Loop Vectorization
4.3 Precision Reduction
4.4 Parallelism
4.5 NUMA-Friendly Memory Allocation
5 Porting to DPC++
5.1 Baseline and Optimizations on CPUs
5.2 Experiments on GPUs
5.3 Memory Management
5.4 Several Common Optimization Tricks
6 Conclusion
References
CFD Simulations on Hybrid Supercomputers: Gaining Experience and Harvesting Problems
1 Introduction
2 Mathematical Basis and Parallel Framework
2.1 Mathematical Model and Numerical Method
2.2 Parallel Software Framework
3 Hardware Architecture Issues
3.1 CPUs
3.2 GPUs
3.3 Hybrid CPU+GPU Cluster Nodes
4 Small Things Grow Big
4.1 Processing of Small Subsets
4.2 Processing of Results and IO
4.3 Modifiability, Reliability, Maintainability
5 Applications
6 Conclusions
References
Development of Web Environment for Modeling the Processes of Macroscopic and Microscopic Levels for Solving Conjugate Problems of Heat and Mass Transfer
1 Introduction
2 Architecture
2.1 General Approach
2.2 Technology Stack
3 Web Environment Prototype
3.1 User Access
3.2 Applications and Scenarios
3.3 System for Interaction with a Remote Computational Resource
3.4 Projects and Calculations
4 Environment Components
4.1 Specifying the Initial Data and Geometry of the Computational Domain
4.2 Generation of Grids and/or Other Modeling Objects
4.3 Preparing for Parallel Computing
4.4 Calculation Cores
4.5 Post-Processing and Visualization of Results
5 Common User-Case and Results Analyzing
6 Conclusion
References
Distributed Parallel Bootstrap Adaptive Algebraic Multigrid Method
1 Introduction
2 Bootstrap Adaptive Algebraic Multigrid Method
3 Parallel Implementation
3.1 Matrix Operations
3.2 Coarse Space Selection
3.3 Gauss–Seidel Smoother
4 Numerical Experiments
4.1 Poisson Systems
4.2 Mimetic Finite Difference System
4.3 Two-Phase Oil Recovery
5 Conclusions
References
GPU-Based Algorithm for Numerical Simulation of CO2 Sorption
1 Introduction
2 Statement of the Problem
3 Numerical Methods
3.1 Level-Set
3.2 Stokes Equation
3.3 The Solution of the Equations of Advection-Diffusion-Reaction
4 Algorithm Implementation
5 Numerical Experiments
5.1 Parameters Calibration
5.2 CO2 Break-Through Time
6 Conclusions
References
Heterogeneous Computing Systems in Problems of Modeling Filaments Formation and Pre-stellar Objects
1 Introduction
2 Implementation of Parallelized Numerical Code
3 Results of Modeling for Two Case of Cloud-Cloud Collision
4 Conclusion
References
High-Performance Computing in Solving the Electron Correlation Problem
1 Introduction
2 Parallel Implementation of KQMC
3 Analysis of the Results
4 Application of the Correlation Function
5 Conclusions
References
Implementation of Discrete Element Method to the Simulation of Solid Materials Fracturing
1 Introduction
2 Discrete Element Method
2.1 Forces
2.2 Time Integration
2.3 Boundary Conditions
2.4 Output Parameters
3 Implementation of the Algorithm
4 Numerical Experiments
5 Conclusions
References
Information Entropy Initialized Concrete Autoencoder for Optimal Sensor Placement and Reconstruction of Geophysical Fields
1 Introduction
2 Information Entropy Approximation
2.1 Information Entropy
2.2 Density Estimation Using Conditional PixelCNN
3 Optimization of Sensors Locations and Reconstruction of Fields
3.1 Concrete Autoencoder
3.2 Concrete Autoencoder with Adversarial Loss
4 Numerical Experiments
4.1 Dataset Description
4.2 Baselines
4.3 Evaluation Metrics
4.4 Results
5 Conclusion
References
Microwave Radiometric Mapping of Broken Cumulus Cloud Fields from Space: Numerical Simulations
1 Introduction
2 Size Distributions and Geometry of Broken Cloud Fields
2.1 The Plank Model
2.2 Alternative Model
2.3 The Cloud Height Distribution
3 Radiative Properties of the Clouds and the Atmosphere
4 Microwave Thermal Radiation Simulation
5 Parallelization Efficiency Analysis
6 Conclusions and Remarks
References
Parallel Computations by the Grid-Characteristic Method on Chimera Computational Grids in 3D Problems of Railway Non-destructive Testing
1 Introduction
2 Mathematical Model
3 Computational Algorithm
4 Numerical Method
5 Computational Meshes
6 Algorithms for the Distribution of Computational Grids Among Processes
6.1 Greedy Method for a Large Number of Processes
6.2 Minimal Cross Section Algorithm for Parallel Mesh Partitioning
7 Scalability Testing
8 Conclusions
References
Parallel Computing in Solving the Problem of Interval Multicriteria Optimization in Chemical Kinetics
1 Introduction
2 Mathematical Model
3 Interval Mathematical Model of the Catalytic Reaction for the Synthesis of Benzylalkyl Ethers
4 Statement of the Multi-criteria Interval Optimization Problem
5 Multi-criteria Interval Optimization Problem for the Catalytic Reaction of Benzylbutyl Ether Synthesis
6 Parallel Scheme for Implementing the Computational Process
7 Research Results
8 Conclusion
References
Parallel Efficiency for Poroelasticity
1 Poroelasticity Problem
1.1 Mathematical Formulation
1.2 Discretization
2 Solution Strategies for Discrete Systems
2.1 Monolithic Strategy
2.2 Fixed-Strain Splitting Strategy
3 Numerical Experiments
3.1 Implementation Details
3.2 Problem A: Faulted Reservoir
3.3 Problem B: Real-Life Domain with Synthetic Elastic Parameters
4 Conclusion
References
Parallel Implementation of Fast 3D Travel Time Tomography for Depth-Velocity Model Building in Seismic Exploration Problems
1 Introduction
2 Reflection Tomography Using Time Migrated Images
2.1 General Tomography Scheme
2.2 Inversion Tomographic Kernel
3 Parallel Implementation of Tomography
3.1 Parallelization Schemes for Data Preparation and Matrix Construction
3.2 Matrix Inversion Parallelization
4 Numerical Examples
4.1 Pseudo 3D Real Data
4.2 3D Real Data from Eastern Siberia
5 Conclusions
References
Parallel Implementation of Multioperators-Based Scheme of the 16-th Order for Three-Dimensional Calculation of the Jet Flows
1 Introduction
2 Problem Formulation
3 Numerical Method
4 Parallel Implementation and Efficiency Estimation
5 Numerical Results
6 Conclusion
References
Parallel Implementation of the Seismic Sources Recovery in Randomly Heterogeneous Media
1 Introduction
2 Statement of the Problem
3 Source Recovery Algorithm
4 Numerical Simulations
5 Parallel Approach
6 Conclusion
References
Performance Analysis of GPU-Based Code for Complex Plasma Simulation
1 Introduction
2 Methods
3 Performance and Efficiency Analysis
4 Conclusion
References
PIConGPU on Desmos Supercomputer: GPU Acceleration, Scalability and Storage Bottleneck
1 Introduction
2 PIConGPU
3 Hardware
4 Models
5 Discussion
6 Related Work
7 Conclusion
References
Quasi-one-Dimensional Polarized Superfluids: A DMRG Study
1 Introduction
2 Model and Methods
3 Numerical Simulations
4 Results and Discussion
References
Sintering Simulation Using GPU-Based Algorithm for the Samples with a Large Number of Grains
1 Introduction
2 Model
2.1 Finite Difference Scheme
3 Algorithm
3.1 Changes in the Finite Difference Scheme
3.2 Domains Tracking
3.3 GPU-Based Implementation
4 Results
4.1 Comparison of Solutions
4.2 Large Samples
4.3 Conclusion
References
Software Package for High-Performance Computations in Airframe Assembly Modeling
1 Introduction
2 Temporary Fastening
3 Variation Analysis
3.1 Verification of Fastening Pattern
3.2 Initial Gap Modeling
3.3 Efficiency of Task Parallelization for Verification
4 Fastener Pattern Optimization
4.1 Fastener Number Optimization
4.2 Fastener Order Optimization
5 Application Examples for Fastener Pattern Optimization
5.1 Optimization of A350 Fuselage Section Manufacturing Process
5.2 Fuselage-to-Fuselage Joint
6 Conclusion
References
State-of-the-Art Molecular Dynamics Packages for GPU Computations: Performance, Scalability and Limitations
1 Introduction
2 Software
2.1 LAMMPS
2.2 OpenMM
3 Hardware
4 Results and Discussion
4.1 Comparison of GPU and KOKKOS Backends of LAMMPS
4.2 Efficiency of GPU Acceleration in the Largest System Size Limit
5 Related Work
6 Conclusions
References
Supercomputer Simulations of Turbomachinery Problems with Higher Accuracy on Unstructured Meshes
1 Introduction
2 Mathematical Model and Numerical Method
3 Mixing Plane
4 Parallel Algorithm
5 Performance and Applications
6 Conclusions
References
Validation of Quantum-Chemical Methods with the New COSMO2 Solvent Model
1 Introduction
2 Materials and Methods
2.1 Quantum Quasi-docking
2.2 FLM Program
2.3 PM6-D3H4X and PM7 Methods
2.4 COSMO Solvent Model
2.5 Protein-Ligand Binding Enthalpy
2.6 Index of Near Native
2.7 Test Set of Protein-Ligand Complexes
3 Results
3.1 Positioning Accuracy
3.2 Binding Enthalpy
3.3 Discussion of the Results
4 Conclusions
References
HPC, BigData, AI: Architectures, Technologies, Tools
Data-Based Choice of the Training Dataset for the Numerical Dispersion Mitigation Neural Network
1 Introduction
2 NDM-net's Basic Aspects
2.1 Seismic Modelling
2.2 NDM-net
3 Construction of the Training Dataset
3.1 Analysis of Seismograms
3.2 Equidistantly Distributed Sources to Generate Training Dataset
3.3 NRMS-Preserving Datasets
4 Conclusions
References
Deep Machine Learning Investigation of Phase Transitions
1 Introduction
2 Data Sets for Spin Models
3 Machine Learning
3.1 Basics
3.2 Architectures
3.3 Training Pipeline
4 Exponents Estimation
5 Discussion
6 Conclusions and Outlook
References
Educational and Research Project “Optimization of the Sugar Beet Processing Schedule”
1 Introduction
2 Materials and Methods
2.1 Problem of Optimal Beet Processing Schedule
2.2 Mathematical Formalization of the Problem
2.3 Solution Methods
2.4 Analytical Solutions
2.5 Approximate Solutions
2.6 Software
3 Content of the Project
3.1 Purpose of the Educational and Research Project
3.2 Tasks of the Theoretical Part
3.3 Mastering the Software
3.4 Tasks for the Computational Experiment
4 Project Results
5 Conclusion
References
Evaluation of the Angara Interconnect Prototype TCP/IP Software Stack: Implementation, Basic Tests and BeeGFS Benchmarks
1 Introduction
2 The Angara TCP/IP Implementation
2.1 The Angara Interconnect Architecture
2.2 Ethernet Network Device Driver Implementation
3 Experimental Evaluation
3.1 Hardware and Software Setup
3.2 Benchmarks
3.3 Performance Results
4 Conclusion
References
Fast Parallel Bellman-Ford-Moore Algorithm Implementation for Small Graphs
1 Introduction
2 Current Approaches to SSSP in Parallel
2.1 Dijkstra's SSSP Algorithm
2.2 Bellman-Ford Algorithm
2.3 Bellman-Ford-Moore Algorithm
2.4 Performance Comparison
3 Our Implementation of Bellman-Ford-Moore Algorithm
3.1 The Baseline for the Performance Comparison
3.2 Results
4 Discussion
5 Conclusion
References
Full-Scale Simulation of the Super C-Tau Factory Computing Infrastructure to Determine the Characteristics of the Necessary Hardware
1 Introduction
2 Full-Scale Simulation of the Super C-Tau Factory Computing Infrastructure
2.1 General Scheme of the HPC System Model
2.2 Preparing for Modeling and Setting Up the Model
2.3 Determination of Computing Hardware Parameters
2.4 Determination of Data Storage System Parameters
3 Conclusion
References
Overhead Analysis for Performance Monitoring Counters Multiplexing
1 Introduction
2 Related Work
3 Implementing Different Multiplexing Variants
3.1 Using PAPI with Automatic Multiplexing
3.2 Using PAPI and LIKWID with Manual Multiplexing
3.3 Comments on LIKWID Usage
4 Evaluating Overheads from Multiplexing
4.1 Experimental Conditions
4.2 PAPI with Automatic Multiplexing
4.3 PAPI with Manual Multiplexing
4.4 LIKWID Multiplexing
4.5 Comparing Different Multiplexing Variants
5 Conclusions
References
Regularization Approach for Accelerating Neural Architecture Search
1 Introduction
2 Neural Architecture Search Problems
3 Techniques for Tuning Hyperparameters
4 Usage of Regularization in HPO-Tuning
5 Automatic System Architecture
6 Experiments
6.1 An Approach to Comparing Neuroevolution Methods
6.2 Experimental Setup and Configuration
6.3 Experimental Results
7 Conclusion
References
RICSR: A Modified CSR Format for Storing Sparse Matrices
1 Introduction
2 Row Incremental CSR Format
2.1 CSR Format
2.2 RICSR Format
3 Theoretical Estimates
4 Testing Methodology
4.1 Hardware
4.2 Software
4.3 Test Matrices and Testing Scenario
5 Performance Evaluation Results
5.1 Comparison with MKL
5.2 SpMV Performance for CSR and RICSR
5.3 BiCGStab Solver
5.4 BiCGStab Solver with Algebraic Multigrid Preconditioner
6 Conclusions
References
Root Causing MPI Workloads Imbalance Issues via Scalable MPI Critical Path Analysis
1 Introduction
2 Prior State of the Art
3 Hardware Sampling Based Hotspots
4 Root Causing MPI Imbalance Issues
5 Finding Critical Path in Program Activity Graph
5.1 Data Collection
5.2 MPI Communicators Reconstruction
5.3 Exchange Timings Information Between P2P Senders and Receivers
5.4 Graph Edges Creation
5.5 Finding the Critical Path
6 PMU Samples Aggregation on the Critical Path
6.1 Hotspots on Critical Path – Examples
7 Performance Evaluation
8 Conclusion
References
Rust Language for GPU Programming
1 Introduction
2 Related Work
3 Benchmarking
3.1 Benchmark Characteristics
3.2 Naive Matrix Multiplication
3.3 CUDA Tiled Matrix Multiplication
3.4 Implementation of 2D Shared Memory Syntax
4 Conclusion
References
Study of Scheduling Approaches for Batch Processing in Big Data Cluster
1 Introduction
2 Related Works
3 Single Task and Multi-task Scheduling
3.1 Data Locality in HDFS and Framework's Design
3.2 Graph Block Scheduler
3.3 Data-Driven Heuristics Approach: Cores Scheduler
3.4 Other Scheduling Strategies
4 Scheduling Measurements
4.1 Review of Existing Approaches
4.2 Scheduling Metrics: Diagrams, Resources, Efficiency
4.3 Experiments Conduction in Real Environment
5 Summary of the Results
5.1 Performance and Runtime Comparison
5.2 Metrics Comparison and Improvements
6 Conclusion
References
System for Collecting Statistics on Power Consumption of Supercomputer Applications
1 Introduction
2 Software-Based Measuring of Commodity Microprocessors Power Consumption
3 Methods of Power Consumption Measuring for Graphics Microprocessors
4 Software Stack for Collecting Data on Computers Power Consumption
5 Experimental Study of NPB Tests Power Consumption
6 Conclusion
References
Teaching Advanced AI Development Techniques with a New Master's Program in Artificial Intelligence Engineering
1 Introduction
2 Background
2.1 A Demand for AI Engineers
2.2 Related Work
3 The Master's Program Description
3.1 Mandatory Courses
3.2 Elective Courses
3.3 Project Development Workshop
4 Cooperation with IT Industry
5 Partnership with the Universities
6 Conclusion
References
Teragraph Heterogeneous System for Ultra-large Graph Processing
1 Introduction
2 Challenges of Graph Processing and Related Works
3 Discrete Mathematics Instruction Set
4 Leonhard x64 Microarchitecture
5 Teragraph Architecture
6 Teragraph Programming Techniques
7 Efficiency Tests and Comparison
8 Teragraph Applications
9 Conclusions and Future Works
References
Towards OpenUCX and GPUDirect Technology Support for the Angara Interconnect
1 Introduction
2 Background
2.1 Angara Interconnect API
2.2 OpenMPI and OpenUCX Libraries
2.3 Conventional MPI Point-to-Point Communication Protocols
2.4 GPUDirect Technology
3 UCX-Angara
3.1 Connection Management
3.2 Eager, SAR and LMT Protocols Support
3.3 Designing Angara Interconnect Memory Registration API
4 Experimental Results
4.1 Hardware and Software Testbed
4.2 Experiment Policy
4.3 Performance of MPI P2P Operations with Host and GPU Memory
4.4 Performance of MPI Collectives with Host and GPU Memory
5 Related Work
6 Conclusion and Future Work
References
Wiki Representation and Analysis of Knowledge About Algorithms
1 Introduction
2 Existing Approaches to the Mapping of Problems to Computer Architecture
3 Description of Algorithm Properties and Structure
4 Wiki Encyclopedia
5 Hierarchical Representation of Knowledge About Algorithms
6 Algo500 Project
7 Conclusions
References
Distributed and Cloud Computing
BOINC-Based Volunteer Computing Projects: Dynamics and Statistics
1 Introduction
2 Citizen Science and Volunteer Computing
3 Volunteer Computing Projects
4 Volunteers and Performance
5 Volunteer Computing in Russia
6 Conclusion and Discussion
References
Desktop Grid as a Service Concept
1 Desktop Grid and Cloud Computing
2 Desktop Grid and Cloud Computing Hybridization Approaches Review
3 Desktop Grid as a Service
3.1 Desktop Grid as a Service Concept
3.2 Architecture
3.3 Desktop Grid as a Service Stakeholders
4 Conclusion
References
Distributed Computing for Gene Network Expansion in R Environment
1 Introduction
2 Distributed Computing for Gene Network Expansion
2.1 Gene Expansion Problem
2.2 Gene@home
2.3 Expansion Lists Post-processing
3 BOINC in R Environment
3.1 BOINC
3.2 RBOINC Software
3.3 PCALG in RBOINC: A Case Study
4 Conclusion
References
Distributed Simulation of Supercomputer Model with Heavy Tails
1 Introduction
2 Discrete-Event Simulation
2.1 Heavy-Tailed Distribution Sampling
3 Supercomputer Model
4 Numerical Experiments
4.1 PASTA Inequality
4.2 Estimator Efficiency
5 Conclusion
References
Ensuring Data Integrity Using Merkle Trees in Desktop Grid Systems
1 Introduction
2 Review
3 Mechanisms for Ensuring Fault Tolerance and Data Integrity
4 Methods
4.1 Data Integrity Threat Model
4.2 Description of the Data Integrity Approach
5 Results
5.1 Testing the Data Integrity Method
5.2 Testing the Computing Performance Using the Proposed Method
6 Conclusion
References
Optimization of the Workflow in a BOINC-Based Desktop Grid for Virtual Drug Screening
1 Introduction
2 Volunteer Computing Project SiDock@home
2.1 Project Setup
2.2 High-Throughput Virtual Screening
3 Analysis and Optimization of the Computational Process
3.1 Analysis of a Conventional Computational Process
3.2 Known Drawbacks of the Conventional Computational Process
3.3 Optimization of the Computational Process
4 Implementation and Results
5 Conclusion
References
Correction to: Fast Parallel Bellman-Ford-Moore Algorithm Implementation for Small Graphs
Correction to: Chapter “Fast Parallel Bellman-Ford-Moore Algorithm Implementation for Small Graphs” in: V. Voevodin et al. (Eds.): Supercomputing, LNCS 13708, https://doi.org/10.1007/978-3-031-22941-1_32
Author Index
📜 SIMILAR VOLUMES
The two-volume set LNCS 14388 and 14389 constitutes the refereed proceedings of the 9th Russian Supercomputing Days International Conference (RuSCDays 2023) held in Moscow, Russia, during September 25-26, 2023. The 44 full papers and 1 short paper presented in these proce
This book constitutes the refereed post-conference proceedings of the 7th Russian Supercomputing Days, RuSCDays 2021, held in Moscow, Russia, in September 2021. The 37 revised full papers and 3 short papers presented were carefully reviewed and selected from 99 submissions. The
This book constitutes the refereed post-conference proceedings of the 6th Russian Supercomputing Days, RuSCDays 2020, held in Moscow, Russia, in September 2020. The 51 revised full and 4 revised short papers presented were carefully reviewed and selected from 106 submissions. The papers
This book constitutes the refereed proceedings of the Second Russian Supercomputing Days, RuSCDays 2016, held in Moscow, Russia, in September 2016. The 28 revised full papers presented were carefully reviewed and selected from 94 submissions. The papers are organized in topical sections on the
This book constitutes the refereed proceedings of the Third Russian Supercomputing Days, RuSCDays 2017, held in Moscow, Russia, in September 2017. The 41 revised full papers and one revised short paper presented were carefully reviewed and selected from 120 submissions. The papers are organized