Parallel and Distributed Computing, Applications and Technologies: 23rd International Conference, PDCAT 2022, Sendai, Japan, December 7–9, 2022, Proceedings

✍ Scribed by Hiroyuki Takizawa, Hong Shen, Toshihiro Hanawa, Jong Hyuk Park, Hui Tian, Ryusuke Egawa

Publisher: Springer
Year: 2023
Tongue: English
Leaves: 526
Series: Lecture Notes in Computer Science, 13798
Category: Library

✦ Synopsis

This book constitutes the proceedings of the 23rd International Conference on Parallel and Distributed Computing, Applications, and Technologies, PDCAT 2022, which took place in Sendai, Japan, during December 7-9, 2022.

The 24 full papers and 16 short papers included in this volume were carefully reviewed and selected from 95 submissions. The papers are categorized into the following topical sub-headings: Heterogeneous System (1); HPC & AI; Embedded systems & Communication; Blockchain; Deep Learning; Quantum Computing & Programming Language; Best Papers; Heterogeneous System (2); Equivalence Checking & Model checking; Interconnect; Optimization (1); Optimization (2); Privacy; and Workflow.

✦ Table of Contents

Preface
Organization
Contents
Heterogeneous System (1)
Towards Priority-Flexible Task Mapping for Heterogeneous Multi-core NUMA Systems
1 Introduction
2 Motivation and Objective
3 Task Mapping with Priority Option Switching
3.1 Overview
3.2 Potential Benefit Metrics and POSM
3.3 HPO and MPO Mapping Algorithms
4 Evaluation
4.1 Experimental Environment
4.2 Evaluation of HPO and MPO
4.3 Evaluation of POSM
5 Conclusions
References
Multi-GPU Scaling of a Conservative Weakly Compressible Solver for Large-Scale Two-Phase Flow Simulation
1 Introduction
2 Numerical Methods
2.1 Con-CAC-LS
2.2 Con-PLIC-HF
2.3 Evolving Pressure Projection Method
3 Multi-GPU Computation
3.1 Performance of Con-CAC-LS
3.2 Performance of Con-PLIC-HF
4 Numerical Results
4.1 Rayleigh-Taylor Instability
4.2 Drop Impacting on a Thin Liquid Film
4.3 Liquid Jet in Gas Cross-Flow
5 Conclusion
References
Improving the Performance of Lattice Boltzmann Method with Pipelined Algorithm on A Heterogeneous Multi-zone Processor
1 Introduction
2 MT-3000
2.1 Programming Environment
3 Methodology
3.1 Lattice Boltzmann Method
3.2 Data Storage Schemes
3.3 An Improved Pipelined Algorithm
3.4 Multi-level Parallelization Strategy
4 Numerical and Performance Results
5 Conclusions
References
FPGA
DEEPFAKE CLI: Accelerated Deepfake Detection Using FPGAs
1 Introduction
2 Motivation and Background
3 Implementation
3.1 Software Implementation
3.2 Hardware Implementation
4 Results
4.1 Software Results
4.2 Hardware Results
4.3 Benchmarking Deepfake CLI with Other Nodes
5 Conclusion
6 Future Work
References
Memory Access Optimization for Former Process of Pencil Drawing Style Image Conversion in High-Level Synthesis
1 Introduction
2 Pencil Drawing Style Conversion Process
3 Proposal Method
3.1 Reconstruction of Edge Strength Image Input/output
3.2 Reconstruction of Memory Access Unit
4 Experiments and Discussions
4.1 Software Execution Time
4.2 Simplified Estimate of the Number of Cycles After High-Level Synthesis
4.3 Performance Estimation in Logic Circuit Simulation
4.4 Performance Measurement on Actual Equipment
5 Conclusion
References
Word2Vec FPGA Accelerator Based on Spatial and Temporal Parallelism
1 Introduction
2 Previous Work
2.1 Word2vec Algorithm
2.2 Acceleration of Word2vec
2.3 Data-Flow of Word2vec
3 Systolic Array Architecture for Word2vec
4 Evaluation
5 Conclusion
References
HPC and AI
Analyzing I/O Performance of a Hierarchical HPC Storage System for Distributed Deep Learning
1 Introduction
2 Background
2.1 File Access in Distributed Neural Network Workloads
2.2 Storage System in HPC
3 Related Work
4 Methodology
4.1 Overview
4.2 Measuring I/O Performance by Benchmark
4.3 Analyzing the I/O Performance
4.4 Estimate Performance by Storage Improvement
5 Experiment Results
5.1 Setup for Experiment
5.2 Measuring Execution Time for Epochs
5.3 Analyzing I/O Performance
5.4 Estimating the Impact of the Storage Improvements
6 Discussion
7 Conclusion
References
An Advantage Actor-Critic Deep Reinforcement Learning Method for Power Management in HPC Systems
1 Introduction
2 Power State Management Problem
3 A2C Deep Reinforcement Learning for Power State Management Problem
3.1 MDP Formulation of PSMP
3.2 A2C-DRL for PSMP
4 Evaluation
4.1 Experimental Setup
4.2 Results and Discussion
5 Conclusion
References
An AutoML Based Algorithm for Performance Prediction in HPC Systems
1 Introduction
2 Algorithm
3 SPEC Benchmark Applications Performance Dataset
4 Experiments and Results
4.1 Evaluation of Algorithm Effectiveness
4.2 Accuracy Comparison with the State-of-the-Art
5 Conclusions and Future Work
References
Embedded Systems and Communication
Edge-Gateway Intrusion Detection for Smart Home
1 Introduction
2 Related Work
3 Edge-Gateway Intrusion Detection System for Smart Home
3.1 System Framework
3.2 Edge Intrusion Detection of Smart Device Based on Gaussian Distribution
3.3 Gateway-Level Centralized Intrusion Detection Based on RF-GraphSAGE
4 Results and Discussion
4.1 Experimental Setup
4.2 Results of Device-Level Edge Intrusion Detection
4.3 Results of Gateway-Level Centralized Intrusion Detection
5 Conclusion
References
Energy-Delay Tradeoff in Parallel Task Allocation and Execution for Autonomous Platooning Applications
1 Introduction
2 Divisible Load Scheduling Model
3 Markov Chain Model
4 Energy Consumption Model
4.1 Energy Consumption Model for Task Execution
4.2 Energy Consumption Model for Load Transmission
5 Energy Consumption-Delay Tradeoff Task Allocation and Execution Algorithm
6 Simulation and Analysis
7 Conclusion
References
A Reservation-Based List Scheduling for Embedded Systems with Memory Constraints
1 Introduction
2 Related Work
2.1 Scheduling with Shared Memory Constraints
2.2 Analysis
3 Challenges: Scheduling with Memory Constraints
3.1 Memory Constraints
3.2 Motivation Example
4 Problem Modeling
4.1 Memory Model
4.2 List Scheduling Definition
5 Proposed Solution
6 Experimental Results
6.1 Random Workflows Generator
6.2 Impact of Depth k
6.3 Minimum Memory Usage
6.4 Minimized Makespan Value Under Memory Constraints
6.5 Applicability
7 Conclusions
References
Formalization and Verification of SIP Using CSP
1 Introduction
2 Background
2.1 SIP
2.2 SDN
3 Modeling SIP
3.1 Sets, Messages and Channels
3.2 Overall Modeling
3.3 Register
3.4 Client
3.5 Intruder
4 Verification
4.1 Verification in PAT
4.2 Results
5 Improvement
5.1 Modeling SIPS
5.2 Verification
6 Conclusion and Future Work
References
Blockchain
Towards a Blockchain and Fog-Based Proactive Data Distribution Framework for ICN
1 Introduction
2 Related Work
3 Proposed Framework
3.1 Framework Overview
3.2 Framework Data Flow
3.3 Security-Related Advantages
3.4 Block Structure
4 Performance Results
4.1 Lesson Learned
5 Conclusion and Future Work
References
Research on User Influence Weighted Scoring Algorithm Incorporating Incentive Mechanism
1 Introduction
2 User Influence Weighting Algorithm
2.1 User Influence Model
2.2 Weighted Rating Algorithm
3 Rating Incentive Mechanism
4 Experiments and Analysis
4.1 Experimental Environment
4.2 Experimental Protocols
4.3 Analysis of Experimental Results
4.4 Safety Analysis
5 Conclusion
References
BloodMan-Chain: A Management of Blood and Its Products Transportation Based on Blockchain Approach
1 Introduction
2 Related Work
2.1 Blood Supply Chain Management Systems Not Based on Blockchain Technology
2.2 Blood Supply Chain Management Systems Based on Blockchain Technology
3 BloodMan-Chain Architecture
3.1 Overview Model
3.2 Detailed Model
4 Evaluation Scenarios
4.1 Environment Setting
4.2 Results
5 Conclusion
References
Deep Learning
A Systematic Comparison on Prevailing Intrusion Detection Models
1 Introduction
2 Related Work
3 Methodology
4 Experimental Study
5 Conclusions
References
Enhancing Resolution of Inferring Hi-C Data Integrating U-Net and ResNet Networks
1 Introduction
2 Method
2.1 Data Preprocessing
2.2 Structure of Network Model
2.3 Algorithm
3 Experiment
3.1 Experiment of Training Network Model
3.2 Experimental Results on Human Cell Datasets
4 Conclusion
References
Detecting Network Intrusions with Resilient Approaches Based on Convolutional Neural Networks
1 Introduction
2 Deep Learning Applications in Network Intrusion Detection
3 Application of CNN-LSTM on NSL-KDD Dataset
3.1 NSL-KDD Dataset Overview
3.2 Proposed CNN-LSTM Model
4 Experiment and Results
4.1 CNN Model Experiment
4.2 CNN-LSTM Model Experiment
4.3 Models’ Results Comparison
5 Conclusion
References
Quantum Computing and Programming Language
Analysis of Precision Vectors for Ising-Based Linear Regression
1 Introduction
2 Ising-Based Linear Regression
2.1 Linear Regression
2.2 A Precision Vector
2.3 A Precision Vector of Ising-Based Linear Regression
3 Analysis of a Precision Vector for Efficient Ising-Based Linear Regression
4 Evaluation
4.1 Experimental Environments
4.2 Results and Discussions
5 Related Work
6 Conclusions
References
Evaluating and Analyzing Irregular Tree Search in the Tascell and HOPE Parallel Programming Languages
1 Introduction
2 Irregular Tree Search
3 Conventional Language
4 Tascell and HOPE Languages
5 Evaluations
6 Conclusion
References
Best Papers
Distributed Parallel Tall-Skinny QR Factorization: Performance Evaluation of Various Algorithms on Various Systems
1 Introduction
2 Problem Setting and Overview of Algorithms
2.1 Problem Setting
2.2 Overview of Algorithms
3 Summary of the Target Algorithms
3.1 HQR: Householder QR
3.2 CGS2: Classical Gram-Schmidt with Reorthogonalization
3.3 TSQR
3.4 S-CholQR3: Shifted CholeskyQR3
4 Performance Evaluation
4.1 Overview of the Implementation of the Algorithms
4.2 Computational Environments and Evaluation Settings
4.3 Results
5 Related Work
6 Conclusion
References
A Partitioned Memory Architecture with Prefetching for Efficient Video Encoders
1 Introduction
2 Memory Subsystems for Video Encoding
2.1 Video Encoding
2.2 Hardware Encoding Pipeline
2.3 Analysis of Memory Access Pattern
3 A Partitioned Memory System for CTU-Pipelined Video Encoders
3.1 Split Cache Structure
3.2 Coding Tree Unit Prefetcher
4 Evaluations and Discussions
4.1 Experimental Methodology
4.2 Evaluation Results
5 Related Work
6 Conclusions
References
A Hardware Trojan Exploiting Coherence Protocol on NoCs
1 Introduction
2 Related Work
2.1 Hardware Trojan Embedded in Network Interfaces
2.2 Hardware Trojan Colluding with Malware
2.3 Attack by Hardware Trojan Using Messages Controlling Data Coherence
2.4 Packet Tampering Attack
3 New Security Risk
3.1 Prerequisite
3.2 New Eavesdropping Attack
3.3 Denial-of-Service Attack
4 Countermeasure
4.1 Detection Method
4.2 Countermeasure
5 Evaluation
5.1 Execution Time and Traffic Amount
5.2 Energy Consumption
5.3 Hardware Amount
6 Conclusions
References
A System-Wide Communication to Couple Multiple MPI Programs for Heterogeneous Computing
1 Introduction
2 Heterogeneous Coupling Computing as a Next Computing Environment
3 Heterogeneous Coupling Communication Requirements
4 WaitIO-Socket Design
5 WaitIO-Socket Implementation
5.1 WaitIO Overview and Application Program Interface
5.2 Implementation Overview
6 WaitIO-Socket Evaluation
6.1 Evaluation Environment
6.2 Multiple Stream Performance
6.3 Wisteria System Performance
6.4 Application Performance
7 Related Work
8 Summary
References
Heterogeneous System (2)
A Task-Parallel Runtime for Heterogeneous Multi-node Vector Systems
1 Introduction
2 Related Work
2.1 Task-Parallel Runtime System
2.2 SX-Aurora TSUBASA
3 A Task-Parallel Runtime for SX-AT
3.1 An Overview of the Proposed Runtime
3.2 Hybrid MPI for SX-Aurora TSUBASA
4 Evaluation and Discussions
4.1 Evaluation Setup
4.2 Performance Gain by Proposed Runtime
5 Conclusion
References
Accelerating Radiative Transfer Simulation on NVIDIA GPUs with OpenACC
1 Introduction
2 ARGOT: Radiative Transfer Simulation Code for Astrophysics
2.1 ARGOT Method
2.2 ART Method
3 GPU Implementation with OpenACC
3.1 OpenACC
3.2 OpenACC Implementation of the ARGOT Code
4 OpenACC Implementation Using Multiple GPUs
4.1 Overview
4.2 Node Parallelization for ARGOT Code
5 Evaluation
5.1 Experimental Settings
5.2 Performance Comparison Between CUDA and OpenACC Implementations
6 Related Work
7 Conclusion
References
QR Factorization of Block Low-Rank Matrices on Multi-instance GPU
1 Introduction
2 Related Work
3 Numerical Calculations on MIG
3.1 Small Numerical Calculations on GPU
3.2 Small Numerical Calculations on MIG
4 BLR-QR on GPU and MIG
4.1 BLR-QR
4.2 BLR-QR on GPU
4.3 BLR-QR on MIG
5 Performance Evaluation
5.1 Execution Environment and Target Matrices
5.2 Performance Evaluation
6 Conclusions and Future Work
References
Equivalence Checking and Model Checking
Equivalence Checking of Code Transformation by Numerical and Symbolic Approaches
1 Introduction
2 Related Work
2.1 Xevolver for C
2.2 CIVL
3 Equivalence Checking Method
3.1 Extension of Xev-C Notation for Symbolic Execution
3.2 Checking by Numerical Comparison
4 Evaluation
4.1 Evaluation Setup
4.2 Evaluation Results
5 Concluding Remarks and Future Work
References
MEA: A Framework for Model Checking of Mutual Exclusion Algorithms Focusing on Atomicity
1 Introduction
2 Background
2.1 The Mutual Exclusion Algorithms
2.2 Maude
3 The Methodology
3.1 The Snapshot and Transition Rules
3.2 Transition Rules
3.3 Verification
4 Case Study
4.1 High-Level Abstraction
4.2 Refinement
5 Conclusion
References
Interconnect
A High-Radix Circulant Network Topology for Efficient Collective Communication
1 Introduction
2 Background
2.1 Circulant Network Topology
2.2 Target Collective Communication
3 Using Circulant Network Topology
3.1 Definition of Circulant
3.2 Collective-Communication Operations on Circulant Network Topology
4 Evaluation
4.1 Methodology
4.2 Diameter and ASPL
4.3 Hop Counts of Collective-Communication Operations
4.4 Execution Time of Collective-Communication Operations
5 Discussions
6 Conclusions
References
Fault Tolerance and Packet Latency of Peer Fat-Trees
1 Introduction
2 Related Fat-Tree Networks
2.1 k-ary n-fly Butterfly Network
2.2 k-ary n-tree Clos Network
2.3 k-ary n-tree Fat-Tree Network
2.4 Bidirectional k-ary n-tree Clos Network
2.5 k-ary Fat-Tree Network
2.6 Mirrored k-ary n-tree Network
3 Peer k-ary n-tree Network
4 Routing in Peer k-ary n-tree Network
5 Fault Tolerance of Peer k-ary n-tree Network
6 Packet Latency of Peer k-ary n-tree Network
7 Conclusions
References
Accelerating Imbalanced Many-to-Many Communication with Systematic Delay Insertion
1 Introduction
2 Related Work
3 Preliminaries
3.1 Communication Structure
3.2 Communication Schedule
4 Proposed Method
4.1 Arrival-Consumption Model
4.2 Model Parameter Estimation
4.3 Scheduling Algorithm
5 Evaluation
6 Conclusion
References
Optimization (1)
Optimizing Depthwise Convolutions on ARMv8 Architecture
1 Introduction
2 Analysis of Existing Implementations
3 Our Approach
3.1 Implementation
3.2 Arithmetic Intensity
4 Experimental Evaluation
4.1 Experimental Setup
4.2 Algorithm Performance
4.3 Full Topology Performance
5 Conclusion
References
A Profiling-Based Approach to Cache Partitioning of Program Data
1 Introduction
2 Background
2.1 Reuse Distance
2.2 Cache Partitioning
3 Implementation and Experimental Methodology
3.1 Profiler Components
3.2 Cache Partitioning Policy
3.3 Limitations
4 Experimental Results
4.1 Experimental Setup
4.2 Workload
4.3 Dense Matrix Transposed Vector Multiplication
4.4 NAS Parallel Benchmarks
5 Related Work
6 Discussion and Conclusions
References
Optimization (2)
Memory Bandwidth Conservation for SpMV Kernels Through Adaptive Lossy Data Compression
1 Introduction
2 Background
2.1 SpMV Kernel Overview
2.2 Performance Characteristics in SpMV Kernel
2.3 Related Work
3 In-Memory Data Compression Architecture
3.1 Data Compression Memory Interface
3.2 Data Compression Scheme
4 Evaluation
4.1 Methodology
4.2 Empirical Compression Ratio
4.3 Impact on Convergence Conditions by Lower Precision Data
4.4 Cycle-Accurate Performance Evaluation
5 Conclusions
References
SimdFSM: An Adaptive Vectorization of Finite State Machines for Speculative Execution
1 Introduction
2 Background
2.1 Speculative Execution — spec(K)
2.2 Vector Extension and Processor Microarchitecture
2.3 FSM Size — The Number of States |Q|
3 Vectorized FSMs
3.1 Gather
3.2 Shuffle
3.3 Permute
3.4 Interleaved-Gather (iGather)
4 The Adaptive Strategy of SimdFSM
5 Evaluation
5.1 Performance Profiling of Speculating Phase
5.2 Evaluation of the Adaptive Vectorization Strategy
6 Related Work
7 Conclusion
References
Privacy
Broad Learning Inference Based on Fully Homomorphic Encryption
1 Introduction
2 Related Work
2.1 Homomorphic Encryption
2.2 Leveled Homomorphic Encryption Inference
2.3 Privacy Preserving Broad Learning
3 Our Method
3.1 LHE-Based Broad Learning System
3.2 HE-Based Broad Learning System with Incremental Learning
3.3 Polynomial Approximation of Activation Function
4 Experimental Evaluation
4.1 Datasets
4.2 Network Architecture
4.3 Evaluation Method
4.4 Evaluation Results
5 Conclusions
References
Application of Probabilistic Common Set on an Open World Set for Vertical Federated Learning
1 Introduction
2 Background
2.1 Collaboration Between Different Industries and Privacy Care
2.2 Federated Learning
3 Proposed Method
3.1 Basic Idea
3.2 Procedure
4 Evaluation
4.1 Evaluation Environment
4.2 Results and Discussion
5 Conclusion
References
Workflow
Towards a Standard Process Management Infrastructure for Workflows Using Python
1 Introduction
2 Background
3 Approach
3.1 PMIx Python Bindings
3.2 Pyrun Prototype
4 Evaluation/Demonstration
4.1 Single Core Task Injection
4.2 Comparison with Command Line Tools
4.3 Multithreaded Launch Capability
4.4 Heterogeneous Parallel MPI Tasks
5 Related Work
6 Conclusion
References
Author Index

📜 SIMILAR VOLUMES

Parallel and Distributed Computing, Appl

📁 Parallel and Distributed Computing, Applications and Technologies: 23rd International Conference, PDCAT 2022, Sendai, Japan, December 7–9, 2022, Proceedings

✍ Hiroyuki Takizawa; Hong Shen; Toshihiro Hanawa; Jong Hyuk Park; Hui Tian; Ryusuk 📂 Library 📅 2023 🏛 Springer Nature 🌐 English

Parallel and Distributed Computing, Appl

📁 Parallel and Distributed Computing, Applications and Technologies. Proceedings of PDCAT 2023

✍ Ji Su Park, Hiroyuki Takizawa, Hong Shen, James J. Park 📂 Library 📅 2024 🏛 Springer 🌐 English

Parallel and Distributed Computing, Appl

📁 Parallel and Distributed Computing, Applications and Technologies: 22nd International Conference, PDCAT 2021, Guangzhou, China, December 17–19, 2021, ... Computer Science and General Issues)

✍ Hong Shen (editor), Yingpeng Sang (editor), Yong Zhang (editor), Nong Xiao (edit 📂 Library 📅 2022 🏛 Springer 🌐 English

This book constitutes the proceedings of the 22nd International Conference on Parallel and Distributed Computing, Applications, and Technologies, PDCAT 2021, which took place in Guangzhou, China, during December 17-19, 2021. The 24 full papers and 34 short papers included

Parallel and Distributed Computing: Appl

📁 Parallel and Distributed Computing: Applications and Technologies: 5th International Conference, PDCAT 2004, Singapore, December 8-10, 2004. Proceedings

✍ Ching-Lian Chua, Francis Tang, Yun-Ping Lim, Liang-Yoong Ho, Arun Krishnan (auth 📂 Library 📅 2005 🏛 Springer-Verlag Berlin Heidelberg 🌐 English

The 2004 International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT 2004) was the fifth annual conference, and was held at the Marina Mandarin Hotel, Singapore on December 8–10, 2004. Since the inaugural PDCAT held in Hong Kong in 2000, the conference has

Parallel and Distributed Computing, Appl

📁 Parallel and Distributed Computing, Applications and Technologies: 21st International Conference, PDCAT 2020, Shenzhen, China, December 28–30, 2020, ... (Lecture Notes in Computer Science, 12606)

✍ Yong Zhang (editor), Yicheng Xu (editor), Hui Tian (editor) 📂 Library 📅 2021 🏛 Springer 🌐 English

This book constitutes the proceedings of the 21st International Conference on Parallel and Distributed Computing, Applications, and Technologies, PDCAT 2020, which took place in Shenzhen, China, during December 28-30, 2020. The 34 full papers