Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation (Communications in Computer and Information Science)

✍ Scribed by Jeffrey Nichols (editor), Arthur ‘Barney’ Maccabe (editor), James Nutaro (editor), Swaroop Pophale (editor), Pravallika Devineni (editor), Theresa Ahearn (editor), Becky Verastegui (editor)


Publisher: Springer
Year: 2022
Tongue: English
Leaves: 474
Category: Library

✦ Synopsis


This book constitutes the revised selected papers of the 21st Smoky Mountains Computational Sciences and Engineering Conference, SMC 2021, held in Oak Ridge, TN, USA, in October 2021.

The 33 full papers and 3 short papers presented were carefully reviewed and selected from a total of 88 submissions. The papers are organized in the following topical sections: computational applications: converged HPC and artificial intelligence; advanced computing applications: use cases that combine multiple aspects of data and modeling; advanced computing systems and software: connecting instruments from edge to supercomputers; deploying advanced computing platforms: on the road to a converged ecosystem; and scientific data challenges.

The conference was held virtually due to the COVID-19 pandemic.

✦ Table of Contents


Preface
Organization
Contents
Computational Applications: Converged HPC and Artificial Intelligence
Randomized Multilevel Monte Carlo for Embarrassingly Parallel Inference
1 Introduction
1.1 The Sweet and the Bitter of Bayes
2 Technical Details of the Methodology
2.1 Multilevel Monte Carlo
2.2 Randomized Multilevel Monte Carlo
2.3 Multi-index Monte Carlo
3 Motivating Example
3.1 Example of Problem
3.2 Numerical Results
4 Conclusion and Path Forward
References
Maintaining Trust in Reduction: Preserving the Accuracy of Quantities of Interest for Lossy Compression
1 Challenges in Lossy Compression for Physics Simulations
2 Background of Error-Controlled Lossy Compression
3 Error-Controlled Lossy Compression in High-Dimensional Space
4 Error-Controlled Lossy Compression on Nonuniform Grids
5 Error-Controlled Lossy Compression for QoIs
6 Future Work
7 Conclusion
References
Applying Recent Machine Learning Approaches to Accelerate the Algebraic Multigrid Method for Fluid Simulations
1 Introduction
2 Background
2.1 Overview of Algebraic Multigrid
2.2 A Dataset of Sparse Systems from 3D Unstructured Meshes
3 Overview of the Recently Proposed Methods
3.1 Deep Residual Feed-Forward Network for 2D Structured Grid Problems (Greenfeld et al. 2019)
3.2 Graph Neural Networks for Unstructured Problems (Luz et al. 2020)
4 Results
4.1 For the Model in Greenfeld et al.
4.2 For the Model in Luz et al.
5 Conclusion
A Dataset
References
Building an Integrated Ecosystem of Computational and Observational Facilities to Accelerate Scientific Discovery
1 Introduction
2 Background and Context: Connecting Science Facilities
3 Overview of the ORNL Federated Science Edge Ecosystem
4 Resources
4.1 Instrument Interfaces
4.2 Compute
4.3 Storage
4.4 Network
5 Federation Services and Policies
5.1 Resource Management
5.2 Command and Control
5.3 Workflows
5.4 Dashboard
5.5 Data Movement
5.6 Data Management
5.7 Identity Management
5.8 Policy and Governance
6 Use Case: Autonomous Microscopy at CNMS
6.1 Scanning Probe Microscopy
6.2 Electron Microscopy
7 Conclusions
References
Advanced Computing Applications: Use Cases that Combine Multiple Aspects of Data and Modeling
Fast and Accurate Predictions of Total Energy for Solid Solution Alloys with Graph Convolutional Neural Networks
1 Introduction
2 Physical System - Solid Solution Binary Alloys
3 Graph Convolutional Neural Networks (GCNNs)
3.1 Software Implementation
4 Dataset Description
5 Use of Federated Instruments, Compute, and Storage
6 Numerical Results
6.1 Comparison Between Computational Times for First Principles Calculations and DL Models
6.2 Comparison Between Statistical Models for Predictive Performance
7 Conclusions
References
Transitioning from File-Based HPC Workflows to Streaming Data Pipelines with openPMD and ADIOS2
1 The Need for Loosely-Coupled Data Pipelines
1.1 The IO Bottleneck – A Challenge for Large-Scale IO
1.2 From Monolithic Frameworks to Loosely-Coupled Pipelines
1.3 Related Work
2 Building a System for Streaming IO
2.1 Loosely Coupled Data Processing Pipelines Built via Streaming
2.2 Impact in an Increasingly Heterogeneous Compute Landscape
2.3 OpenPMD and ADIOS2: Scientific Self-description and Streaming
3 Data Distribution Patterns
3.1 Properties Found in a Performant Distribution Pattern
3.2 Chunk Distribution Algorithms
4 Evaluation for Two Streaming Setups
4.1 Streaming as Basis for an Asynchronous IO Workflow
4.2 A Staged Simulation-Analysis Pipeline: Setup
4.3 A Staged Simulation-Analysis Pipeline: Evaluation
5 Summary and Outlook
References
Understanding and Leveraging the I/O Patterns of Emerging Machine Learning Analytics
1 Introduction
2 Machine Learning Patterns
2.1 Overview
2.2 Optimization Opportunities
2.3 Automation Opportunities
3 Data Management Vision
3.1 Model Metadata
3.2 Add Query Capabilities over the Model Metadata
3.3 Positioning and Impact
4 Conclusions
References
Secure Collaborative Environment for Seamless Sharing of Scientific Knowledge
1 Introduction
1.1 Background and Motivation
1.2 Organization
2 Use Case Scenarios
2.1 Data Quality Assessment from Small-Angle Neutron Scattering (SANS) Instrument
2.2 Data Collection
2.3 ML Models and Privacy
3 Machine Learning Model Training
3.1 Model Setup
3.2 Model Training
3.3 Differential Privacy Cost
4 Secure Inference
4.1 Secure MPC Setup
4.2 Secure MPC Performance
4.3 Secure MPC Cost
5 Conclusion and Future Work
References
The Convergence of HPC, AI and Big Data in Rapid-Response to the COVID-19 Pandemic
1 Background
1.1 Problem Statement
1.2 Collaborative Response
1.3 HPC, AI and Big Data Case Studies
2 Case Study 1: High Performance Computing
2.1 Problem Statement
2.2 Methodology
2.3 Results
3 Case Study 2: Artificial Intelligence
3.1 Problem Statement
3.2 Methodology
3.3 Results
4 Case Study 3: Big Data
4.1 Problem Statement
4.2 Methodology
4.3 Results
5 Lessons Learned
5.1 Scientific Gaps and Benefits
5.2 Technology Gaps and Benefits
6 Summary
References
High-Performance Ptychographic Reconstruction with Federated Facilities
1 Introduction
2 Background
3 Ptychography Workflow with Federated Resources
3.1 Automated Light Source Workflow Execution and Coordination
3.2 Transparent Remote Function Calls and Data Transfers
3.3 Accelerated Ptychographic Image Reconstruction
4 Experimental Results
4.1 Optimum GPU Configuration
4.2 End-to-End Workflow Evaluation
5 Related Work
6 Conclusion
References
Machine-Learning Accelerated Studies of Materials with High Performance and Edge Computing
1 The Scientific Question: Correlated Quantum Materials
2 The Dynamic Cluster Approximation Quantum Monte Carlo Method, DCA
3 ML-Accelerated Simulations and Feedback Loop Between Simulations and Experiments
3.1 Current Practices
3.2 Proof of Principles
4 Challenges in Current Workflow
4.1 An Observation: Two Data Sources and Two Edges
4.2 Programming Languages Inconsistency for Different Tasks in the Workflow
4.3 Lack of Standardized ML Model Specification Format
4.4 Heterogeneity in Hardware and Software Architectures on Different Edge Devices
5 Opportunities and Needs for Development
5.1 A Unified Edge Capable of Serving both HPC and Experimental Data Sources
5.2 Compatibility of HPC and ML Software Stack; Package Management Tools for Both HPC and ML Library Dependencies
5.3 Workflow Tools and Policies on Edge Computers
5.4 Standardized ML Model Specification Format
6 Conclusions and Outlook
References
Advanced Computing Systems and Software: Connecting Instruments from Edge to Supercomputers
A Hardware Co-design Workflow for Scientific Instruments at the Edge
1 Introduction
2 Background
3 Hardware Programming Ecosystem
3.1 Chisel Hardware Construction Language
3.2 Open-Source Hardware Development Ecosystems
4 Co-design Workflow for Hardware Libraries
5 Conclusion
References
NREL Stratus - Enabling Workflows to Fuse Data Streams, Modeling, Simulation, and Machine Learning
1 Introduction
2 Advanced Computing at the National Renewable Energy Laboratory
2.1 ESIF High Performance Computing Data Center
2.2 NREL Stratus Cloud Computing
3 Competitive Positioning
3.1 HPC
3.2 Cloud Computing
3.3 Edge Computing
4 Hybrid Support of Real-Time Data Vision at Scale
4.1 Overview of the Workflow
4.2 Workflow Components and Positioning
5 Challenges to Supporting the Vision
6 Building Toward the Vision at NREL
References
Braid-DB: Toward AI-Driven Science with Machine Learning Provenance
1 Introduction
2 Background
2.1 Provenance Needs in AI for Science
2.2 Developments in Provenance Concepts for Machine Learning
2.3 Globus Flows
3 Approach
3.1 Contributions
3.2 Provenance Structure
4 Architecture
4.1 Software Performance Targets
4.2 Software Components
4.3 Software Implementation
5 Case Studies
5.1 Provenance Flow Capture for Training DNNs in X-Ray Science
5.2 Serial Synchrotron X-Ray Crystallography
5.3 The Mascot Workflow
6 Performance
7 Future Work
8 Conclusion
References
Lessons Learned on the Interface Between Quantum and Conventional Networking
1 Introduction
2 Generic Quantum-Conventional Network Harness
3 Scientific Use Case
3.1 Entanglement Distribution
3.2 Prototype Network Architecture
4 Deployed Network
4.1 Time Synchronization
4.2 Experimental Implementation
4.3 Bandwidth Allocation
5 Summary of Lessons Learned
6 Conclusion
References
Use It or Lose It: Cheap Compute Everywhere
1 Introduction
2 Motivation
2.1 Premium and Freemium
2.2 Data Movement
3 Hardware Design Points
3.1 FPGA
3.2 Many-Core
3.3 Cheap CPUs
3.4 ASICs and ASIC-Hybrid
4 Target Market and Use Cases
4.1 Opportunities Better Suited for Specialization Offload
4.2 Opportunities for Ancillary Offload
5 Open Questions
5.1 Balancing Power, Performance and Cost
5.2 The Right Level of Abstraction
5.3 The Memory Problem
5.4 Tying It All Together
6 Conclusions
References
Deploying Advanced Computing Platforms: On the Road to a Converged Ecosystem
Enabling ISO Standard Languages for Complex HPC Workflows
1 Introduction
1.1 The Role of Standards in HPC
1.2 Performance and Programming Models
2 ISO Standard C++ Parallelism for HPC Workloads
2.1 ISO C++ Today
2.2 Evolving ISO C++
3 ISO Standard Fortran Parallelism for HPC Workloads
3.1 ISO Fortran Today
3.2 Evolving ISO Fortran
4 Conclusion
References
Towards Standard Kubernetes Scheduling Interfaces for Converged Computing
1 Introduction
2 Technology Building Blocks
2.1 Flux
2.2 Kube-scheduler
2.3 Node Feature Discovery
3 A Design Space for Scheduler Composition
3.1 Looser Composition Provided by Custom Controllers
3.2 Tighter Composition Enabled by Scheduling Framework
3.3 Augmented Tight Composition Demands an API Extension
3.4 API Semantics Must Accommodate Fundamental Mismatches
4 KubeFlux Design and Challenges
4.1 Addressing Mismatches in Workloads and Resource Models
4.2 Resource Sharing
4.3 KubeFlux Plug-in Scheduler
5 Experimental Work
5.1 Overhead of KubeFlux over Flux
5.2 Overhead of KubeFlux over Kubernetes
6 Related Work
6.1 Cooperative Scheduling
6.2 Integrating HPC Schedulers in Container-Based Environments
7 Summary and Future Work
References
Scaling SQL to the Supercomputer for Interactive Analysis of Simulation Data
1 Introduction
2 RAPIDS and the BlazingSQL Software Architecture
3 Implementation of Communications via UCX in BlazingSQL
3.1 False Starts for Implementing the UCX API in an Application Code
3.2 Final Implementation of UCX Communications
4 Performance Results
4.1 Performance of UCX vs. IPoIB
4.2 Multi-node Performance
5 Implications for Future and Emerging HPC Platforms
6 Conclusion
References
NVIDIA's Cloud Native Supercomputing
1 Introduction
2 Solution: Cloud Native Supercomputing Design Principles
3 Implementation Principles and Technologies
3.1 Bluefield Data Processing Unit
3.2 Multi-tenant Isolation: Toward a Zero-Trust Architecture
3.3 Services
3.4 In-Network Computing: Offloading Capabilities
4 Summary
References
Scientific Data Challenges
Smoky Mountain Data Challenge 2021: An Open Call to Solve Scientific Data Challenges Using Advanced Data Analytics and Edge Computing
1 Introduction
2 Challenge 1: Unraveling Hidden Order and Dynamics in a Heterogeneous Ferroelectric System Using Machine Learning
2.1 Background
2.2 Dataset Description
2.3 Challenges of Interest
3 Challenge 2: Finding Novel Links in COVID-19 Knowledge Graph
3.1 Background
3.2 Introduction
3.3 Dataset Description
3.4 Challenges of Interest
4 Challenge 3: Synthetic-to-Real Domain Adaptation for Autonomous Driving
4.1 Background
4.2 Dataset Description
4.3 Challenges of Interest
5 Challenge 4: Analyzing Resource Utilization and User Behavior on Titan Supercomputer
5.1 Background
5.2 Dataset Description
5.3 Challenges of Interest
6 Challenge 5: Sustainable Cities: Socioeconomics, Building Types, and Urban Morphology
6.1 Background
6.2 Dataset Description
6.3 Challenges of Interest
7 Challenge 6: Where to Go in the Atomic World
7.1 Background
7.2 Dataset Description
7.3 Challenges of Interest
8 Challenge 7: Increased Image Spatial Resolution for Neutron Radiography
8.1 Background
8.2 Dataset Description
8.3 Challenges of Interest
9 Challenge 8: High Dimensional Active Learning for Microscopy of Nanoscale Materials
9.1 Background
9.2 Dataset Description
9.3 Challenges of Interest
10 Conclusions
References
Advanced Image Reconstruction for MCP Detector in Event Mode
1 Introduction
2 Image Reconstruction
2.1 Event Clustering
2.2 Incident Neutron Back-Tracing
3 Results
3.1 Clustering Results
3.2 Image Reconstruction
4 Discussion
4.1 Event Clustering Analysis
4.2 Image Reconstruction with Different Models
5 Summary
References
A Study on the Resource Utilization and User Behavior on Titan Supercomputer
1 Introduction
2 Exploratory Data Analysis
2.1 Data Preprocessing
2.2 Data Correlation
2.3 Data Clustering
3 Time Series Analysis
3.1 Seasonality
3.2 GPU Hardware-Related Issues
3.3 Predictive Model
4 Conclusions
References
Recurrent Multi-task Graph Convolutional Networks for COVID-19 Knowledge Graph Link Prediction
1 Introduction
2 Related Work
2.1 COVID-19 Knowledge Graphs
2.2 Temporal Link Prediction
3 Methodology
3.1 Problem Formation
3.2 Model
4 Experiments
4.1 Datasets
4.2 Preprocessing
4.3 Analysis
4.4 Baseline Models
4.5 Experimental Setup
4.6 Results
5 Conclusions
References
Reconstructing Piezoelectric Responses over a Lattice: Adaptive Sampling of Low Dimensional Time Series Representations Based on Relative Isolation and Gradient Size
1 Introduction and Background
2 Proposed Solution
3 Approach
4 Results
5 Contributions
References
Finding Novel Links in COVID-19 Knowledge Graph Using Graph Embedding Techniques
1 Introduction
2 Methodology
2.1 Dataset Curation
2.2 Data Visualization
2.3 Exploratory Data Analysis
2.4 Methods for Link Prediction
2.5 Importance of Predicted Links
3 Results and Discussions
4 Conclusion
References
Exploring the Spatial Relationship Between Demographic Indicators and the Built Environment of a City
1 Background
1.1 Introduction
1.2 Study Area and Data Sources
2 Methodology
2.1 Challenge 1: What is the Distribution of Commercial, Industrial, and Residential Buildings Within Each Block Group? Do These Distributions Correlate with Building Age? Building Value? Building Size?
2.2 Challenge 2: Using Temperature Data from a Source of the Participant’s Choosing, Are There Locations Within the City That Tend to Be Warmer Than Others? How Does This Relate to Building Density and Building Type?
2.3 Challenge 3: How Does the Built Environment and the Local Scale Experience of Heat Co-vary with Socio-economic and Demographic Characteristics of Residents?
3 Results
3.1 Challenge 1: Distribution of Building Types and Their Correlation with Building Characteristics (Age, Value and Size)
3.2 Challenge 2: Spatial Pattern of Temperature and Its Correlation with Building Type and Density
3.3 Challenge 3: Correlation Between Socio-economic and Demographic Characteristics and the Built Environment
4 Conclusion
Appendix
References
Atomic Defect Identification with Sparse Sampling and Deep Learning
1 Introduction
2 Background
3 Methods
4 Results
5 Conclusion
References
Author Index


📜 SIMILAR VOLUMES


Driving Scientific and Engineering Disco
✍ Jeffrey Nichols, Arthur ‘Barney’ Maccabe, James Nutaro, Swaroop Pophale, Pravall 📂 Library 📅 2022 🏛 Springer 🌐 English

This book constitutes the revised selected papers of the 21st Smoky Mountains Computational Sciences and Engineering Conference, SMC 2021, held in Oak Ridge, TN, USA, in October 2021. The 33 full papers and 3 short papers presented were carefully reviewed and selected from a to

Accelerating Science and Engineering Dis
✍ Kothe Doug (editor), Geist Al (editor), Swaroop Pophale (editor), Hong Liu (edit 📂 Library 📅 2023 🏛 Springer 🌐 English

This book constitutes the refereed proceedings of the 22nd Smoky Mountains Computational Sciences and Engineering Conference on Accelerating Science and Engineering Discoveries Through Integrated Research Infrastructure for Experiment, Big Data, Modeling and Simulation, SMC 2022, held virtuall

Driving Scientific and Engineering Disco
✍ Jeffrey Nichols (editor), Becky Verastegui (editor), Arthur ‘Barney’ Maccabe (ed 📂 Library 📅 2020 🏛 Springer 🌐 English

This book constitutes the revised selected papers of the 17th Smoky Mountains Computational Sciences and Engineering Conference, SMC 2020, held in Oak Ridge, TN, USA, in August 2020. The 36 full papers and 1 short paper presented were carefully reviewed and selected from a tota

Model-Based Software and Data Integratio
✍ Ralf-Detlef Kutsche, Nikola Milanovic 📂 Library 📅 2008 🌐 English

This book includes selected papers of the First International Workshop on Model-Based Software and Data Integration 2008, held in Berlin, Germany, in April 2008 as a part of the Berlin Software Integration Week 2008. The 9 revised full papers presented together with 3 invited lectures were carefull

Scientific Modeling and Simulations (Lec
✍ Sidney Yip, Tomas Diaz de la Rubia 📂 Library 📅 2009 🏛 Springer 🌐 English

The conceptualization of a problem (modeling) and the computational solution of this problem (simulation), is the foundation of Computational Science. This coupled endeavor is unique in several respects. It allows practically any complex system to be analyzed with predictive capability by invokin

Scientific Modeling and Simulations (Lec
✍ Sidney Yip 📂 Library 📅 2009 🏛 Springer 🌐 English

The conceptualization of a problem (modeling) and the computational solution of this problem (simulation), is the foundation of Computational Science. This coupled endeavor is unique in several respects. It allows practically any complex system to be analyzed with predictive capability by invoking t