
Towards large-scale multi-socket, multicore parallel simulations: Performance of an MPI-only semiconductor device simulator

✍ Scribed by Paul T. Lin; John N. Shadid


Publisher
Elsevier Science
Year
2010
Tongue
English
Weight
525 KB
Volume
229
Category
Article
ISSN
0021-9991


✦ Synopsis


This preliminary study considers the scaling and performance of a finite element (FE) semiconductor device simulator on a set of multi-socket, multicore architectures with nonuniform memory access (NUMA) compute nodes. The architectures comprise two Linux clusters with multicore processors (a quad-socket, quad-core AMD Opteron platform and a dual-socket, quad-core Intel Xeon Nehalem platform) and a dual-socket, six-core AMD Opteron workstation. These platforms have complex memory hierarchies that include local core-based cache, local socket-based memory, memory on the same mainboard accessed from another socket, and memory reached across network links to different nodes. The semiconductor device simulator used in this study employs a fully-coupled Newton-Krylov solver with domain decomposition and multilevel preconditioners. Scaling results presented include a large-scale problem of 100+ million unknowns on 4096 cores and a comparison with the Cray XT3/4 Red Storm capability platform. Although the MPI-only device simulator employed for this work can take advantage of all the cores of quad-core and six-core CPUs, the efficiency of the linear system solve decreases with increasing core count, and eventually a different programming paradigm will be needed.