[ACM Press the eleventh ACM SIGPLAN symposium - New York, New York, USA (2006.03.29-2006.03.31)] Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '06 - Predicting bounds on queuing delay for batch-scheduled parallel machines
Scribed by Brevik, John; Nurmi, Daniel; Wolski, Rich
- Book ID: 121405498
- Publisher: ACM Press
- Year: 2006
- Weight: 411 KB
- Category: Article
- ISBN-13: 9781595931894
Synopsis
Most space-sharing parallel computers presently operated by high-performance computing centers use batch-queuing systems to manage processor allocation. In many cases, users wishing to use these batch-queued resources have accounts at multiple sites and have the option of choosing at which site or sites to submit a parallel job. In such a situation, the amount of time a user's job will wait in any one batch queue can significantly impact the overall time a user waits from job submission to job completion. In this work, we explore a new method for providing end-users with predictions for the bounds on the queuing delay individual jobs will experience. We evaluate this method using batch scheduler logs for distributed-memory parallel machines that cover a 9-year period at 7 large HPC centers. Our results show that it is possible to predict delay bounds reliably for jobs in different queues, and for jobs requesting different ranges of processor counts. Using this information, scientific application developers can intelligently decide where to submit their parallel codes in order to minimize overall turnaround time.