Power-performance considerations of parallel computing on chip multiprocessors
✍ Scribed by Li, Jian; Martínez, José F.
- Book ID: 120689152
- Publisher: Association for Computing Machinery
- Year: 2005
- Language: English
- File size: 553 KB
- Volume: 2
- Category: Article
- ISSN: 1544-3566
✦ Synopsis
This paper examines the power-performance implications of running parallel applications on chip multiprocessors (CMPs). First, we develop an analytical model that, for the first time, combines parallel efficiency, granularity of parallelism, and voltage/frequency scaling to establish a formal connection with the power consumption and performance of a parallel code running on a CMP. We then conduct simulations of parallel applications running on a detailed power-performance CMP model to confirm the analytical results and provide further insights. Both the analytical model and the experiments show that parallel computing can bring significant power savings while still meeting a given performance target, provided that granularity and voltage/frequency levels are chosen judiciously. The particular choice, however, depends on the application's parallel efficiency curve and the process technology used, both of which our model captures. Likewise, the analytical model and the experiments show the effect of a limited power budget on the application's scalability curve. In particular, we show that a limited power budget can cause rapid performance degradation beyond a certain number of cores, even for applications with excellent scalability properties. On the other hand, our experiments show that, when a limited power budget is in place, power-thrifty memory-bound applications may actually enjoy better scalability than more compute-intensive codes, even if the latter would exhibit higher scalability in a power-unconstrained scenario.
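The trade-off described in the synopsis can be illustrated with a minimal first-order sketch. This is not the paper's analytical model, which the synopsis does not reproduce. It assumes that performance scales linearly with clock frequency, that dynamic power is proportional to V² · f per core, that supply voltage scales linearly with frequency down to a floor, and that static (leakage) power is ignored. The function name `relative_dynamic_power` and the parameter `v_min_ratio` are hypothetical and introduced only for illustration.

```python
def relative_dynamic_power(n_cores, efficiency, v_min_ratio=0.0):
    """Dynamic power of an n_cores configuration relative to one core at
    nominal voltage/frequency, when both deliver the same performance.

    Assumptions (illustrative only, not taken from the paper):
      - speedup = n_cores * efficiency * (f / f_nominal)
      - dynamic power per core is proportional to V^2 * f
      - V scales linearly with f, but never below v_min_ratio * V_nominal
      - leakage power is ignored
    """
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")

    # Per-core frequency (relative to nominal) that matches 1-core performance.
    freq_ratio = 1.0 / (n_cores * efficiency)

    # Voltage follows frequency until it reaches the assumed floor.
    volt_ratio = max(freq_ratio, v_min_ratio)

    # Total dynamic power: n_cores copies of V^2 * f.
    return n_cores * volt_ratio ** 2 * freq_ratio


if __name__ == "__main__":
    for n, eff in [(1, 1.0), (2, 0.9), (4, 0.8), (8, 0.6)]:
        print(n, eff, round(relative_dynamic_power(n, eff, v_min_ratio=0.5), 4))
```

Under these assumptions, and with no voltage floor, the relative power is 1 / (N² · e³) for N cores at parallel efficiency e, so savings shrink quickly as efficiency drops. Once the voltage floor is reached, adding cores only lowers frequency, and the savings level off. This is one simple way to see why the best configuration would depend on both the efficiency curve and the process technology.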