𝔖 Bobbio Scriptorium
✦ LIBER ✦

Optimizing MapReduce for GPUs with effective shared memory usage. In Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing (HPDC '12), Delft, The Netherlands, June 18-22, 2012. ACM Press.

โœ Scribed by Chen, Linchuan; Agrawal, Gagan


Book ID
118239846
Publisher
ACM Press
Year
2012
Weight
595 KB
Category
Article
ISBN
1450308058

No coin nor oath required. For personal study only.

✦ Synopsis


Accelerators and heterogeneous architectures in general, and GPUs in particular, have recently emerged as major players in high performance computing. For many classes of applications, MapReduce has emerged as the framework for easing parallel programming and improving programmer productivity. There have already been several efforts on implementing MapReduce on GPUs.

In this paper, we propose a new implementation of MapReduce for GPUs that is very effective in utilizing shared memory, a small programmable cache on modern GPUs. The main idea is to use a reduction-based method to execute a MapReduce application, which allows us to carry out reductions in shared memory. To support a general and efficient implementation, we provide the following features: a memory hierarchy for maintaining the reduction object, a multi-group scheme in shared memory to trade off space requirements against locking overheads, a general and efficient data structure for the reduction object, and an efficient swapping mechanism.

We have evaluated our framework with seven commonly used MapReduce applications and compared it with sequential implementations, with MapCG, a recent MapReduce implementation on GPUs, and with Ji et al.'s work, a recent MapReduce implementation that utilizes shared memory in a different way. The main observations from our experimental results are as follows. For the four of the seven applications that can be considered reduction-intensive, our framework achieves a speedup of between 5 and 200 over MapCG (for large datasets). Similarly, we achieve a speedup of between 2 and 60 over Ji et al.'s work.
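To make the reduction-based idea concrete, the following sketch contrasts conventional MapReduce, which materializes all intermediate (key, value) pairs before reducing, with a reduction-object approach that folds each pair into a running result as soon as it is emitted. This is a hedged CPU analogy in Python, not the authors' GPU code: the function names and the word-count example are illustrative only, and on a GPU the reduction object would live in shared memory, with the paper's multi-group scheme keeping several copies to reduce lock contention.

```python
from collections import defaultdict

def mapreduce_emit(records, map_fn, reduce_fn):
    # Conventional MapReduce: the map phase emits every intermediate
    # (key, value) pair; pairs are grouped by key and reduced afterwards.
    groups = defaultdict(list)
    for rec in records:
        for k, v in map_fn(rec):
            groups[k].append(v)
    return {k: reduce_fn(vs) for k, vs in groups.items()}

def mapreduce_reduction_object(records, map_fn, combine_fn, init):
    # Reduction-based MapReduce: each emitted pair is combined into a
    # reduction object immediately, so no intermediate list is stored.
    # (On a GPU this object is the structure kept in shared memory.)
    robj = {}
    for rec in records:
        for k, v in map_fn(rec):
            robj[k] = combine_fn(robj.get(k, init), v)
    return robj

# Word count, a classic reduction-intensive MapReduce application,
# run through both strategies.
docs = ["a b a", "b c"]
wc_map = lambda doc: [(w, 1) for w in doc.split()]

out_emit = mapreduce_emit(docs, wc_map, sum)
out_robj = mapreduce_reduction_object(docs, wc_map, lambda a, b: a + b, 0)
assert out_emit == out_robj == {"a": 2, "b": 2, "c": 1}
```

The sketch shows why the approach suits applications with few distinct keys: the reduction object stays small regardless of input size, whereas the emit-based variant buffers one entry per emitted pair.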


📜 SIMILAR VOLUMES


[ACM Press the 21st international sympos
โœ Tan, Jian; Meng, Xiaoqiao; Zhang, Li ๐Ÿ“‚ Article ๐Ÿ“… 2012 ๐Ÿ› ACM Press โš– 383 KB

Current schedulers of MapReduce/Hadoop are quite successful in providing good performance. However, room for improvement still exists: map and reduce tasks are not jointly optimized for scheduling, although there is a strong dependence between them. This can cause job starvation and poor data locality. We de