We present a new, simple algorithmic idea for the collective communication operations broadcast, reduction, and scan (prefix sums). The algorithms concurrently communicate over two binary trees which both span the entire network. By careful layout and communication scheduling, each tree communicates
A bandwidth latency tradeoff for broadcast and reduction
Written by Peter Sanders; Jop F. Sibeyn
- Publisher
- Elsevier Science
- Year
- 2003
- Language
- English
- File size
- 109 KB
- Volume
- 86
- Category
- Article
- ISSN
- 0020-0190
Synopsis
The "fractional tree" algorithm for broadcasting and reduction is introduced. Its communication pattern interpolates between two well-known patterns: the sequential pipeline and the pipelined binary tree. The speedup over the better of these two simple methods can approach a factor of two for large systems and messages of intermediate size. For networks that are not very densely connected, the new algorithm seems to be the best known method for the important case in which each processor has only a single (possibly bidirectional) channel into the communication network.
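The tradeoff between the two baseline patterns named in the synopsis can be illustrated with a rough cost sketch. This is not the paper's analysis and it does not implement the fractional tree itself. It assumes a simple linear cost model in which one packet transfer takes `alpha + beta * bytes`, a single-ported full-duplex processor, and a complete binary tree. The function names, parameter names, and step counts below are illustrative assumptions.

```python
import math

# Assumed cost model (not from the source): sending one packet of s bytes
# costs alpha + beta * s seconds. The n-byte message is split into k packets.

def pipeline_time(p, n, k, alpha, beta):
    """Sequential pipeline over a chain of p processors.

    The first packet needs p - 1 hops to reach the last processor;
    the remaining k - 1 packets follow one step apart.
    """
    steps = k + p - 2
    return steps * (alpha + beta * n / k)

def binary_tree_time(p, n, k, alpha, beta):
    """Pipelined binary tree, single-ported (simplified estimate).

    Each interior node forwards every packet to two children in turn,
    so both the depth term and the packet term are doubled.
    """
    depth = max(1, math.ceil(math.log2(p + 1)) - 1)
    steps = 2 * (depth + k - 1)
    return steps * (alpha + beta * n / k)

def best_time(cost, p, n, alpha, beta, max_packets=4096):
    """Brute-force the packet count k that minimizes the modeled time."""
    return min(cost(p, n, k, alpha, beta) for k in range(1, max_packets + 1))

if __name__ == "__main__":
    alpha, beta = 1e-5, 1e-9  # hypothetical latency (s) and per-byte cost (s)
    for p, n in [(1024, 1), (1024, 10**6), (16, 10**9)]:
        t_pipe = best_time(pipeline_time, p, n, alpha, beta)
        t_tree = best_time(binary_tree_time, p, n, alpha, beta)
        print(f"p={p:5d} n={n:>10d}  pipeline={t_pipe:.6f}s  tree={t_tree:.6f}s")
```

Under these assumptions the tree wins when latency dominates (many processors, short messages), and the pipeline wins when bandwidth dominates (long messages), because the tree pays roughly twice the transfer cost. The intermediate regime is where the synopsis says an interpolating pattern can pay off.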
SIMILAR VOLUMES
This study measures the effects of changes in message latency and bandwidth for production-level codes on a current-generation, tightly coupled MPP, the Intel Paragon. Messages are sent multiple times to study the application sensitivity to variations in bandwidth and latency. This method preserves th