A C++11 implementation of arbitrary-rank tensors for high-performance computing
✍ Scribed by Aragón, Alejandro M.
- Book ID: 126954446
- Publisher: Elsevier Science
- Year: 2014
- Tongue: English
- Weight: 296 KB
- Volume: 185
- Category: Article
- ISSN: 0010-4655
📜 SIMILAR VOLUMES
The quality of compiler-optimized code for high-performance applications is far behind what optimization and domain experts can achieve by hand. Although it may seem surprising at first glance, the performance gap has been widening over time, due to the tremendous complexity increase in microprocess
In this paper we present a thorough experience on tuning double-precision matrix-matrix multiplication (DGEMM) on the Fermi GPU architecture. We choose an optimal algorithm with blocking in both shared memory and registers to satisfy the constraints of the Fermi memory hierarchy. Our optimization s