𝔖 Bobbio Scriptorium
✦   LIBER   ✦

Ultrahigh-performance FFTs for the CRAY-2 and CRAY Y-MP supercomputers

โœ Scribed by David A. Carlson


Publisher: Springer US
Year: 1992
Tongue: English
Weight: 568 KB
Volume: 6
Category: Article
ISSN: 0920-8542

No coin nor oath required. For personal study only.

✦ Synopsis


In this paper a set of techniques for improving the performance of the fast Fourier transform (FFT) algorithm on modern vector-oriented supercomputers is presented. Single-processor FFT implementations based on these techniques are developed for the CRAY-2 and the CRAY Y-MP, and it is shown that they achieve higher performance than previously measured on these machines. The techniques include (1) using gather/scatter operations to maintain optimum-length vectors throughout all stages of small- to medium-sized FFTs, (2) using efficient radix-8 and radix-16 inner loops, which allow a large number of vector loads/stores to be overlapped, and (3) prefetching twiddle factors as vectors so that, on the CRAY-2, they can later be fetched from local memory in parallel with common-memory accesses. Performance results for Fortran implementations of these techniques show that they are faster than Cray's library FFT routine CFFT2. The actual speedups, which depend on the size of the FFT and on the supercomputer used, range from about 5% to over 300%.
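The first technique, keeping the vector length fixed across every FFT stage by means of gather/scatter, can be illustrated with a small sketch. This is not the paper's Fortran code: it is a hypothetical radix-2 Python version (the paper itself uses radix-8 and radix-16 inner loops), and the names `bit_reverse_permutation`, `stage_indices`, `fft_constant_length`, and `naive_dft` are illustrative assumptions. Each stage builds three index vectors of length N/2 and then performs all of that stage's butterflies as one gather, one element-wise update, and one scatter, so the vector length never shrinks as the butterfly span grows. The twiddle factors are computed once into a table and only gathered afterwards, which loosely mirrors the idea of fetching them as vectors.

```python
import cmath

# Hypothetical sketch (not the paper's code): a radix-2 FFT in which every
# stage is expressed as gather -> element-wise butterfly -> scatter over
# index vectors of constant length n // 2.


def bit_reverse_permutation(n):
    """Indices that reorder the input for an in-place decimation-in-time FFT."""
    bits = n.bit_length() - 1
    return [int(bin(i)[2:].zfill(bits)[::-1], 2) for i in range(n)]


def stage_indices(n, m):
    """Index vectors for the stage whose butterflies span m elements.

    Every stage has exactly n // 2 butterflies, so all three vectors have
    length n // 2 no matter how large or small m is.
    """
    top, bottom, widx = [], [], []
    for k in range(n // 2):
        group, j = divmod(k, m)
        i0 = group * 2 * m + j
        top.append(i0)
        bottom.append(i0 + m)
        widx.append(j * (n // (2 * m)))  # position in the twiddle table
    return top, bottom, widx


def fft_constant_length(x):
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    # Twiddle table exp(-2*pi*i*t/n), computed once and then only gathered.
    tw = [cmath.exp(-2j * cmath.pi * t / n) for t in range(max(n // 2, 1))]
    y = [complex(x[p]) for p in bit_reverse_permutation(n)]
    m = 1
    while m < n:
        top, bottom, widx = stage_indices(n, m)
        a = [y[i] for i in top]        # gather upper inputs
        b = [y[i] for i in bottom]     # gather lower inputs
        w = [tw[t] for t in widx]      # gather twiddle factors
        prod = [wk * bk for wk, bk in zip(w, b)]
        for i, ak, pk in zip(top, a, prod):     # scatter sums
            y[i] = ak + pk
        for i, ak, pk in zip(bottom, a, prod):  # scatter differences
            y[i] = ak - pk
        m *= 2
    return y


def naive_dft(x):
    """O(n^2) reference transform, used only to check the sketch."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 0, -1, 2.5, 7]
    err = max(abs(f - d) for f, d in zip(fft_constant_length(data), naive_dft(data)))
    print("max error vs. naive DFT:", err)
```

In a conventional nested-loop formulation, one of the two loops (over butterfly groups, or over positions within a group) is short in the early or late stages. The index vectors trade those short loops for indexed memory accesses, which is the trade-off the synopsis describes for small- to medium-sized FFTs.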


📜 SIMILAR VOLUMES


Vector performance estimation for CRAY X
โœ Allen R. Hainline; Steven R. Thompson; Lawrence L. Halcomb ๐Ÿ“‚ Article ๐Ÿ“… 1992 ๐Ÿ› Springer US ๐ŸŒ English โš– 931 KB

Optimization of vector-intensive applications for the CRAY X-MP/Y-MP often requires arranging the operations to take full advantage of such architectural features as the memory system, independent memory ports, chaining, and independent functional units. Estimation of performance is not straightforw

Performance comparison of the CRAY-2 and
โœ Margaret L. Simmons; Harvey J. Wasserman ๐Ÿ“‚ Article ๐Ÿ“… 1990 ๐Ÿ› Springer US ๐ŸŒ English โš– 702 KB

The serial and parallel performance of one of the world's fastest general purpose computers, the CRAY-2, is analyzed using the standard Los Alamos Benchmark Set plus codes adapted for parallel processing. For comparison, architectural and performance data are also given for the CRAY X-MP/416. Factor

Performance comparison of the CRAY X-MP/
โœ Richard E. Anderson; Roger G. Grimes; Horst D. Simon ๐Ÿ“‚ Article ๐Ÿ“… 1988 ๐Ÿ› Springer US ๐ŸŒ English โš– 586 KB

The CRAY-2 is considered to be one of the most powerful supercomputers. Its state-of-the-art technology features a faster clock and more memory than any other supercomputer available today. In this report the single processor performance of the CRAY-2 is compared with the older, more mature CRAY X-M

Using local memory to boost the performa
โœ David A. Carlson ๐Ÿ“‚ Article ๐Ÿ“… 1991 ๐Ÿ› Springer US ๐ŸŒ English โš– 695 KB

local memory. This is in addition to the extremely large, 268-million-word common memory that is accessible by all four processors. By using local memory judiciously, it is possible to achieve increased performance on the CRAY-2. This is partly because accesses to local memory can be done simultaneo