✦ LIBER ✦

Scalable Parallel Coset Enumeration: Bulk Definition and the Memory Wall

✍ Scribed by Gene Cooperman; Victor Grinberg

Publisher: Elsevier Science
Year: 2002
Tongue: English
Weight: 340 KB
Volume: 33
Category: Article
ISSN: 0747-7171
DOI: 10.1006/jsco.2002.0523

No coin nor oath required. For personal study only.

✦ Synopsis

Coset enumeration, like Gröbner bases, is a notoriously difficult algorithm to parallelize. We demonstrate a successful shared memory parallelization achieving a seven times speedup on an Origin 2000 CC-NUMA computer using 16 CPUs. We take as a testbed, an enumeration of Lyons's group (8 835 156 cosets). This provides comparability with previous efforts in the literature for which the best previous speedup was a factor of 4. The new parallelization depends on two new heuristics, clouds and shallow scan. Clouds is an example of bulk definition of cosets, which forms the key to our more efficient parallelization. The parallelization is implemented using TOP-C. By taking advantage of TOP-C's option to compile for either shared or distributed memory, we also demonstrate the first efficient parallelization of a coset enumeration program using distributed memory.

Our faster results expose for the first time in the context of coset enumeration the "memory wall", i.e. the latency barrier of the RAM. We verify this memory wall by showing on an Origin 2000 that a memory-bound parallel program with 64 CPUs doing nothing but randomly accessing RAM achieves at best a speedup of only 3.75 over the single-CPU version. Further, it is demonstrated that even sequential versions of coset enumerations programs are memory latency-bound in today's technology. The lessons of bulk definition and of the memory wall carry over to related algorithms such as Gröbner bases, Knuth-Bendix, and other symbolic algebra algorithms with intermediate swell.