Fast matrix multiplication
Scribed by Carlos F. Bunge; Gerardo Cisneros
- Publisher
- John Wiley and Sons
- Year
- 1987
- Language
- English
- File size
- 358 KB
- Volume
- 8
- Category
- Article
- ISSN
- 0192-8651
Synopsis
Several implementations of matrix multiplication (MMUL) in Fortran and VAX assembly language are discussed. On a VAX-11/780 computer, the most efficient MMUL is achieved through vector-scalar-multiply-and-add (VSMA) operations rather than through dot products. We also discuss optimal MMUL algorithms for virtual memory machines when the data overflow the working set.
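To make the contrast between the two formulations concrete, here is a minimal Python sketch (not the authors' Fortran or VAX assembly code). It assumes matrices are stored column-major as lists of columns, mimicking Fortran storage, so `m[j][i]` is element (i, j). The function names `mmul_dot` and `mmul_vsma` are hypothetical. The dot-product version forms each c_ij as a full inner product, stepping across the columns of A; the VSMA version builds each column of C by adding scalar multiples b_kj of whole columns of A, so its innermost loop runs down contiguous storage. Both compute the same product; in Python the interpreter overhead hides any memory-access advantage, so this illustrates only the loop structure, not the timings reported in the article.

```python
from typing import List

# Column-major storage (as in Fortran): m[j] is column j, m[j][i] is element (i, j).
Matrix = List[List[float]]


def mmul_dot(a: Matrix, b: Matrix) -> Matrix:
    """C = A*B using dot products: c_ij = sum_k a_ik * b_kj."""
    n_rows, n_inner, n_cols = len(a[0]), len(a), len(b)
    c = [[0.0] * n_rows for _ in range(n_cols)]
    for j in range(n_cols):
        for i in range(n_rows):
            s = 0.0
            for k in range(n_inner):
                # a[k][i] walks along row i of A, jumping from column to column.
                s += a[k][i] * b[j][k]
            c[j][i] = s
    return c


def mmul_vsma(a: Matrix, b: Matrix) -> Matrix:
    """C = A*B using vector-scalar multiply-and-add: c_:j += b_kj * a_:k."""
    n_rows, n_inner, n_cols = len(a[0]), len(a), len(b)
    c = [[0.0] * n_rows for _ in range(n_cols)]
    for j in range(n_cols):
        cj = c[j]                 # column j of C (contiguous)
        for k in range(n_inner):
            s = b[j][k]           # scalar b_kj
            ak = a[k]             # column k of A (contiguous)
            for i in range(n_rows):
                cj[i] += s * ak[i]
    return c
```

For example, with A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]] written row by row, the column-major inputs are `a = [[1, 3], [2, 4]]` and `b = [[5, 7], [6, 8]]`, and both functions return the columns `[[19.0, 43.0], [22.0, 50.0]]`, i.e. the product [[19, 22], [43, 50]].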
SIMILAR VOLUMES
In the paper we give a straightforward, highly efficient, scalable implementation of common matrix multiplication operations. The algorithms are much simpler than previously published methods, yield better performance, and require less work space. MPI implementations are given, as are performance re
The purpose of this paper is to present an algorithm for matrix multiplication based on a formula discovered by Pan [7]. For matrices of order up to 10 000, the nearly optimum tuning of the algorithm results in a rather clear non-recursive one- or two-level structure with the operation count comparab
Many pattern recognition tasks, including estimation, classification, and the finding of similar objects, make use of linear models. The fundamental operation in such tasks is the computation of the dot product between a query vector and a large database of instance vectors. Often we are interested
In this paper we construct an analytic model of cache misses during matrix multiplication. The analysis in this paper applies to square matrices of size 2^m where the array layout function is given in terms of a function that interleaves the bits in the binary expansions of the row and column indi
Consider any known sequential algorithm for matrix multiplication over an arbitrary ring with time complexity O(N^α), where 2 < α ≤ 3. We show that such an algorithm can be parallelized on a distributed memory parallel computer (DMPC) in O(log N) time by using N^α/log N processors. Such a parallel