#### Abstract

A large number of problems in numerical analysis require the multiplication of a sparse matrix by a vector. In spite of the large amount of fine-grained parallelism available in the process of sparse matrix-vector multiplication, it is difficult to design an algorithm for distributed memory SIMD computers that can efficiently multiply an arbitrary sparse matrix by a vector. The difficulty lies in the irregular nature of the data structures required to efficiently store arbitrary sparse matrices, and the architectural constraints of a SIMD computer. We propose a new algorithm that allows the "regularity" of a data structure that uses a row-major mapping to be varied by a changing a parameter (the "block size"). The (block row) algorithm assumes that the number of non-zero elements in each row is a multiple of the blocksize; (additional) zero entries are stored to satisfy this condition. The blocksize can be varied from one to N, where N is the size of the matrix; a blocksize of one results in a rcw-major distribution of the non-zero elements of the matrix (no oveahead of storing zcxo elements), while a blocksize of N results in a row-major distribution corresponding to that of a dense matrix. The algorithm was irnplemerlted on a 16,384 processor MasPar MP-1, and for the matrices associated with ithe applications considered here (S-Matrix Approach to Device Simulation, and tlhe Modeling of Diffractive and Scattering Objects), the algorithm was faster than ainy of the other algorithms considered (the "snake-like" method, the "segmented-scan" method, and a randomized packing algorithm). For matrices that have a wide variation in the number of non-zero elements in each row, a procedure for an "adaptive" block row allgorithrn is briefly mentioned. The block row algorithm is applicable to unstructured sllarse matrices which have relatively sparse columns (dense rows arc: not a problem), and it can be implemented on any distributed memory computer.

#### Date of this Version

October 1994