Abstract
This paper addresses load balancing of parallel algorithms on distributed-memory computers. Parallel versions of BLAS subroutines for matrix-vector product and LU factorization are considered. Two task partitioning algorithms are investigated and their speed-ups are calculated. Both homogeneous and heterogeneous collections of computers or processors are studied, and special partitioning algorithms for heterogeneous workstation clusters are presented.
