TY - JOUR
T1 - A comparison of lookahead and algorithmic blocking techniques for parallel matrix factorization
AU - Strazdins, P. E.
PY - 2001
Y1 - 2001
N2 - This article analyses and compares the techniques of algorithmic blocking and storage blocking with lookahead for distributed memory LU, LLT, and QR factorizations. Concepts and some useful properties of a simplified model of lookahead are explored. Issues in the implementation of lookahead are discussed, which are more involved for the case of LLT and QR factorizations. The article also explains how hybrid algorithmic blocking and lookahead techniques can be implemented. Results, given on the Fujitsu AP1000 and AP+ multicomputers, indicate that both methods are superior to storage blocking, and that the hybrid method is optimal for smaller matrices, due to savings in communication startups. For larger matrices, algorithmic blocking gave the best performance (excepting LLT for the AP+), due to its better load-balancing properties. Performance models, predicting the minimum matrix size where lookahead becomes effective, indicate this trend can be expected for machines with lower communication-to-computation speeds, but that the range for where lookahead is superior is extended.
AB - This article analyses and compares the techniques of algorithmic blocking and storage blocking with lookahead for distributed memory LU, LLT, and QR factorizations. Concepts and some useful properties of a simplified model of lookahead are explored. Issues in the implementation of lookahead are discussed, which are more involved for the case of LLT and QR factorizations. The article also explains how hybrid algorithmic blocking and lookahead techniques can be implemented. Results, given on the Fujitsu AP1000 and AP+ multicomputers, indicate that both methods are superior to storage blocking, and that the hybrid method is optimal for smaller matrices, due to savings in communication startups. For larger matrices, algorithmic blocking gave the best performance (excepting LLT for the AP+), due to its better load-balancing properties. Performance models, predicting the minimum matrix size where lookahead becomes effective, indicate this trend can be expected for machines with lower communication-to-computation speeds, but that the range for where lookahead is superior is extended.
KW - Algorithmic blocking
KW - Block cyclic decomposition
KW - Dense linear algebra
KW - Lookahead
KW - Matrix factorization
KW - Pipelined communication
UR - http://www.scopus.com/inward/record.url?scp=0035003299&partnerID=8YFLogxK
M3 - Article
SN - 1206-2138
VL - 4
SP - 26
EP - 35
JO - International Journal of Parallel and Distributed Systems and Networks
JF - International Journal of Parallel and Distributed Systems and Networks
IS - 1
ER -