TY - GEN
T1 - The improved Krylov subspace methods for large and sparse linear systems on bulk synchronous parallel architectures
AU - Yang, Laurence Tianruo
AU - Brent, Richard P.
N1 - Publisher Copyright: © 2003 IEEE.
PY - 2003
Y1 - 2003
N2 - This paper summarizes recent advances in improved Krylov subspace methods for solving large, sparse linear systems of equations with unsymmetric coefficient matrices. The proposed methods combine numerical stability with parallel algorithm design without greatly increasing the computational cost. They share a common feature: each is derived so that the matrix-vector multiplications, inner products, and vector updates of a single iteration step are independent, and the communication time required for the inner products can be overlapped efficiently with the computation time of the vector updates. The cost of global communication, which is the performance bottleneck, can therefore be reduced significantly. The bulk synchronous parallel (BSP) model is used to design efficient, scalable, and portable parallel versions of the proposed algorithms and to provide accurate performance predictions for a wide range of architectures, including the Cray T3D, the Parsytec, and a cluster of workstations connected by Ethernet. The performance model uses only a few system-dependent parameters and is based on simple, accurate cost modelling that gives useful insight into the time complexity of the methods. The theoretical performance predictions are compared with preliminary measured timing results from a numerical application in ocean flow simulation.
AB - This paper summarizes recent advances in improved Krylov subspace methods for solving large, sparse linear systems of equations with unsymmetric coefficient matrices. The proposed methods combine numerical stability with parallel algorithm design without greatly increasing the computational cost. They share a common feature: each is derived so that the matrix-vector multiplications, inner products, and vector updates of a single iteration step are independent, and the communication time required for the inner products can be overlapped efficiently with the computation time of the vector updates. The cost of global communication, which is the performance bottleneck, can therefore be reduced significantly. The bulk synchronous parallel (BSP) model is used to design efficient, scalable, and portable parallel versions of the proposed algorithms and to provide accurate performance predictions for a wide range of architectures, including the Cray T3D, the Parsytec, and a cluster of workstations connected by Ethernet. The performance model uses only a few system-dependent parameters and is based on simple, accurate cost modelling that gives useful insight into the time complexity of the methods. The theoretical performance predictions are compared with preliminary measured timing results from a numerical application in ocean flow simulation.
UR - http://www.scopus.com/inward/record.url?scp=84947203512&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2003.1213473
DO - 10.1109/IPDPS.2003.1213473
M3 - Conference contribution
T3 - Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2003
BT - Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2003
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - International Parallel and Distributed Processing Symposium, IPDPS 2003
Y2 - 22 April 2003 through 26 April 2003
ER -