TY - JOUR
T1 - Parallel MCGLS and ICGLS methods for least squares problems on distributed memory architectures
AU - Yang, Laurence Tianruo
AU - Brent, Richard P.
PY - 2004/8
Y1 - 2004/8
N2 - In this paper we study the parallelization of the CGLS method, a basic iterative method for large, sparse least squares problems in which the conjugate gradient method is applied to the normal equations. On modern parallel architectures its parallel performance is limited by the global communication required for inner products, which is the main bottleneck. We describe a modified CGLS (MCGLS) method that improves parallel performance by assembling the results of several inner products collectively and by creating situations where communication can be overlapped with computation. More importantly, we also propose an improved CGLS (ICGLS) method that halves the number of global synchronization points required for inner products and thereby significantly improves parallel performance compared with the standard CGLS method and the MCGLS method.
AB - In this paper we study the parallelization of the CGLS method, a basic iterative method for large, sparse least squares problems in which the conjugate gradient method is applied to the normal equations. On modern parallel architectures its parallel performance is limited by the global communication required for inner products, which is the main bottleneck. We describe a modified CGLS (MCGLS) method that improves parallel performance by assembling the results of several inner products collectively and by creating situations where communication can be overlapped with computation. More importantly, we also propose an improved CGLS (ICGLS) method that halves the number of global synchronization points required for inner products and thereby significantly improves parallel performance compared with the standard CGLS method and the MCGLS method.
KW - CGLS
KW - Global communication
KW - Inner products
KW - Large and sparse matrices
KW - Least squares problems
KW - MCGLS and ICGLS methods
KW - Parallel distributed memory architectures
UR - http://www.scopus.com/inward/record.url?scp=3543075725&partnerID=8YFLogxK
U2 - 10.1023/B:SUPE.0000026847.75355.69
DO - 10.1023/B:SUPE.0000026847.75355.69
M3 - Article
SN - 0920-8542
VL - 29
SP - 145
EP - 156
JO - Journal of Supercomputing
JF - Journal of Supercomputing
IS - 2
ER -