TY - JOUR
T1 - PGAS-FMM
T2 - Implementing a distributed fast multipole method using the X10 programming language
AU - Milthorpe, Josh
AU - Rendell, Alistair P.
AU - Huber, Thomas
PY - 2014/3/10
Y1 - 2014/3/10
N2 - The fast multipole method (FMM) is a complex, multi-stage algorithm over a distributed tree data structure, with multiple levels of parallelism and inherent data locality. X10 is a modern partitioned global address space language with support for asynchronous activities. The parallel tasks comprising FMM may be expressed in X10 by using a scalable pattern of activities. This paper demonstrates the use of X10 to implement FMM for simulation of electrostatic interactions between ions in a cyclotron resonance mass spectrometer. X10's task-parallel model is used to express parallelism by using a pattern of activities mapping directly onto the tree. X10's work stealing runtime handles load balancing fine-grained parallel activities, avoiding the need for explicit work sharing. The use of global references and active messages to create and synchronize parallel activities over a distributed tree structure is also demonstrated. In contrast to previous simulations of ion trajectories in cyclotron resonance mass spectrometers, our code enables both simulation of realistic particle numbers and guaranteed error bounds. Single-node performance is comparable with the fastest published FMM implementations, and critical expansion operators are faster for high accuracy calculations. A comparison of parallel and sequential codes shows the overhead of activity management and work stealing in this application is low. Scalability is evaluated for 8k cores on a Blue Gene/Q system and 512 cores on a Nehalem/InfiniBand cluster.
AB - The fast multipole method (FMM) is a complex, multi-stage algorithm over a distributed tree data structure, with multiple levels of parallelism and inherent data locality. X10 is a modern partitioned global address space language with support for asynchronous activities. The parallel tasks comprising FMM may be expressed in X10 by using a scalable pattern of activities. This paper demonstrates the use of X10 to implement FMM for simulation of electrostatic interactions between ions in a cyclotron resonance mass spectrometer. X10's task-parallel model is used to express parallelism by using a pattern of activities mapping directly onto the tree. X10's work stealing runtime handles load balancing fine-grained parallel activities, avoiding the need for explicit work sharing. The use of global references and active messages to create and synchronize parallel activities over a distributed tree structure is also demonstrated. In contrast to previous simulations of ion trajectories in cyclotron resonance mass spectrometers, our code enables both simulation of realistic particle numbers and guaranteed error bounds. Single-node performance is comparable with the fastest published FMM implementations, and critical expansion operators are faster for high accuracy calculations. A comparison of parallel and sequential codes shows the overhead of activity management and work stealing in this application is low. Scalability is evaluated for 8k cores on a Blue Gene/Q system and 512 cores on a Nehalem/InfiniBand cluster.
KW - X10
KW - active messages
KW - fast multipole method
KW - parallel programming models
KW - partitioned global address space (PGAS)
KW - scientific computing
UR - http://www.scopus.com/inward/record.url?scp=84893942646&partnerID=8YFLogxK
U2 - 10.1002/cpe.3039
DO - 10.1002/cpe.3039
M3 - Article
SN - 1532-0626
VL - 26
SP - 712
EP - 727
JO - Concurrency Computation Practice and Experience
JF - Concurrency Computation Practice and Experience
IS - 3
ER -