TY - JOUR
T1 - Multi-GPU RI-HF Energies and Analytic Gradients─Toward High-Throughput Ab Initio Molecular Dynamics
AU - Stocks, Ryan
AU - Palethorpe, Elise
AU - Barca, Giuseppe M.J.
N1 - Publisher Copyright:
© 2024 American Chemical Society.
PY - 2024/9/10
Y1 - 2024/9/10
N2 - This article presents an optimized algorithm and implementation for calculating resolution-of-the-identity Hartree-Fock (RI-HF) energies and analytic gradients using multiple graphics processing units (GPUs). The algorithm is especially designed for high-throughput ab initio molecular dynamics (AIMD) simulations of small- and medium-sized molecules (10-100 atoms). Key innovations of this work include the exploitation of multi-GPU parallelism and a workload balancing scheme that efficiently distributes computational tasks among GPUs. Our implementation also exploits symmetry, screens integrals, and leverages sparsity to optimize memory usage. Computational results show that the implementation achieves significant performance improvements, including over 3× speedups in single-GPU AIMD throughput compared to previous GPU-accelerated RI-HF and traditional HF methods. Furthermore, utilizing multiple GPUs can provide superlinear speedup when the additional aggregate GPU memory allows for the storage of decompressed three-center integrals.
AB - This article presents an optimized algorithm and implementation for calculating resolution-of-the-identity Hartree-Fock (RI-HF) energies and analytic gradients using multiple graphics processing units (GPUs). The algorithm is especially designed for high-throughput ab initio molecular dynamics (AIMD) simulations of small- and medium-sized molecules (10-100 atoms). Key innovations of this work include the exploitation of multi-GPU parallelism and a workload balancing scheme that efficiently distributes computational tasks among GPUs. Our implementation also exploits symmetry, screens integrals, and leverages sparsity to optimize memory usage. Computational results show that the implementation achieves significant performance improvements, including over 3× speedups in single-GPU AIMD throughput compared to previous GPU-accelerated RI-HF and traditional HF methods. Furthermore, utilizing multiple GPUs can provide superlinear speedup when the additional aggregate GPU memory allows for the storage of decompressed three-center integrals.
UR - http://www.scopus.com/inward/record.url?scp=85202672396&partnerID=8YFLogxK
U2 - 10.1021/acs.jctc.4c00877
DO - 10.1021/acs.jctc.4c00877
M3 - Article
C2 - 39192710
AN - SCOPUS:85202672396
SN - 1549-9618
VL - 20
SP - 7503
EP - 7515
JO - Journal of Chemical Theory and Computation
JF - Journal of Chemical Theory and Computation
IS - 17
ER -