TY - GEN
T1 - Handling silent data corruption with the sparse grid combination technique
AU - Parra Hinojosa, Alfredo
AU - Harding, Brendan
AU - Hegland, Markus
AU - Bungartz, Hans-Joachim
N1 - Publisher Copyright: © Springer International Publishing Switzerland 2016.
PY - 2016
Y1 - 2016
AB - We describe two algorithms to detect and filter silent data corruption (SDC) when solving time-dependent PDEs with the Sparse Grid Combination Technique (SGCT). The SGCT solves a PDE on many regular full grids of different resolutions, which are then combined to obtain a high-quality solution. The algorithm can be parallelized and run on large HPC systems. We investigate silent data corruption and show that the SGCT can be used with minor modifications to filter corrupted data and obtain good results. We apply sanity checks before combining the solution fields to make sure that the data is not corrupted. These sanity checks are derived from well-known error bounds of the classical theory of the SGCT and do not rely on checksums or data replication. We apply our algorithms to a 2D advection equation and discuss the main advantages and drawbacks.
UR - http://www.scopus.com/inward/record.url?scp=84989930019&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-40528-5_9
DO - 10.1007/978-3-319-40528-5_9
M3 - Conference contribution
SN - 9783319405261
T3 - Lecture Notes in Computational Science and Engineering
SP - 165
EP - 186
BT - Software for Exascale Computing - SPPEXA 2013-2015
A2 - Nagel, Wolfgang E.
A2 - Bungartz, Hans-Joachim
A2 - Neumann, Philipp
PB - Springer Verlag
T2 - International Conference on Software for Exascale Computing, SPPEXA 2015
Y2 - 25 January 2016 through 27 January 2016
ER -