Abstract
One of the challenges in using petascale and exascale computers efficiently and effectively is the handling of run-time errors. Without such robustness, applications developed for these machines will have little chance of completing successfully. The sparse grid combination technique approximates the solution to a given problem by taking a linear combination of its solutions on multiple grids. It has been used successfully in many high performance computing applications because of its ability to tackle the curse of dimensionality. We present several approaches to fault tolerance using the combination technique. The first is implemented within the MapReduce model in order to utilise the existing fault tolerance of that framework. In addition, we present a method which utilises the redundancy shared by solutions on different grids. Finally, we describe a novel approach in which the solution is computed on additional grids, which are used to form alternative combinations if other grids fail. We include results based on the solution of the 2D scalar advection PDE.
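The idea of combining solutions from several grids, and of recombining differently after a grid is lost, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. Assumptions: grid levels are multi-indices $i \ge 0$; the grids form a downward-closed index set $I$ (the classic choice is $|i|_1 \le n$); coefficients come from the standard inclusion-exclusion formula $c_i = \sum_{z \in \{0,1\}^d,\ i+z \in I} (-1)^{|z|}$, which for the classic set gives $+1$ on the layer $|i|_1 = n$ and $-1$ on $|i|_1 = n-1$. The recovery rule (drop each failed grid and every grid that dominates it, then recompute the coefficients) is one simple strategy in the spirit of the alternative-combination approach described above. The function names `downset`, `combination_coefficients`, `remove_failed` and `combine` are hypothetical.

```python
from itertools import product


def downset(n, d=2):
    """Hypothetical helper: all level multi-indices i >= 0 with |i|_1 <= n."""
    return {i for i in product(range(n + 1), repeat=d) if sum(i) <= n}


def combination_coefficients(index_set):
    """Inclusion-exclusion coefficients c_i = sum over z in {0,1}^d with i+z in I of (-1)^|z|.

    Assumes index_set is downward closed; grids whose coefficient is 0 are omitted.
    """
    coeffs = {}
    for i in index_set:
        c = 0
        for z in product((0, 1), repeat=len(i)):
            if tuple(a + b for a, b in zip(i, z)) in index_set:
                c += (-1) ** sum(z)
        if c != 0:
            coeffs[i] = c
    return coeffs


def remove_failed(index_set, failed):
    """Drop each failed grid and every grid that dominates it componentwise,
    so the remaining index set stays downward closed."""
    return {
        i for i in index_set
        if not any(all(a >= b for a, b in zip(i, f)) for f in failed)
    }


def combine(coeffs, values):
    """Linear combination of per-grid values, assumed to be already expressed
    on a common representation (e.g. sampled at the same point)."""
    return sum(c * values[i] for i, c in coeffs.items())


# Usage: classic combination for n = 3, then recombination after grid (2, 1) fails.
I = downset(3)
print(sorted(combination_coefficients(I).items()))
I_after = remove_failed(I, [(2, 1)])
print(sorted(combination_coefficients(I_after).items()))
```

In this sketch, losing grid (2, 1) produces a combination that uses grid (1, 0) with coefficient $-1$. That grid lies below the two layers the classic formula needs, which illustrates why computing solutions on additional grids makes alternative combinations available. The coefficients of any such downward-closed set sum to 1, so the combination reproduces a constant exactly.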
Field | Value |
---|---|
Original language | English |
Pages (from-to) | C394-C411 |
Journal | ANZIAM Journal |
Volume | 54 |
Issue number | SUPPL |
Publication status | Published - 2012 |