TY - GEN
T1 - Parallelisation of the Valgrind dynamic binary instrumentation framework
AU - Robson, Daniel
AU - Strazdins, Peter
PY - 2008
Y1 - 2008
N2 - Valgrind is a dynamic binary translation and instrumentation framework. It is well suited to analysing memory usage and underpins memory validation and profiling tools. Currently, Valgrind is restricted to executing a guest with serialised thread scheduling, which forfeits performance when analysing highly parallel applications on parallel architectures. We have extended the framework to allow parallel execution of guest threads. Code caching mechanisms have been made thread-safe by delaying the flushing of translated code, while preserving performance in critical areas. Three methods that preserve instruction atomicity are implemented and evaluated with respect to speed, reliability and instrumentation effects. Serialising both store and atomic operations preserves atomicity in the strongest sense but incurs unacceptable performance overhead. Serialising only atomic instructions, or utilising host atomic instructions, provides speedup in line with native execution; these methods show average slowdowns of only 2.6× and 2.2× over native parallel execution, respectively.
AB - Valgrind is a dynamic binary translation and instrumentation framework. It is well suited to analysing memory usage and underpins memory validation and profiling tools. Currently, Valgrind is restricted to executing a guest with serialised thread scheduling, which forfeits performance when analysing highly parallel applications on parallel architectures. We have extended the framework to allow parallel execution of guest threads. Code caching mechanisms have been made thread-safe by delaying the flushing of translated code, while preserving performance in critical areas. Three methods that preserve instruction atomicity are implemented and evaluated with respect to speed, reliability and instrumentation effects. Serialising both store and atomic operations preserves atomicity in the strongest sense but incurs unacceptable performance overhead. Serialising only atomic instructions, or utilising host atomic instructions, provides speedup in line with native execution; these methods show average slowdowns of only 2.6× and 2.2× over native parallel execution, respectively.
UR - http://www.scopus.com/inward/record.url?scp=60649113034&partnerID=8YFLogxK
U2 - 10.1109/ISPA.2008.94
DO - 10.1109/ISPA.2008.94
M3 - Conference contribution
SN - 9780769534718
T3 - Proceedings of the 2008 International Symposium on Parallel and Distributed Processing with Applications, ISPA 2008
SP - 113
EP - 121
BT - Proceedings of the 2008 International Symposium on Parallel and Distributed Processing with Applications, ISPA 2008
T2 - 2008 International Symposium on Parallel and Distributed Processing with Applications, ISPA 2008
Y2 - 10 December 2008 through 12 December 2008
ER -