Abstract
Work-stealing is a promising approach for effectively exploiting software parallelism on parallel hardware. A programmer who uses work-stealing explicitly identifies potential parallelism and the runtime then schedules work, keeping otherwise idle hardware busy while relieving overloaded hardware of its burden. Prior work has demonstrated that work-stealing is very effective in practice. However, workstealing comes with a substantial overhead: as much as 2x to 12x slowdown over orthodox sequential code. In this paper we identify the key sources of overhead in work-stealing schedulers and present two significant refinements to their implementation. We evaluate our workstealing designs using a range of benchmarks, four different work-stealing implementations, including the popular fork-join framework, and a range of architectures. On these benchmarks, compared to orthodox sequential Java, our fastest design has an overhead of just 15%. By contrast, fork-join has a 2.3x overhead and the previous implementation of the system we use has an overhead of 4.1x. These results and our insight into the sources of overhead for workstealing implementations give further hope to an already promising technique for exploiting increasingly available hardware parallelism.
Original language | English |
---|---|
Pages (from-to) | 297-313 |
Number of pages | 17 |
Journal | ACM SIGPLAN Notices |
Volume | 47 |
Issue number | 10 |
DOIs | |
Publication status | Published - Oct 2012 |