TY - GEN
T1 - An accurate prefetch technique for dynamic paging behaviour for software distributed shared memory
AU - Cai, Jie
AU - Strazdins, Peter E.
PY - 2012
Y1 - 2012
N2 - Page-based software Distributed Shared Memory (sDSM) systems suffer from high memory consistency costs. An effective prefetch technique can reduce this overhead; however, accurate prediction is difficult for applications that exhibit dynamic memory access and paging behavior. In this paper, we use Intel Cluster OpenMP (CLOMP) to study this problem. First, we present a stride-augmented run-length encoding (sRLE) method that reconstructs series of numbers into 2D rectangles, which facilitates more accurate analysis of paging behavior. Historical page miss records of OpenMP parallel and sequential regions are reconstructed and compressed by sRLE. Second, we design and implement a dynamic page prefetch technique (DReP) that predicts and issues prefetches based on these reconstructed records. DReP and its implementation are evaluated through simulations and experiments. The simulation results show that DReP significantly improves on the efficiency (∼34%) and coverage (∼47%) of existing prefetch techniques. Moreover, the experimental results show that DReP reduces the memory consistency costs of CLOMP by 86% in an extreme false-sharing scenario. With the assistance of sRLE, DReP reduces memory consistency costs by ∼45% and ∼38% for the LINPACK and NPB-OMP benchmarks on GigE and DDR IB networks, respectively. A detailed breakdown analysis shows that the software overhead introduced by DReP is negligible (∼2%).
KW - Dynamic Memory Pattern
KW - Prefetch
KW - Run-Length Encoding
KW - Software DSM
UR - http://www.scopus.com/inward/record.url?scp=84871121431&partnerID=8YFLogxK
U2 - 10.1109/ICPP.2012.16
DO - 10.1109/ICPP.2012.16
M3 - Conference contribution
SN - 9780769547961
T3 - Proceedings of the International Conference on Parallel Processing
SP - 209
EP - 218
BT - Proceedings - 41st International Conference on Parallel Processing, ICPP 2012
T2 - 41st International Conference on Parallel Processing, ICPP 2012
Y2 - 10 September 2012 through 13 September 2012
ER -