TY - GEN
T1 - Computer performance microscopy with SHIM
AU - Yang, Xi
AU - Blackburn, Stephen M.
AU - McKinley, Kathryn S.
N1 - Publisher Copyright: © 2015 ACM.
PY - 2015/6/13
Y1 - 2015/6/13
N2 - Developers and architects spend a lot of time trying to understand and eliminate performance problems. Unfortunately, the root causes of many problems occur at a fine granularity that existing continuous profiling and direct measurement approaches cannot observe. This paper presents the design and implementation of Shim, a continuous profiler that samples at resolutions as fine as 15 cycles; three to five orders of magnitude finer than current continuous profilers. Shim's fine-grain measurements reveal new behaviors, such as variations in instructions per cycle (IPC) within the execution of a single function. A Shim observer thread executes and samples autonomously on unutilized hardware. To sample, it reads hardware performance counters and memory locations that store software state. Shim improves its accuracy by automatically detecting and discarding samples affected by measurement skew. We measure Shim's observer effects and show how to analyze them. When on a separate core, Shim can continuously observe one software signal with a 2% overhead at a ∼1200 cycle resolution. At an overhead of 61%, Shim samples one software signal on the same core with SMT at a ∼15 cycle resolution. Modest hardware changes could significantly reduce overheads and add greater analytical capability to Shim. We vary prefetching and DVFS policies in case studies that show the diagnostic power of fine-grain IPC and memory bandwidth results. By repurposing existing hardware, we deliver a practical tool for fine-grain performance microscopy for developers and architects.
UR - http://www.scopus.com/inward/record.url?scp=84960084788&partnerID=8YFLogxK
U2 - 10.1145/2749469.2750401
DO - 10.1145/2749469.2750401
M3 - Conference contribution
T3 - Proceedings - International Symposium on Computer Architecture
SP - 170
EP - 184
BT - ISCA 2015 - 42nd Annual International Symposium on Computer Architecture, Conference Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 42nd Annual International Symposium on Computer Architecture, ISCA 2015
Y2 - 13 June 2015 through 17 June 2015
ER -