TY - JOUR
T1 - Wake up and smell the coffee
T2 - Evaluation methodology for the 21st century
AU - Blackburn, Stephen M.
AU - McKinley, Kathryn S.
AU - Garner, Robin
AU - Hoffmann, Chris
AU - Khan, Asjad M.
AU - Bentzur, Rotem
AU - Diwan, Amer
AU - Feinberg, Daniel
AU - Frampton, Daniel
AU - Guyer, Samuel Z.
AU - Hirzel, Martin
AU - Hosking, Antony
AU - Jump, Maria
AU - Lee, Han
AU - Moss, J. Eliot B.
AU - Phansalkar, Aashish
AU - Stefanović, Darko
AU - VanDrunen, Thomas
AU - von Dincklage, Daniel
AU - Wiedermann, Ben
PY - 2008/8/1
Y1 - 2008/8/1
N2 - Evaluation methodology underpins all innovation in experimental computer science. It requires relevant workloads, appropriate experimental design, and rigorous analysis. Unfortunately, methodology is not keeping pace with the changes in our field. The rise of managed languages such as Java, C#, and Ruby in the past decade and the imminent rise of commodity multicore architectures for the next decade pose new methodological challenges that are not yet widely understood. This paper explores the consequences of our collective inattention to methodology on innovation, makes recommendations for addressing this problem in one domain, and provides guidelines for other domains. We describe benchmark suite design, experimental design, and analysis for evaluating Java applications. For example, we introduce new criteria for measuring and selecting diverse applications for a benchmark suite. We show that the complexity and nondeterminism of the Java runtime system make experimental design a first-order consideration, and we recommend mechanisms for addressing complexity and nondeterminism. Drawing on these results, we suggest how to adapt methodology more broadly. To continue to deliver innovations, our field needs to significantly increase participation in and funding for developing sound methodological foundations.
UR - http://www.scopus.com/inward/record.url?scp=49249108501&partnerID=8YFLogxK
U2 - 10.1145/1378704.1378723
DO - 10.1145/1378704.1378723
M3 - Article
SN - 0001-0782
VL - 51
SP - 83
EP - 89
JO - Communications of the ACM
JF - Communications of the ACM
IS - 8
ER -