TY - GEN
T1 - Deferred gratification
T2 - 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, MSPC 2011
AU - Jibaja, Ivan
AU - Blackburn, Stephen M.
AU - Haghighat, Mohammad R.
AU - McKinley, Kathryn S.
PY - 2011
Y1 - 2011
N2 - Implementing a new programming language system is a daunting task. A common trap is to punt on the design and engineering of exact garbage collection and instead opt for reference counting or conservative garbage collection (GC). For example, AppleScript#8482;, Perl, Python, and PHP implementers chose reference counting (RC) and Ruby chose conservative GC. Although easier to get working, reference counting has terrible performance and conservative GC is inflexible and performs poorly when allocation rates are high. However, high performance GC is central to performance for managed languages and only becoming more critical due to relatively lower memory bandwidth and higher memory latency of modern architectures. Unfortunately, retrofitting support for high performance collectors later is a formidable software engineering task due to their exact nature. Whether they realize it or not, implementers have three routes: (1) forge ahead with reference counting or conservative GC, and worry about the consequences later; (2) build the language on top of an existing managed runtime with exact GC, and tune the GC to scripting language workloads; or (3) engineer exact GC from the ground up and enjoy the correctness and performance benefits sooner rather than later. We explore this conundrum using PHP, the most popular server side scripting language. PHP implements reference counting, mirroring scripting languages before it. Because reference counting is incomplete, the implementors must (a) also implement tracing to detect cyclic garbage, or (b) prohibit cyclic data structures, or (c) never reclaim cyclic garbage. PHP chose (a), AppleScript chose (b), and Perl chose (c). We characterize the memory behavior of five typical PHP programs to determine whether their implementation choice was a good one in light of the growing demand for high performance PHP. The memory behavior of these PHP programs is similar to other managed languages, such as Java#8482; - -they allocate many short lived objects, a large variety of object sizes, and the average allocated object size is small. These characteristics suggest copying generational GC will attain high performance. Language implementers who are serious about correctness and performance need to understand deferred gratification: paying the software engineering cost of exact GC up front will deliver correctness and memory system performance later.
AB - Implementing a new programming language system is a daunting task. A common trap is to punt on the design and engineering of exact garbage collection and instead opt for reference counting or conservative garbage collection (GC). For example, AppleScript#8482;, Perl, Python, and PHP implementers chose reference counting (RC) and Ruby chose conservative GC. Although easier to get working, reference counting has terrible performance and conservative GC is inflexible and performs poorly when allocation rates are high. However, high performance GC is central to performance for managed languages and only becoming more critical due to relatively lower memory bandwidth and higher memory latency of modern architectures. Unfortunately, retrofitting support for high performance collectors later is a formidable software engineering task due to their exact nature. Whether they realize it or not, implementers have three routes: (1) forge ahead with reference counting or conservative GC, and worry about the consequences later; (2) build the language on top of an existing managed runtime with exact GC, and tune the GC to scripting language workloads; or (3) engineer exact GC from the ground up and enjoy the correctness and performance benefits sooner rather than later. We explore this conundrum using PHP, the most popular server side scripting language. PHP implements reference counting, mirroring scripting languages before it. Because reference counting is incomplete, the implementors must (a) also implement tracing to detect cyclic garbage, or (b) prohibit cyclic data structures, or (c) never reclaim cyclic garbage. PHP chose (a), AppleScript chose (b), and Perl chose (c). We characterize the memory behavior of five typical PHP programs to determine whether their implementation choice was a good one in light of the growing demand for high performance PHP. The memory behavior of these PHP programs is similar to other managed languages, such as Java#8482; - -they allocate many short lived objects, a large variety of object sizes, and the average allocated object size is small. These characteristics suggest copying generational GC will attain high performance. Language implementers who are serious about correctness and performance need to understand deferred gratification: paying the software engineering cost of exact GC up front will deliver correctness and memory system performance later.
KW - PHP
KW - garbage collection
KW - heap
KW - scripting languages
UR - http://www.scopus.com/inward/record.url?scp=79959943813&partnerID=8YFLogxK
U2 - 10.1145/1988915.1988930
DO - 10.1145/1988915.1988930
M3 - Conference contribution
SN - 9781450307949
T3 - Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, MSPC 2011
SP - 58
EP - 65
BT - Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, MSPC 2011
Y2 - 5 June 2011 through 5 June 2011
ER -