Using managed runtime systems to tolerate holes in wearable memories

Tiejun Gao, Karin Strauss, Stephen M. Blackburn, Kathryn S. McKinley, Doug Burger, James Larus

    Research output: Contribution to journal, Article, peer-review

    6 Citations (Scopus)

    Abstract

    New memory technologies, such as phase-change memory (PCM), promise denser and cheaper main memory, and are expected to displace DRAM. However, many of them experience permanent failures far more quickly than DRAM. DRAM mechanisms that handle permanent failures rely on very low failure rates and, if directly applied to PCM, are extremely inefficient: discarding a page when the first line fails wastes 98% of the memory. This paper proposes low-complexity, cooperative software and hardware that handle failure rates as high as 50%. Our approach makes error handling transparent to the application by using the memory abstraction offered by managed languages. Once hardware error correction for a memory line is exhausted, rather than discarding the entire page, the hardware communicates the failed line to a failure-aware OS and runtime. The runtime ensures that memory allocations never use failed lines and moves data when lines fail during program execution. This paper describes minimal extensions to an Immix mark-region garbage collector, which correctly utilizes pages with failed physical lines by skipping over failures. This paper also proposes hardware support that clusters failed lines at one end of a memory region to reduce fragmentation and improve performance under failures. Contrary to accepted hardware wisdom that advocates for wear-leveling, we show that, with software support, non-uniform failures delay the impact of memory failure. Together, these mechanisms incur no performance overhead when there are no failures, and at failure levels of 10% and 50% they incur average overheads of only 4% and 12%, respectively. These results indicate that hardware and software cooperation can greatly extend the life of wearable memories.
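
The allocation idea in the abstract can be illustrated with a minimal sketch. It is not the paper's Immix extension: the class and function names are hypothetical, and the 64-line page (for example a 4 KB page of 64 B lines) is an assumption chosen because it makes the quoted 98% waste figure come out. The sketch shows a bump allocator that skips failed lines, the waste from discarding a page at its first failure, and why clustering failures at one end leaves a longer usable run.

```python
from typing import List, Optional

LINES_PER_PAGE = 64  # assumption: e.g. a 4 KB page made of 64 B lines


def wasted_fraction_if_page_discarded(failed_lines: int,
                                      lines_per_page: int = LINES_PER_PAGE) -> float:
    """Fraction of a page that still works but is lost if the whole page is discarded."""
    return (lines_per_page - failed_lines) / lines_per_page


def largest_free_run(failed: List[bool]) -> int:
    """Length of the longest run of consecutive working lines."""
    best = run = 0
    for is_failed in failed:
        run = 0 if is_failed else run + 1
        best = max(best, run)
    return best


class FailureAwareBumpAllocator:
    """Hypothetical bump allocator over one region that never hands out failed lines."""

    def __init__(self, failed: List[bool]):
        self.failed = failed  # failed[i] is True if line i is unusable
        self.cursor = 0       # next line index to consider

    def alloc(self, n_lines: int) -> Optional[int]:
        """Return the first line of a run of n_lines working lines, or None if none fits."""
        start = self.cursor
        while start + n_lines <= len(self.failed):
            # Look for the first failed line inside the candidate run.
            bad = next((i for i in range(start, start + n_lines) if self.failed[i]), None)
            if bad is None:
                self.cursor = start + n_lines
                return start
            start = bad + 1  # skip past the hole and try again
        return None


# One failed line out of 64: discarding the page loses about 98% working memory.
waste = wasted_fraction_if_page_discarded(1)

# Allocation skips the hole at line 2.
region = [False] * 8
region[2] = True
allocator = FailureAwareBumpAllocator(region)
first = allocator.alloc(2)
second = allocator.alloc(2)

# Same number of failures, scattered versus clustered at one end.
scattered = [False, True, False, True, False, True, False, True]
clustered = [False, False, False, False, True, True, True, True]
```

In this toy setup the scattered region can only satisfy single-line requests, while the clustered region still offers a four-line run, which is the fragmentation effect the clustering hardware is meant to reduce.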

    Original language: English
    Pages (from-to): 297-308
    Number of pages: 12
    Journal: ACM SIGPLAN Notices
    Volume: 48
    Issue number: 6
    Publication status: Published - Jun 2013
