Using ECC DRAM to Adaptively Increase Memory Capacity
Modern DRAM modules are often equipped with hardware error correction capabilities, especially for DRAM deployed in large-scale data centers, as process technology scaling has increased the susceptibility of these devices to errors. To provide fast error detection and correction, error-correcting codes (ECC) are placed on an additional DRAM chip in a DRAM module. This additional chip expands the raw capacity of a DRAM module by 12.5%, but applications are unable to use any of this extra capacity, as it is used exclusively to provide reliability for all data. In reality, there are a number of applications that do not need such strong reliability for all their data regions (e.g., some user batch jobs executing on a public cloud), and can instead benefit from using additional DRAM capacity to store extra data. Our goal in this work is to provide the additional capacity within an ECC DRAM module to applications when they do not need the high reliability of error correction. In this paper, we propose Capacity- and Reliability-Adaptive Memory (CREAM), a hardware mechanism that adapts error-correcting DRAM modules to offer multiple levels of error protection, and provides the capacity saved from using weaker protection to applications. For regions of memory that do not require strong error correction, we either provide no ECC protection or provide error detection using multibit parity. We evaluate several layouts for arranging the data within ECC DRAM in these reduced-protection modes, taking into account the various trade-offs exposed from exploiting the extra chip. Our experiments show that the increased capacity provided by CREAM improves performance by 23.0% for a memory caching workload, and by 37.3% for a workload executing production query traces. In addition, CREAM can increase bank-level parallelism within DRAM, offering further performance improvements.
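The capacity arithmetic behind the 12.5% figure can be illustrated with a minimal sketch. It assumes a conventional ECC layout in which every 64-byte cache line (512 data bits) is stored with 64 check bits, i.e. one extra chip per eight data chips. The function name `capacity_gain` and the 8-bit parity width used in the example are hypothetical choices for illustration, not values taken from the abstract.

```python
from fractions import Fraction

# Assumed layout: one 64-byte cache line plus 8 check bits per 64 data bits.
LINE_DATA_BITS = 512
FULL_ECC_CHECK_BITS = 64


def capacity_gain(check_bits_per_line: int) -> Fraction:
    """Extra usable capacity relative to a fully ECC-protected module.

    check_bits_per_line is the number of bits per cache line still spent
    on protection (0 = no ECC, 64 = full ECC under the assumed layout).
    """
    if not 0 <= check_bits_per_line <= FULL_ECC_CHECK_BITS:
        raise ValueError("check bits must be between 0 and 64")
    raw_bits = LINE_DATA_BITS + FULL_ECC_CHECK_BITS
    used_bits = LINE_DATA_BITS + check_bits_per_line
    return Fraction(raw_bits, used_bits) - 1


if __name__ == "__main__":
    modes = [
        ("full ECC", 64),
        ("parity (hypothetical 8 bits per line)", 8),
        ("no ECC", 0),
    ]
    for name, bits in modes:
        print(f"{name}: {float(capacity_gain(bits)):.1%} extra capacity")
```

Under these assumptions, dropping protection entirely recovers the full 12.5% stated above, while a detection-only mode recovers slightly less because some bits are still reserved for parity.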