STT-RAM cache hierarchy with multiretention MTJ designs

Title: STT-RAM cache hierarchy with multiretention MTJ designs
Publication Type: Journal Article
Year of Publication: 2014
Authors: Z Sun, X Bi, H Li, WF Wong, and X Zhu
Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Volume: 22
Start Page: 1281
Issue: 6
Pagination: 1281-1293
Date Published: 01/2014
Abstract

Spin-transfer torque random access memory (STT-RAM) is the most promising candidate for universal memory because of its good scalability, zero standby power, and radiation hardness. Its cell area, only 1/9 to 1/3 that of SRAM, allows for a much larger cache within the same die footprint. Such a reduction in cell size can significantly shrink the cache array, leading to significant improvements in overall system performance and power consumption, especially in the multicore era, where locality is crucial. However, deploying STT-RAM technology in L1 caches is challenging because write operations on STT-RAM are slow and power-consuming. In this paper, we propose a range of cache hierarchy designs implemented entirely using STT-RAM that deliver optimal power saving and performance. In particular, our designs use STT-RAM cells with various data retention times and write performance, made possible by novel magnetic tunneling junction (MTJ) designs. For L1 caches, where speed is of utmost importance, we propose a scheme that uses fast STT-RAM cells with reduced data retention time, coupled with a dynamic refresh scheme. In the dynamic refresh scheme, the memristor, another emerging technology, serves as the counter that monitors the data retention of the low-retention STT-RAM, achieving a higher array area efficiency than an SRAM-based counter. For lower level caches with relatively larger capacities, we propose a design that has partitions with different retention characteristics and a data migration scheme that moves data between these partitions. The experiments show that, on average, our proposed multiretention level STT-RAM cache reduces total energy by 30%-74.2% compared to previous single retention level STT-RAM caches, while improving instructions per cycle (IPC) performance for both two-level and three-level cache hierarchies. © 2013 IEEE.
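
The counter-based dynamic refresh idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' design: the class `LowRetentionCache`, the `refresh_threshold` parameter, and the one-counter-per-line policy are all hypothetical, and the memristor counter is modeled as a plain integer. The sketch only shows the general principle that a line is refreshed when its counter indicates the retention time is about to elapse, and that a write restarts the counter.

```python
from dataclasses import dataclass, field


@dataclass
class LowRetentionCache:
    """Toy model (assumption, not the paper's circuit) of per-line retention counters."""
    num_lines: int
    refresh_threshold: int  # hypothetical: counter periods a line may age before refresh
    age: list = field(default_factory=list)
    refreshes: int = 0

    def __post_init__(self):
        # One counter per cache line, all starting freshly written.
        self.age = [0] * self.num_lines

    def write(self, line: int) -> None:
        # A write re-programs the cells, so the line's retention clock restarts.
        self.age[line] = 0

    def tick(self) -> None:
        # Advance time by one counter period and refresh lines about to expire.
        for line in range(self.num_lines):
            self.age[line] += 1
            if self.age[line] >= self.refresh_threshold:
                self.refreshes += 1  # models a read followed by a write-back
                self.age[line] = 0


cache = LowRetentionCache(num_lines=4, refresh_threshold=3)
for t in range(6):
    if t % 2 == 0:
        cache.write(0)  # line 0 is rewritten often, so it never needs a refresh
    cache.tick()
print(cache.refreshes)
```

In this toy run only the three idle lines are ever refreshed, which hints at why a refresh driven by per-line counters can cost less than refreshing every line on a fixed schedule.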

DOI: 10.1109/TVLSI.2013.2267754
Short Title: IEEE Transactions on Very Large Scale Integration (VLSI) Systems