DMR3D: Dynamic Memory Relocation in 3D Multicore Systems

Developed by Dean M. Ancajas, Dr. Koushik Chakraborty, and Dr. Sanghamitra Roy of Utah State University’s Electrical and Computer Engineering Department


Technical Summary

A key factor limiting computer system performance is the widening gap between processor and memory speeds. Recent fabrication techniques, such as 3D die stacking, seek to close this gap by placing memory physically closer to the processor. Several previous works have reported large speedups from applying 3D stacking to traditional 2D systems, including the benefit of placing memory directly on top of the processor. Many of these studies, however, considered systems with a centralized Memory Controller (MC).

Suboptimal data placement, a new problem that arises in 3D systems with multiple MCs, erodes the performance gains of 3D die stacking. In a 3D system with distributed MCs, the placement of data in a memory bank and the location of the processing core together determine whether that data can be accessed directly through a Through-Silicon Via (TSV) or must traverse the interconnect. For example, in a 3D multicore, access latency can be ten times lower when data is accessed directly through the TSV rather than through the interconnect. Current system designs, however, ignore this consideration, substantially undermining the potential performance and energy efficiency of a 3D multicore.
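A rough calculation illustrates why placement matters. The sketch below assumes the tenfold latency gap mentioned above and expresses latency in units of one TSV access; the function name, the default values, and the simple weighted-average model are illustrative assumptions, not figures from the DMR3D evaluation.

```python
def average_latency(local_fraction, tsv_latency=1.0, ratio=10.0):
    """Mean access latency for a given fraction of local (TSV) accesses.

    Assumption: an access that traverses the interconnect costs `ratio`
    times as much as a direct TSV access. Units are arbitrary.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    interconnect_latency = tsv_latency * ratio
    # Weighted average of local and remote access costs.
    return (local_fraction * tsv_latency
            + (1.0 - local_fraction) * interconnect_latency)


if __name__ == "__main__":
    for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"local fraction {fraction:.2f}: "
              f"average latency {average_latency(fraction):.2f}")
```

Under this simplified model, every access moved from the interconnect to the TSV removes most of that access's cost, which is the effect that data relocation aims to exploit.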

The patented DMR3D technology provides hardware mechanisms that alleviate this problem by remapping data to an optimal location in a 3D multicore system. The hardware mechanisms implement two DMR3D algorithms: a Global Scheme that dynamically allocates migration slots to different threads, and a Thread On-Demand Scheme that statically allocates an equal number of migration slots to each thread.
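The difference between the two allocation styles can be sketched in software. This is a minimal illustration only: DMR3D is implemented in hardware, and the summary does not say how the Global Scheme decides each thread's share. The proportional-to-remote-accesses policy below, along with every function and parameter name, is a hypothetical stand-in.

```python
def on_demand_allocation(threads, total_slots):
    """Static split: every thread receives the same number of migration slots."""
    if not threads:
        raise ValueError("at least one thread is required")
    share = total_slots // len(threads)
    return {thread: share for thread in threads}


def global_allocation(remote_accesses, total_slots):
    """Dynamic split (assumed policy): slots proportional to each thread's
    recent remote-access count, rounded with the largest-remainder method."""
    if not remote_accesses:
        raise ValueError("at least one thread is required")
    total = sum(remote_accesses.values())
    if total == 0:
        # No remote traffic observed: fall back to the equal split.
        return on_demand_allocation(list(remote_accesses), total_slots)
    quotas = {t: total_slots * n / total for t, n in remote_accesses.items()}
    alloc = {t: int(q) for t, q in quotas.items()}
    leftover = total_slots - sum(alloc.values())
    # Hand remaining slots to the threads with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda t: quotas[t] - alloc[t], reverse=True)
    for thread in by_remainder[:leftover]:
        alloc[thread] += 1
    return alloc


if __name__ == "__main__":
    counts = {"t0": 300, "t1": 100, "t2": 0, "t3": 0}  # made-up access counts
    print(on_demand_allocation(list(counts), 16))
    print(global_allocation(counts, 16))
```

In this sketch the static split ignores how much each thread would benefit from migration, while the dynamic split shifts slots toward threads that generate the most remote traffic.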


Competitive Advantages

A thorough evaluation of the two DMR3D schemes using a state-of-the-art, full-system simulator shows, relative to the baseline, a performance increase of 7 to 72 percent (average of 30 percent), an increase in local accesses of 9 to 95 percent (average of 50 percent), and an improvement in communication energy of up to 48 percent (average of 25 percent). A comparison with a representative scheme for NUCA caches (Victim Replication) also shows an average performance improvement of 33 percent.


Commercial Applications

•  3D multicore systems

•  Anything involving memory



•  U.S. Patent 9,063,667 – Dynamic Memory Relocation

•  Ancajas, D.M.; Chakraborty, K.; Roy, S., "DMR3D: Dynamic Memory Relocation in 3D Multicore Systems," in 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC), pp. 1-9, May 29 2013-June 7 2013



For Information, Contact:
Christian Iverson
Utah State University