Combined circuit and architectural level variable supply-voltage scaling for low power

Abstract

Energy-efficient processor design is becoming increasingly important with technology scaling and with high performance requirements. Supply-voltage scaling is an efficient way to reduce energy by lowering the operating voltage and the clock frequency of the processor simultaneously. We propose a variable supply-voltage (VSV) scaling technique based on the following key observation: upon an L2 miss, the pipeline performs some independent computations but almost always ends up stalling and waiting for data, despite out-of-order issue and other latency-hiding techniques. Therefore, during an L2 miss we scale down the supply voltage of certain sections of the processor to reduce power dissipation while the processor carries on the independent computations at a lower speed. However, operating at a lower speed may degrade performance if there are sufficient independent computations to overlap with the L2 miss. Similarly, returning to high speed may degrade power savings if there are multiple outstanding misses and insufficient independent computations to overlap with them. To avoid these problems, we introduce two state machines that track parallelism on the fly, and we scale the supply voltage depending on the level of parallelism. We also consider circuit-level complexity concerns, which limit VSV to two supply voltages; stability and signal-propagation speed issues, which limit how fast VSV may transition between the voltages; and energy overhead factors, which disallow supply-voltage scaling of large RAM structures such as caches and the register file. Our simulations show that, across all the SPEC2K benchmarks, VSV achieves an average 7.7% reduction in total processor power with 0.9% performance degradation in an eight-way, out-of-order-issue processor that implements deterministic clock gating and software prefetching. For those benchmarks that have high L2 miss rates (more than 4 misses per 1000 instructions), VSV achieves a 23.0% reduction in total processor power with 2.0% performance degradation on average. © 2005 IEEE.
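
As a rough illustration of the control idea described in the abstract, the sketch below models a two-level voltage controller driven by L2 miss events and the observed issue rate. It is not the paper's state machines: the class name `VSVController`, the observation window, the issue threshold, and the rule of returning to the high voltage only once no misses remain outstanding are all simplifying assumptions made for this example.

```python
from dataclasses import dataclass

HIGH, LOW = "high", "low"  # the two supply-voltage levels


@dataclass
class VSVController:
    """Hypothetical two-level controller; all parameters are illustrative."""
    window: int = 4               # assumed: cycles observed after an L2 miss
    low_issue_threshold: int = 2  # assumed: max issued instrs to count as "little parallelism"
    level: str = HIGH
    outstanding_misses: int = 0
    _cycles: int = 0
    _issued: int = 0

    def on_l2_miss(self) -> None:
        # A new miss (re)starts the observation window.
        self.outstanding_misses += 1
        self._cycles = 0
        self._issued = 0

    def on_l2_return(self) -> None:
        # Simplification: go back to high speed only when no miss is outstanding.
        self.outstanding_misses = max(0, self.outstanding_misses - 1)
        if self.outstanding_misses == 0:
            self.level = HIGH

    def tick(self, issued: int) -> None:
        # Called once per cycle with the number of instructions issued.
        if self.outstanding_misses == 0 or self.level == LOW:
            return
        self._cycles += 1
        self._issued += issued
        if self._cycles == self.window:
            # Few independent instructions: slowing down should cost little.
            if self._issued <= self.low_issue_threshold:
                self.level = LOW
            self._cycles = 0
            self._issued = 0
```

With these assumed parameters, a miss followed by four cycles with no issued instructions moves the controller to the low level, while a miss followed by four busy cycles leaves it at the high level.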

DOI
10.1109/TVLSI.2005.844295
Year