







### Putting computing on a strict diet with energy-proportionality

Alex Yakovlev, microSystems, School of EEE, Newcastle University async.org.uk www.ncl.ac.uk/eee/research/groups/micro/

XXIX DCIS, Madrid 26<sup>th</sup> November 2014





The more you get, the more you give!

## Outline

#### Resource-driven computing

– What is Energy-modulated computing?

#### Understanding Real-Power

– What is a computational load for an energy source? What is energy effort? What is computational output?

#### Power-proportionality

– How to compute from a wide range of power levels? How to optimize computational activity for energy budget? What is power-adaptation?

#### Designing for power-proportionality

– How to design for multiple modes in one core? How to achieve graceful degradation for power levels? What models can be used for multimodal and reconfigurable microarchitectures?

#### Looking into the future

# Resource-driven Computing









### Interplay

Working conditions:

**Energy-constrained systems**

– Solar energy, e-beam power supply, small batteries, ...

**Unreliable power supply**

– Voltage fluctuations, low battery, ...

**Hostile environments**

– High/low temperatures, noise, ...













## **Power/Energy modulation**

- The principle of **power/energy-modulated computing** is that the flow of energy entering a computing system determines its computational flow
- It is fundamental for building future real-power systems, particularly systems for **survivability**
- Any piece of electronics becomes active and performs to a certain level of its delivered quality in response to some level of energy and power
- A quantum of energy when applied to a computational device can **be converted** into a corresponding **amount of computation** activity
- Depending on their design and implementation systems can produce meaningful activity at different power levels
- As **power levels become uncertain**, we cannot always guarantee completely certain computational activity

### Traditional vs energy-modulated view



### **Piezo-Film Experiment**



### **Experiment Results**



### Example: Walls Alive (condition monitoring)

• Energy field (thermal, mechanical vibration, etc)




### Example: condition monitoring

Network of sensors for spatial and temporal energy mapping



### Example: condition monitoring

Sensor node structure



System architecture



# Power efficiency and regularity

- Modern systems rely on highly regular (periodic) power sources: they "invest" some power into power regulation
- Future systems will have to operate in a wide dynamic range, paying the price in efficiency in a particular band



### **Towards Real-Power systems**

#### How to design Real-Power systems?

- Firstly, understanding real-power
- Secondly, developing design principles for real-power
- Understanding Real Power:
  - What is computational load? Is it a resistor or a distributed capacitor? A bit of both? What are the dynamics of a switching circuit discharging a capacitor?
  - What is energy effort at the logic level? How much computation action does a circuit perform for a given amount of energy?

# Understanding Real-Power Systems





Ctesibius' Clepsydra, 3<sup>rd</sup> century BC



Faraday's homopolar motor, 1821

### What is Computational Load?



- We employ a simple ring oscillator to serve as a digital circuit load.
- This is because a ring oscillator can closely mimic the switching behaviour of many closed-loop self-timed circuits.



### **Circuit model**





## **Circuit Model: switching process**



# **Solution for Super-threshold**

In the super-threshold region we can assume that the propagation delay is inversely proportional to the voltage, so we have:

| Switching index | $V_N$   | $t_s = \frac{A}{V}$ | Physical time $t$                               |
|-----------------|---------|---------------------|-------------------------------------------------|
| 0               | $K^{0}$ | $\frac{A}{K^{0}}$   | $\frac{A}{K^{0}}$                               |
| 1               | $K^1$   | $\frac{A}{K^{1}}$   | $\frac{A}{K^0} + \frac{A}{K^1}$                 |
| 2               | $K^{2}$ | $\frac{A}{K^2}$     | $\frac{A}{K^0} + \frac{A}{K^1} + \frac{A}{K^2}$ |
| ...             | ...     | ...                 | ...                                             |
| n               | $K^{n}$ | $\frac{A}{K^n}$     | $\sum_{i=0}^{n} \frac{A}{K^{i}}$                |

### **Solution for Super-threshold**

$$V_N = \frac{AK}{t(1-K) + AK}$$

#### Hyperbolic function of time
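The closed form can be checked against the step-by-step table with a short Python sketch. It assumes, as the table does, that each switching event takes $t_s = A/V$ at the current normalised voltage and then scales the voltage by $K$. The function names are illustrative and not from the slides. In this sketch the closed form gives the voltage just after the event that finishes at time $t$.

```python
def discharge_trace(a_const, k, steps):
    """Replay the table: return (physical time, normalised voltage) after each event."""
    t, v = 0.0, 1.0          # start at normalised voltage K^0 = 1
    trace = []
    for _ in range(steps):
        t += a_const / v     # this event takes t_s = A / V at the current voltage
        v *= k               # each event scales the voltage by K
        trace.append((t, v))
    return trace


def hyperbolic_voltage(a_const, k, t):
    """Closed form from the slide: V_N = A*K / (t*(1 - K) + A*K)."""
    return a_const * k / (t * (1.0 - k) + a_const * k)
```

Calling `discharge_trace(1e-11, 0.99, 200)` (both values are assumptions picked for illustration) and evaluating `hyperbolic_voltage` at each recorded time should return the recorded voltages up to rounding error, which is what makes the decay hyperbolic rather than exponential.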



### More accurate solution for Super-threshold

A general model of gate propagation delay [1] is used:

$$t_{p} = \begin{cases} t_{p1} = \dfrac{p c_{l} V}{(V - V_{TH})^{\alpha}} \\ t_{p2} = \dfrac{p c_{l} V}{I_{0} e^{\frac{V - V_{TH}}{N_{s}}}} \end{cases}$$

Figure: normalised VCC versus physical time t (s), from 0 to 1.0E-7 s; simulation result for 5-stage inverters, for $A = 10^{-11}$.

Assuming  $\alpha = 1.3$ 



[1] A. Wang, B. H. Calhoun, A. P. Chandrakasan, "Sub-threshold Design for Ultra Low Power Systems"

### Analysis of hyperbolic decays

• For super-threshold and  $\alpha$ =2:

$$V_N = \frac{AK}{(1-K)t + AK} = \frac{1}{at+1}$$
 where  $A = 2pC_p$  and  $a = \frac{1-K}{AK}$ 

• For arbitrary α:

$$V_N = \frac{1}{(at+1)^{1/(\alpha-1)}}$$
 and  $a = \frac{1-K^{\alpha-1}}{AK^{\alpha-1}}$ 

• Differential equation:

$$\frac{dV}{dt} = -aV^{\alpha}$$

The oscillator is a voltage (and time)-varying resistor:

$$R(V) = \frac{2p}{V} \text{ or } R(t) = 2p(at+1)$$
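As a quick consistency check (not taken from the slides), differentiating the closed form for arbitrary $\alpha$ recovers a power-law decay:

$$V = (at+1)^{-\frac{1}{\alpha-1}} \;\Rightarrow\; \frac{dV}{dt} = -\frac{a}{\alpha-1}\,(at+1)^{-\frac{\alpha}{\alpha-1}} = -\frac{a}{\alpha-1}\,V^{\alpha}$$

For $\alpha = 2$ this is exactly $\frac{dV}{dt} = -aV^{2}$. For other values of $\alpha$ the rate constant in the differential equation differs from $a$ by the factor $\frac{1}{\alpha-1}$, so the two definitions of $a$ agree only up to that constant. With $\alpha = 2$ the resistor form follows directly by substitution: $R = \frac{2p}{V} = 2p(at+1)$.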

### Analysis of hyperbolic decay rates

• For stack:

$$\left(\frac{1}{a}\right)^{\frac{1}{\alpha}} = \left(\frac{1}{a_1}\right)^{\frac{1}{\alpha}} + \left(\frac{1}{a_2}\right)^{\frac{1}{\alpha}} \quad\text{or}\quad a = \frac{a_1 a_2}{\left(\sqrt[\alpha]{a_1} + \sqrt[\alpha]{a_2}\right)^{\alpha}}$$

- For parallel:  $a = a_1 + a_2$
- Confirmed by physical experiments with discrete CMOS components:
  - The value of alpha is 1.5
  - The discharging process for a stack of two identical circuits is nearly 3 times slower than for a standalone circuit
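The stack and parallel rules above can be evaluated with a minimal Python sketch. The function names are illustrative. The only inputs are the decay rates $a_1$, $a_2$ and the exponent $\alpha$ from the formulas.

```python
def stack_rate(a1, a2, alpha):
    """Decay rate of two circuits in series: a = a1*a2 / (a1^(1/alpha) + a2^(1/alpha))^alpha."""
    return (a1 * a2) / (a1 ** (1.0 / alpha) + a2 ** (1.0 / alpha)) ** alpha


def parallel_rate(a1, a2):
    """Decay rate of two circuits in parallel: a = a1 + a2."""
    return a1 + a2


def stack_slowdown(alpha, a1=1.0):
    """How many times slower a stack of two identical circuits is than one circuit."""
    return a1 / stack_rate(a1, a1, alpha)
```

For two identical circuits the stack formula reduces to $a = a_1 / 2^{\alpha}$, so `stack_slowdown(1.5)` evaluates to $2^{1.5} \approx 2.83$. That is consistent with the "nearly 3 times slower" observation for the measured alpha of 1.5.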

Series (stack) and parallel configurations:



### Capacitor discharge experiments



#### Standalone











### **Reference-free voltage sensor**



## **Reference-free voltage sensing**

• Voltage sensor requiring only timing reference



Apparatus and method for voltage sensing, Newcastle University, GB Patent Number 2479156, 30 March 2010.

# Output count and energy consumption



# **Power Proportionality**








### **Power Proportionality**



#### Issues reported in literature:

- Performance-power tradeoff for commodity systems is linear; the best strategy is "Race to sleep"; additional "run" power states are of little use; changes in existing commodity operating systems have little influence
- The focus should be on the time to transition to and from sleep!
- For new types of systems such as WSNs there is a non-linear region; the slogan is: learn how to run CMOS slowly and exploit scheduling optimizations

> Source: S. Dawson-Haggerty et al. Power Optimization – Reality Check, UC Berkeley, 2009
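The "race to sleep" argument can be illustrated with a small Python sketch under a linear power model, $P(f) = P_{static} + c \cdot f$. The model, the function names and every parameter are assumptions made for illustration. None of them come from the cited report.

```python
def energy_race_to_sleep(work, deadline, f_max, p_static, c_dyn, p_sleep, e_transition):
    """Run at f_max, then sleep until the deadline; all arguments are hypothetical."""
    t_active = work / f_max                           # seconds spent computing
    active = (p_static + c_dyn * f_max) * t_active    # linear power model while running
    asleep = p_sleep * (deadline - t_active)          # residual power while asleep
    return active + asleep + e_transition             # plus the cost of entering/leaving sleep


def energy_run_slow(work, deadline, p_static, c_dyn):
    """Stretch the same work over the whole deadline at the lowest sufficient frequency."""
    f = work / deadline
    return (p_static + c_dyn * f) * deadline
```

With a linear model the dynamic energy `c_dyn * work` is the same in both strategies, so the comparison comes down to static power over the idle interval against sleep power plus the transition cost. For example, with `work=1e9` cycles, `deadline=10` s, `f_max=1e9` Hz, `p_static=1.0` W, `c_dyn=1e-9` W/Hz and `p_sleep=0.01` W, racing wins when `e_transition` is 0.5 J and loses when it is 10 J, which is why the transition time matters.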

### Power proportionality ("knee-stretching")

# Service-modulated processing

# Energy-modulated processing



### Multiplier: Quality of Service vs Power



### From power-proportional to power-adaptive



### Relationship with uncertainty (e.g. timing variability)



# Towards designing power-adaptive systems

- Truly energy-modulated design must be power-adaptive
- Systems that are power-adaptive are more resourceful and more resilient
- Power-adaptive systems can work in a broad range of power levels
- How to design such systems?

### Grand-prix race with a fuel limit



The goal: given a finite amount of fuel, maximize the total number of laps made by all the cars on the circuit.

Unknown parameters: What is the optimum engine power? What is the optimum number of cars on the circuit?



#### **Experiment**:

- a. A ring micropipeline with 5 stages is used in the experiment.
- b. Simulation results are obtained for different degrees of parallelism (1, 2, 3, 4 tokens), at different working voltages (1.0V, 0.8V, 0.6V, 0.4V, 0.35V, 0.25V, 0.2V, 0.16V), and for different energy budgets (600pJ, 700pJ, 800pJ).
- c. A run stops when the energy is fully consumed.
- d. The amount of computation is counted for each run.
- e. A unit of computation is defined as one pulse generated in the pipeline.

#### Ring pipeline with a given energy budget



#### **Conclusions:**

- The higher the concurrency, the greater the amount of computation and the smaller the amount of leakage.
- At sub-threshold voltages, the amount of computation is STRONGLY affected by the degree of concurrency, due to the effect of leakage.
- Above threshold, the amount of computation is practically insensitive to the degree of concurrency.
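The trend in these conclusions can be reproduced qualitatively with a toy Python model. It is a sketch under strong assumptions, not the simulation behind the slides. Every device parameter below is hypothetical. The drive current uses a simplified continuous variant of the two-region delay model. Throughput is assumed to scale linearly with the token count, which ignores token and bubble limits in a real ring.

```python
import math

V_TH = 0.4        # threshold voltage in volts (hypothetical)
N_S = 0.04        # sub-threshold slope parameter in volts (hypothetical)
ALPHA = 1.3       # super-threshold exponent, the value assumed on the slides
I0 = 1e-6         # drive current at V = V_TH in amperes (hypothetical)
K_DRIVE = 1e-4    # super-threshold current factor (hypothetical)
C_PULSE = 50e-15  # capacitance switched per pulse in farads (hypothetical)
N_LEAK = 500      # equivalent number of leaking gates (hypothetical)


def on_current(v):
    """Exponential below threshold, alpha-power law above; continuous at V_TH."""
    if v < V_TH:
        return I0 * math.exp((v - V_TH) / N_S)
    return I0 + K_DRIVE * (v - V_TH) ** ALPHA


def pulse_delay(v):
    """Time to produce one pulse with a single token in the ring."""
    return C_PULSE * v / on_current(v)


def leakage_power(v):
    """Static power of the whole circuit, independent of the number of tokens."""
    i_off = I0 * math.exp(-V_TH / N_S)
    return N_LEAK * i_off * v


def pulses_for_budget(energy, v, tokens):
    """Pulses produced before the energy budget (joules) is fully consumed."""
    dynamic = C_PULSE * v ** 2                           # switching energy per pulse
    leak = leakage_power(v) * pulse_delay(v) / tokens    # leakage energy charged to each pulse
    return energy / (dynamic + leak)
```

With these assumed values, going from 1 to 4 tokens on a 600 pJ budget more than doubles the pulse count at 0.2 V, because leakage is amortised over more concurrent work, while at 1.0 V the count changes by far less than one percent.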

# Synchronous vs Self-Timed Design (in terms of energy efficiency)



Asynchronous (self-timed) logic can provide completion detection and thus reduce the interval of leakage to a minimum, thereby doing nothing well!

Source: Akgun et al, ASYNC'10

#### Closer look at AC-powered self-timed logic



#### **Circuit-level: speed-independent SRAM**

Mismatch between delay lines and SRAM memories when reducing Vdd



For example, at a Vdd of 1V the delay of an SRAM read equals that of 50 inverters, while at 190mV it equals that of 158 inverters

- The problem is well known
- Existing solutions:
  - Different delay lines for different ranges of Vdd
  - Duplicating a column of SRAM to be a delay line to bundle the whole SRAM
- The solutions require:
  - voltage references
  - DC-DC adaptor
- Completion detection needed?!
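The size of the mismatch quoted above can be worked out with a few lines of Python. The two data points are the ones given on the slide. The table and function names are illustrative.

```python
# SRAM read delay expressed in inverter delays at two supply voltages (from the slide).
SRAM_READ_DELAY_IN_INVERTERS = {1.0: 50, 0.19: 158}


def delay_line_shortfall(matched_at, operating_at, table=SRAM_READ_DELAY_IN_INVERTERS):
    """Factor by which a delay line sized at `matched_at` volts is too short at `operating_at`."""
    return table[operating_at] / table[matched_at]
```

`delay_line_shortfall(1.0, 0.19)` evaluates to 158 / 50 = 3.16. A 50-inverter delay line matched at 1V would therefore cover less than a third of the read time at 190mV, which is the motivation for replacing the delay line with completion detection.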

#### **Circuit-level: speed-independent SRAM**



(synthesized from a Petri net specification)

#### **Circuit-level:** speed-independent SRAM



Self-timed SRAM chip: UMC CMOS 90nm



Low Vdd – slow response

High Vdd – fast response

#### **SRAM testing and results**

- SRAM operations modulated by Vdd from a Capacitor Bank
- When Vdd goes below 0.75V, the ack signal is not generated by the SRAM
- The circuit automatically wakes up when Vdd goes up



### Power Proportional design using Formal Models



#### Achieving Power Proportionality

- Support for wide range of voltages
  - Asynchronous design
  - Unstable voltage supply (energy harvesting)
- Components optimised for different modes
  - Survival mode (power)
  - Mission mode (energy efficiency)
  - Emergency mode (performance)
- Reconfigurable instructions
  - Altering instruction behaviour in runtime

Figure: diversification across applications (App. #1 ... Application #N); recoverable labels: Performance, Fault Tolerance.

- ✓ Multimodal operations
- ✓ Adjustable delay lines
- ✓ Fault tolerance

#### **Microprocessor architecture**



#### **Conceptual view of the design process**





#### **WORKCRAFT FRAMEWORK**

Screenshot: Conditional Partial Order Graph (CPOG) model open in WorkCraft (file `ALL_new_new_ASAP.xwd`).



















#### Intel 8051 Datapath...



- ✓ Fully asynchronous implementation (bundled data protocol)
  - adjustable delay lines
- ✓ Fault tolerance operation
- ✓ DFT integration

#### Adaptable datapath unit



#### Some measurements...

- 0.89V to 1.5V: full capability mode.
- 0.74V to 0.89V: at 0.89V the RAM starts to fail, so the chip operates using
- 0.22V to 0.74V: at 0.74V the program counter starts to fail, however the control logic synthesised using the CPOG model continues to operate correctly down to 0.22V
- 67 MIPS at 1.2 V.
- ~2700 instructions per second at 0.25V.

## Future of Real-Power Design



## **TODAY** Synthesis of asynchronous control for DC-DC converters (A4A project)



#### Asynchronous control for Bucks



#### Asynchronous control for Bucks



(a) Complex gate asynchronous implementation



### **TODAY** Energy harvesting systems

#### Using Holistic project vision:



A sporadic source of energy does not allow for fancy power processing, and therefore for large storage

#### Staying alive in variable, intermittent, low-power environments (Savvie Project)



#### Our View on EHA Systems



#### Asynchronous Control for Capacitor Banks



#### Asynchronous Controller: Signal Transition Graph Spec



#### **Power-modulation and uncertainty**

- Localised prediction, from every moment at present
- Power has a certain profile (time trajectory) in the past and uncertain future
- Power-proportional and power-adaptive systems ...



#### **TOMORROW** Power-modulated multi-layer system

- Multiple layers of the system design can turn on at different power levels (analogies with living organisms' nervous systems or underwater life, layers of different cost labour in resilient economies)

- As power goes higher, new layers turn on, while the lower layers ("back up") remain active
- The more active layers the system has, the more power-resourceful it is
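The layering rule can be sketched in a few lines of Python. The layer names and power thresholds are hypothetical placeholders chosen for illustration. The slides do not specify them.

```python
# (layer name, minimum power in watts); both columns are hypothetical.
LAYERS = [
    ("layer 0 (back-up)", 1e-6),
    ("layer 1", 1e-4),
    ("layer 2", 1e-2),
    ("layer 3", 1e-1),
]


def active_layers(power_w):
    """Every layer whose threshold is met is on, so lower layers stay active as power rises."""
    return [name for name, p_min in LAYERS if power_w >= p_min]
```

Because a layer is never switched off when a higher one turns on, the set of active layers grows monotonically with the available power. For instance, `active_layers(1e-3)` returns the two lowest layers and `active_layers(0.0)` returns an empty list.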



#### Acknowledgements

- My colleagues at Newcastle and outside: Andrey Mokhov, Danil Sokolov, Fei Xia, Delong Shang, Maxim Rykunov, Reza Ramezani, Alex Kushnerov (Ben Gurion Uni), Bernard Stark (Bristol Uni) and many others – see http://async.org.uk
- EPSRC support: Dream Fellowship, projects: Holistic, PRiME, Power-Prop, Savvie, A4A
- Industrial support: ARM CASE studentship, Dialog Semiconductor

#### **THANK YOU!**