

### Major hardware trends affecting Exascale developments and their potential impact on software

Simon McIntosh-Smith simonm@cs.bris.ac.uk Head of Microelectronics Research University of Bristol, UK





# A brief biography



Graduated as Valedictorian in **Computer Science** from Cardiff University



**1994** Joined **Inmos** to work for David May as a microprocessor architect



**1999** Moved to **Pixelfusion** – a high-tech start-up designing the first many-core general purpose graphics processor (<u>GPGPU</u>)

**2002** Co-founded **ClearSpeed** as Director of Architecture and Applications



**2009** Head of Microelectronics Research at the **University of Bristol**, focusing on HPC and computer architecture. <u>FP7 EESI</u> member, <u>PRACE</u> prototype panelist, <u>Archer</u> UK national supercomputer project group

### Microelectronics Research in Bristol







Simon McIntosh-Smith Head of Group

Prof David May

Prof Dhiraj Pradhan









Dr Jose Nunez-Yanez

Dr Kerstin Eder

Dr Simon Hollis

Dr Dinesh Pamunuwa

7 tenured staff, 6 research assistants, 16 PhD students

# Bristol Research

Energy Aware COmputing (EACO):

- Multi-core and many-core computer architectures (FP7 EESI)
  - ClearSpeed, XMOS, Inmos, Pixelfusion, ...
- Algorithms for *heterogeneous architectures* 
  - CPUs+GPUs, OpenCL
- Electronic and Optical Network on Chip (NoC)
- Fault tolerant design (hardware and software)
  - Near threshold computing for embedded medical devices (FP7 DeSyRe)
- Reconfigurable architectures (FPGA)
- Design verification (formal and simulation-based), formal specification and analysis
- Silicon process variation
- Design methodologies, modelling & simulation of MNT based structures and systems





### Context





### Moore's Law today

2009 ITRS - Functions/chip and Chip Size



http://www.itrs.net/Links/2009ITRS/2009Chapters_2009Tables/2009_ExecSum.pdf




## Important technology trends





Herb Sutter, "The free lunch is over", Dr. Dobb's Journal, 30(3), March 2005. On-line version, August 2009. <u>http://www.gotw.ca/publications/concurrency-ddj.htm</u>

### Nextgov

"There's a lot of government money chasing new switches," to replace CMOS-engineered silicon, Colwell said. But none of the alternatives looks promising so far and there's likely to be a "fallow period" while computing power simply rests at its outside limit.

"The point is," Colwell added, "the single best exponential technology curve mankind has ever seen is what we just lived through the last 40 or 50 years and it's going to end real soon."

But, "in a perverse sort of way, it may mean there's a new flowering of computer architectures," Colwell said, as electronics companies come out with specialized architectures for different fields and tasks.

"When I make that statement, a lot of people say 'yeah, yeah, a lot of [...] keep reminding us when people have always predicted Moore's Law will always end and it never has yet so let's move on to something else,'" Colwell said. "And that's true. People have said that forever and they have not been right yet. Unfortunately, physics being what it is, someone will eventually have to be right."

http://techinsider.nextgov.com/2012/02/darpa_official_computing_speed_may_hit_a_temporary_limit.php



### Major hardware trends





### The five major hardware trends that will affect exascale software

1. Changes to memory hierarchies
2. The impact of fault tolerance
3. Focus on energy efficiency
4. Heterogeneity
5. Scale





### Changes to memory hierarchies





# Changes to memory hierarchies

### Causes:

- 3D chip stacking
- Photonic interconnects
- Continued deepening of memory hierarchies





Eurocloud FP7 project, www.eurocloudserver.com


### 3D stacked memories

Vertically stack many-core processors with DRAM → greater bandwidth and greater energy efficiency





Samsung 3D DRAM

# Photonic networks

- Roadmaps to achieve ExaFLOPs (10<sup>18</sup>) by 2018 are relying on some major hardware breakthroughs to improve energy efficiency
- Prof Jeremy O'Brien, Centre for Quantum Photonics, University of Bristol, also Prof Keren Bergman's work at Columbia
- Moving data is becoming an increasingly dominant fraction of energy dissipation in microelectronics

"Compute free, bandwidth expensive"







Bergman, SIAM PP10, Feb 2010

# Optically interconnected supercomputing board

Figure: supercomputing board with CMPs 3DI stack and DRAM (labels: CMPs, memory stack, silicon photonic interconnection network, DRAM)

- 10 teraflops per chip
- 64 CMPs per chip
- 3DI stack with CMPs, memory, and photonic NoC
- Bisectional data rate on-chip: 10 TB/s
- Bisectional data rate off-chip: 10 TB/s

**Potential disruption!**

Bergman, SIAM PP10, Feb 2010

### Optical computing in Bristol



Splitter



Spiral delay line



http://www.phy.bris.ac.uk/groups/cqp/index.html





# Silicon photonic components

#### 2x2 splitter



#### 1x4 dynamic router

#### Wavelength filtering



#### Si waveguide



#### 16 mode coupler





#### 1x4 splitter





### **4 Port Optical Router**



- Switch and route light to different parts of the optical network
- Phase controllers set where the light goes

http://www.phy.bris.ac.uk/groups/cqp/index.html











# The impact of fault tolerance





### The impact of fault tolerance

### Causes:

- Shrinking feature sizes
- Near-threshold operating voltages



# Implications

- I believe high-end systems vendors will be able to provide reliable Exascale hardware, <u>at a price</u>
  - Significant energy and silicon cost
- Fault tolerant software (applications, middleware, OS, ...) will open up other possibilities
- Great need for tools and techniques to make software developers "fault aware"
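
One simple way to make application code "fault aware" is redundant execution: run a computation twice and accept the result only when both runs agree. The following is a minimal sketch under that assumption, not a technique taken from the talk. `run_fault_aware`, `SilentFaultError` and the fault injector `make_flaky_sum` are hypothetical names introduced only for illustration.

```python
from typing import Callable, List, Set, TypeVar

T = TypeVar("T")


class SilentFaultError(RuntimeError):
    """Raised when redundant runs never agree within the retry budget."""


def run_fault_aware(task: Callable[[], T], max_attempts: int = 3) -> T:
    """Run `task` twice per attempt; accept the result only if both runs agree."""
    for _ in range(max_attempts):
        first = task()
        second = task()
        if first == second:
            return first
    raise SilentFaultError(f"no agreement after {max_attempts} attempts")


def make_flaky_sum(data: List[int], corrupt_calls: Set[int]) -> Callable[[], int]:
    """Hypothetical fault injector: corrupts the sum on the listed call numbers."""
    calls = 0

    def task() -> int:
        nonlocal calls
        calls += 1
        total = sum(data)
        # Simulate a silent data corruption on selected calls (1-based).
        return total + 1 if calls in corrupt_calls else total

    return task


if __name__ == "__main__":
    # The second call is corrupted, so the first attempt disagrees and is retried.
    flaky = make_flaky_sum([1, 2, 3, 4], corrupt_calls={2})
    print(run_fault_aware(flaky))
```

Duplicating work doubles the compute cost, which is the kind of energy trade-off the bullets above point at; cheaper checks (checksums, invariants) may be preferable where the algorithm allows them.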



### **Energy Efficiency Changes**





# Focus on energy efficiency

Hardware energy efficiency improvements alone won't be enough.

Significant energy efficiency improvements could come from software optimisations:

- Trade off data movement against computation
- Mixed precision
- Asynchronous/event driven vs. polling
- And many, many more...
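
To illustrate the mixed precision item, here is a minimal sketch that stores data in single precision (half the bytes to move) but accumulates in double precision to retain accuracy. It is an illustration under stated assumptions: single-precision arithmetic is emulated with the standard `struct` module, the function names are hypothetical, and no energy figures are measured or implied.

```python
from array import array
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_low_precision(values: array) -> float:
    """Accumulate in emulated single precision: every partial sum is rounded."""
    total = 0.0
    for v in values:
        total = to_f32(total + v)
    return total


def sum_mixed_precision(values: array) -> float:
    """Read single-precision data but accumulate in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


if __name__ == "__main__":
    data = array("f", [0.1] * 100_000)      # stored as single-precision floats
    reference = 100_000 * to_f32(0.1)       # reference value computed in double
    print("bytes per element:", data.itemsize)
    print("single-precision accumulator error:",
          abs(sum_low_precision(data) - reference))
    print("double-precision accumulator error:",
          abs(sum_mixed_precision(data) - reference))
```

The design choice is to keep the bulk data narrow, since moving data dominates energy, and to spend the extra precision only where rounding errors accumulate.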

### EESI-2 FP7 project to include EEHPC focus



### Electronics Weekly, 15-21 June 2011

#### LOW POWER: Lack of software support marks the low power scorecard at DAC

[...] if the software keeps cores active for no good reason [the hardware] won't deliver a realised saving.

With limited software support, dedicated low-power circuitry could save maybe 20% in a typical multimedia-oriented core. Make the software better at controlling the power states and that difference could be three to five times.

During an afternoon panel discussion Ambrose Low, director of design engineering at Broadcom, said: "We have hundreds of knobs in the hardware to turn power down.

"The question is whether we can take the actual use-cases into consideration and optimise the software to [...] see how much [...] they can save," he added.

Chris Edwards writes the Low-Power Design Blog (enabled by Mentor Graphics) on ElectronicsWeekly.com: www.electronicsweekly.com/ew-blogs/



### Optimising use of multiple cores



- Assign less demanding tasks to low frequency cores
- Move tasks that interfere with other optimisation efforts onto a different core
- Example: ARM's big.LITTLE SoC design strategy
- We have to deal with the timing & communication implications of doing this.
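
The first bullet can be sketched as a simple placement policy. This is a toy illustration, not ARM's big.LITTLE scheduler: the `Task` type, the `demand` estimate and the `threshold` value are all hypothetical, and the timing and communication costs of moving tasks are deliberately ignored.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Task:
    name: str
    demand: float  # hypothetical load estimate in the range [0, 1]


def assign_cores(tasks: Iterable[Task], threshold: float = 0.5) -> Dict[str, List[str]]:
    """Map low-demand tasks to 'little' cores and the rest to 'big' cores."""
    placement: Dict[str, List[str]] = {"big": [], "little": []}
    # Visit the most demanding tasks first so each list is ordered by demand.
    for task in sorted(tasks, key=lambda t: t.demand, reverse=True):
        kind = "big" if task.demand >= threshold else "little"
        placement[kind].append(task.name)
    return placement


if __name__ == "__main__":
    work = [Task("solver", 0.9), Task("logging", 0.1), Task("io", 0.3)]
    print(assign_cores(work))
```

A real policy would also need the demand estimates to be measured at run-time and would have to account for the cost of migrating a task between core types.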



"Energy Efficient HPC Software: Learning from Embedded Systems Development", K. Eder, S. Kerrison and S. McIntosh-Smith, SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP12), Savannah, Georgia, February 2012 http://www.cs.bris.ac.uk/~simonm/publications/sk\_siampp12\_eehpcs.pdf





### **Heterogeneity Changes**





### Heterogeneity

Causes:

- Multiple types of core
- Interconnect
- Memory type, capacity, ...
- Software (OS, middleware, tools, ...)



### Heterogeneous Systems



#### AMD Llano Fusion APUs





FP7 Mont Blanc ARM + GPU



NVIDIA Tegra, Project Denver

# Implications

- New programming languages, models, ...
- Dynamically adaptive software
  - Discover resources at run-time
- Auto-tuning
- Application frameworks
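
Run-time resource discovery and auto-tuning can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: `autotune`, `measure_seconds` and the toy kernel `blocked_sum` are hypothetical names, and the tuning parameter (a block size) stands in for whatever a real application would tune.

```python
import os
import time
from typing import Callable, Iterable, TypeVar

C = TypeVar("C")


def measure_seconds(kernel: Callable[[C], object], config: C, repeats: int = 3) -> float:
    """Best-of-`repeats` wall-clock time of kernel(config)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(config)
        best = min(best, time.perf_counter() - start)
    return best


def autotune(kernel: Callable[[C], object], candidates: Iterable[C],
             cost: Callable[[Callable[[C], object], C], float] = measure_seconds) -> C:
    """Return the candidate configuration with the lowest measured cost."""
    return min(candidates, key=lambda config: cost(kernel, config))


def blocked_sum(block_size: int, n: int = 200_000) -> int:
    """Toy kernel: sum range(n) in chunks of `block_size`."""
    return sum(sum(range(lo, min(lo + block_size, n)))
               for lo in range(0, n, block_size))


if __name__ == "__main__":
    workers = os.cpu_count() or 1  # discover resources at run-time
    best_block = autotune(blocked_sum, [64, 1024, 16384])
    print(f"{workers} logical CPUs visible; chosen block size: {best_block}")
```

Passing the cost function as a parameter keeps the search separate from the measurement, so the same loop could minimise energy instead of time if an energy counter were available.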





### **Scale Changes**







### From hundreds to tens of thousands of cores on-chip



AMD Interlagos 16 cores





NVIDIA Kepler GTX680 1,536 cores


AMD GCN Radeon 7970 2,048 cores

Intel Sandy Bridge E5-2600 8 cores



# Implications

- *Massively* parallel algorithms
  - Billion way parallelism
- Greater reliance on weak scaling?
  - Ruthlessly combat Amdahl's Law with Gustafson's Law

- Islands of performant cache coherence
  - More extreme NUMA
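
The contrast between the two laws can be made concrete with their standard formulas. In this sketch `p` is the parallel fraction and `n` the number of processors; the function names are chosen here for illustration. Note that in Gustafson's Law `p` is the parallel fraction of the run time on the parallel machine, with the problem size growing with `n` (weak scaling).

```python
def amdahl_speedup(p: float, n: float) -> float:
    """Fixed problem size (strong scaling): S = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return 1.0 / ((1.0 - p) + p / n)


def gustafson_speedup(p: float, n: float) -> float:
    """Problem size grows with n (weak scaling): S = (1 - p) + p * n."""
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return (1.0 - p) + p * n


if __name__ == "__main__":
    p = 0.95
    for n in (16, 1024, 10**9):
        print(f"n={n:>10}: Amdahl {amdahl_speedup(p, n):8.2f}, "
              f"Gustafson {gustafson_speedup(p, n):14.2f}")
```

With a fixed problem size the serial fraction caps the speedup at 1 / (1 - p) however many cores are added, which is why billion-way parallelism pushes towards weak scaling.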



# Conclusions

- Five major hardware trends will affect exascale software design:
  1. Changes to memory hierarchies
  2. The impact of fault tolerance
  3. Focus on energy efficiency
  4. Heterogeneity
  5. Scale
- Much of this is already predictable, but not all!
- Lots of work left to do!

