ISC'09 day 2
International SuperComputing 2009 (ISC '09)
- ISC'09 is the second largest supercomputing conference on the calendar, and the largest outside the US.
The following notes are pretty much my live transcription as the conference unfolds, so please forgive any typos, unexplained acronyms and so on. I hope you find this useful and/or interesting; please don't hesitate to get in touch if you have any questions!
Conference website.
See also day 1 and day 3 of the conference.
Cloud Computing & HPC - synergy or competition?
A very interesting session to begin the second day of the conference:
- One of the hottest topics in IT today
- Cloud computing brings many benefits such as a "pay as you go" business model
- But HPC is quite different and focuses on performance, also tends to be much more data intensive
System architecture for cloud computing (slides)
Richard Kaufmann, CTO scalable computing, HP
- Check out http://trends.google.com
- The "Cloud" is designed for massive scale and for large failures in the infrastructure
- E.g. Google, Amazon, Microsoft, Facebook, ...
- Clouds also designed for co-tenancy
- HPC nodes tend to be InfiniBand (IB) interconnected
- HPC also mostly blades these days
- Rule of thumb: 250W per 2 socket server on a typical workload
- Cloud nodes tend to be connected by cheapest Ethernet
- Currently 1 GigE moving to 10 GigE over time
- Dominated by 1U 2S (socket) servers - moving to aggregated servers
- Cloud servers tend to be 160W per 2S server
- HP's designed something very like Google's own internal server
- Cloud servers can sometimes tolerate a wrong answer, implementing reliability in software
- E.g. replication rather than RAID etc.
- Shipping container-based datacentres are becoming mainstream
- Although apparently they're only shipping about one per month
- Cloud racks are 5-10 kW/rack today, heading to 10-14 kW/rack in the future (see the density sketch at the end of this section)
- HPC racks heading toward 50 kW? Water cooled? Probably too high
- Many-core will continue
- More efficient power supplies cost more (e.g. extra $200) but save money over 3 years (~$700) so you do come out ahead (~$500)
- Power supplies are most efficient at around 50% load (this is different from the past)
- Most other speakers were still talking about power supplies still being most efficient near full load though
- Harder to find more power efficiencies
- UPS already 98%
- Power supply already 94%
- Fans already 50W headed towards 10W
- Non-volatile memory (SSDs) will come to help reduce power consumption
- E.g. a standard hard drive is about 10-15W (5400 rpm) while an SSD is more like 3-5W
- Though checkpoint/restart is least appropriate for flash-like memory (mostly write, little read)
- Virtualisation isn't necessarily implied by cloud computing - Google doesn't like virtualisation, for example
- "Private clouds" - using cloud techniques to manage your own resources
System architecture for cloud computing (slides)
Marc Hamilton, VP, Sun
- Cloud characterised by:
- Virtualisation?,
- Multi-tenancy,
- Real-time user controlled provisioning and
- Pay per use
- Still lots of old OS support required - some apps only run on RedHat 3 for example
- I/O intensive workloads suffer a performance hit in the presence of virtualisation (e.g. IB)
- SunCloud runs on 10,000 servers in a Sun test bed today
- Still small compared to Google or Amazon
- Study in April 2009 IEEE Computer, Ed Tailor, TACC, looked at performance-weighted cost of a server
- Moore's Law decay in the value of a server
- Looked at 3 ownership models
- Showed that for 30,000 cores or above it's cheaper to buy your own system outright
- But below 300 cores it was more cost effective to go pay per use (a toy version of this comparison is sketched at the end of this section)
- Was research funded by NSF though! (vested interest in justifying buying large systems)
- Cloud computing includes:
- Software as a service (SaaS) - SalesForce
- Platform as a service (PaaS) - Google app engine
- Infrastructure as a service (IaaS) - Amazon, Microsoft, Mosso
- IEEE Computer Dec 2007 Microsoft article said we'd need national level PetaScale computing centre
- Hybrid clouds?
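
To show the shape of the buy-versus-rent comparison from the TACC study mentioned above, here is a deliberately toy sketch. Every number in it (hardware price per core, fixed facility and staffing cost, cloud price per core-hour, utilisation) is an illustrative assumption of mine rather than a figure from the talk or the study, so the crossover it prints is not the study's 300/30,000-core result, only the general argument: owning carries a fixed cost to amortise, so it only wins once the machine is large and busy.

```python
# Toy own-vs-rent comparison. All prices and utilisation figures below are
# illustrative assumptions, not numbers from the talk or the IEEE Computer study.

HOURS_PER_YEAR = 8760

def cost_owned(cores, capex_per_core=200.0, opex_ratio=2.0, fixed_cost=500_000.0):
    """Buy outright: fixed facility/staffing cost plus per-core hardware,
    with lifetime OPEX roughly twice CAPEX (a rule of thumb quoted later today)."""
    return fixed_cost + cores * capex_per_core * (1.0 + opex_ratio)

def cost_rented(cores, years=3, utilisation=0.6, price_per_core_hour=0.05):
    """Pay per use for the core-hours actually consumed over the same period."""
    return cores * utilisation * years * HOURS_PER_YEAR * price_per_core_hour

for cores in (100, 1_000, 10_000, 100_000):
    own, rent = cost_owned(cores), cost_rented(cores)
    print(f"{cores:>7} cores: buy ${own:>12,.0f}  rent ${rent:>12,.0f}  "
          f"-> {'buy' if own < rent else 'rent'}")
```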
Clouds for HPC: potential & challenges (slides)
Thomas Lippert, Juelich Supercomputing Centre, Germany
- See PRACE and Gauss as top level of HPC in Europe with grid at the bottom
- Leadership HPC systems have a lifetime of 3-5 years, time-scale of know-how of 15-30 years
- Most industries are around 6 years behind leadership-level HPC
- "Users need to understand machine as a physicist needs to understand math"
- E.g. assembler, SSE, MPI, scalability
- This is tough for scientific users! (More a computer science problem)
- At Juelich more than 50% of staff work as domain scientists, mathematicians and computer scientists in simulation labs
- SoftComp is the cloud initiative in Germany
- 125 nodes
- Heterogeneous system (all x86 though)
- IB and GigE
- Most jobs are serial jobs on SoftComp, parallel jobs always less than 50% and sometimes less than 10%
- HPC Cloud needs to offer
- Leading edge performance
- Security
- Privacy
- Long-term storage and curation
- Uninterrupted service for critical applications
- High levels of support and associated research
HPC clouds (slides)
Dan Reed, scalable & multicore computing strategist, Microsoft Research
- HPC has been driven by commodity technology by and large
- Interconnect still the exception (but watch this space with Gnodal - SMS)
- "Bulk computing is almost free"
- But applications and power are not
- Inexpensive sensors are ubiquitous
- Moving lots of data is still very hard
- People are very expensive!
- Application challenges are increasingly complex
- Moore's Law tends to favour consumer commodities
- Economics driving towards many-core and accelerators
- Also driving software as a service and cloud computing
- These will drive change in technical computing
- Multicore -> HPC -> Clouds
- Successful technologies are invisible - they enable but are unobtrusive
- Maybe drive science from your iPhone with the compute running on a cloud?
- Microsoft has a product called "Azure" for services platforms
- Live, .NET, SQL, SharePoint services on top of compute, storage and interconnect
- Data is exploding: experiments, simulation, archives, literature
- Many PetaBytes, doubling every two years (I thought it was faster for data?)
- Been porting a meta genomics app to Azure with Argonne national lab
- Used map-reduce style parallel BLAST
- Microsoft datacentres are 10X bigger than a football pitch
- O(1,000,000) cores per data centre, ~$1B each to build, and many companies are building several each!
- Power consumption is a main driver for cloud and HPC datacentres
- Checkpoint/restart won't scale
The cloud and Google (slides)
Robin Williamson (a Brit!), Engineering Director, Google
- Mentioned the 2009 paper "Above the Clouds: A Berkeley View of Cloud Computing"
- Google aims to organize the world's information and make it universally accessible and useful
- 100s of TBytes of data just on the web
- Much of the data is common across many users
- Bandwidth is slow and expensive in the last mile and improving more slowly than other dimensions
- Each query has elapsed time 0.25s using >1,000 servers per query, multiple TBytes of RAM, thousands of hard drives! (all applied to just one query for one user)
- Single threaded performance matters less, many-cores are good
- Primary concern is cost efficiency
- Said a core competency is being able to turn off racks and cages as and when possible (power saving)
- Showed a picture of Rackable systems
- Have their own Google File System (GFS) - fault tolerant, distributed disk optimised for high-BW read/writes
- Also MapReduce - a programming model and library to simplify large scale computations on large clusters (see the word-count sketch at the end of this section)
- BigTable - a large scale storage system for semi-structured data (database like across thousands of machines)
- Google App Engine supports Python or Java and gives a free 1GB of storage and up to 5M page views per month
- Oriented towards web-based apps, sits on top of http requests, uses BigTable underneath
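
MapReduce is easiest to see with the canonical word-count example, so here is a minimal single-process sketch of the programming model. The real library runs the same two user-supplied functions across thousands of machines, shuffling the intermediate (key, value) pairs between them and handling failures; none of that is shown here.

```python
# Minimal, single-process illustration of the MapReduce programming model.
from collections import defaultdict

def map_fn(doc_name, text):
    """Map: emit (word, 1) for every word in a document."""
    for word in text.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    """Reduce: sum all the counts emitted for one word."""
    return word, sum(counts)

def mapreduce(inputs, map_fn, reduce_fn):
    intermediate = defaultdict(list)
    for key, value in inputs:
        for k, v in map_fn(key, value):     # map phase
            intermediate[k].append(v)       # shuffle: group values by key
    return [reduce_fn(k, vs) for k, vs in intermediate.items()]   # reduce phase

docs = [("a.txt", "the cloud and the grid"), ("b.txt", "the cloud")]
print(sorted(mapreduce(docs, map_fn, reduce_fn)))
# [('and', 1), ('cloud', 2), ('grid', 1), ('the', 3)]
```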
High Performance cloud computing
Simone Brunozzi, Amazon web services
- Amazon web services (AWS) not widely used in HPC yet but usage growing fast
- Aim to be easy to use, flexible, secure, pay per use, no commitment, API and platform neutral
- AWS using more bandwidth than all the Amazon retail websites combined (wow!)
- See http://aws.amazon.com/importexport
- Also http://aws.amazon.com/ElasticMapReduce
- Can handle physical media being shipped to them such as disk drives, optical media etc
- EC2 - SLA 99.95%, 10c/h
- AWS elastic map-reduce uses Hadoop (0.18.3 on Debian) - see the streaming-job sketch at the end of this section
- www.datawrangling.com
- Input copied from Amazon S3
- Includes NumPy, SciPy, R, BeautifulSoup
- Partners include Facebook, Microsoft, SalesForce
- Services tended to get launched in the US first but the gap to launch in Europe is closing (O(1 month))
- Deepak Singh, Senior bus. dev., deesingh@amazon.com
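
Elastic MapReduce streaming jobs boil down to a pair of ordinary programs: the mapper reads input lines on stdin and writes "key<TAB>value" lines on stdout, and the reducer receives those lines sorted by key. Here is a word-count sketch of that contract; where the input and output live in S3 is left out, and the two-roles-in-one-file layout and file name are just my own choices for brevity.

```python
#!/usr/bin/env python
# Sketch of a Hadoop-streaming style word count, the kind of job Elastic
# MapReduce runs: mapper and reducer are plain stdin/stdout programs, and the
# framework sorts the mapper's "key<TAB>value" output by key before the reduce.
import sys

def mapper(stream):
    for line in stream:
        for word in line.split():
            print(f"{word.lower()}\t1")

def reducer(stream):
    current, total = None, 0
    for line in stream:                          # lines arrive sorted by key
        word, count = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    # Run as "wordcount.py map" or "wordcount.py reduce" (hypothetical filename).
    {"map": mapper, "reduce": reducer}[sys.argv[1]](sys.stdin)
```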
Data management in the Cloud (slides)
Sanjay Radia, senior architect, Hadoop Project, Yahoo!
- Hadoop has been 90% developed by Yahoo!
- Claim projects based on Hadoop can move from research to production in days
- Focus on the problem not the IT
- Easy to learn
- More robust, more global
- Low cost
- Don't need to find new hardware to try a new experiment
- 500M+ unique users per month
- 100s of PetaBytes of storage
- 10s of datacentres around the world
- Hadoop is open source
- High performance (throughput, latency)
- Multi data centre
- + N-way replication
- + Consistency/availability trade-offs
- Has its own distributed file system (HDFS)
- Uses unreliable storage/servers
- Uses replication to handle this (see the toy sketch at the end of this section)
- Slightly restricted file access (focus on sequential access, single writers, no file locking features)
- Aim to move computation close to the data (can be PetaBytes of data...)
- Simple enough design for a single engineer to be able to understand Hadoop (impressive)
- Yahoo's search index is built using Hadoop, Yahoo's spam filters too
- Have it running on 4,000 nodes and 16 PetaBytes of data
- Some HPC users now: CMU, Berkeley, UIUC, ...
- See http://hadoop.apache.org/core/ (also on Wiki)
- sradia@yahoo-inc.com
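
To see the replication idea in miniature: split a file into blocks, keep N copies of each block on different nodes, and the block stays readable as long as any one replica is on a live node. Real HDFS placement is rack-aware rather than random, and all the sizes and node counts below are simplifications of mine.

```python
# Toy illustration of N-way block replication over unreliable nodes.
# Real HDFS uses a rack-aware placement policy; this just picks distinct nodes.
import random

def place_blocks(file_size_mb, block_mb=64, replication=3, nodes=range(20)):
    """Return {block_index: [nodes holding a replica]} for one file."""
    num_blocks = -(-file_size_mb // block_mb)        # ceiling division
    return {b: random.sample(list(nodes), replication) for b in range(num_blocks)}

def readable_blocks(placements, dead_nodes):
    """A block is still readable if at least one replica is on a live node."""
    return [b for b, replicas in placements.items()
            if any(n not in dead_nodes for n in replicas)]

layout = place_blocks(file_size_mb=1000)             # 16 blocks of 64 MB
ok = readable_blocks(layout, dead_nodes={3, 7})
print(f"{len(ok)} of {len(layout)} blocks still readable after losing two nodes")
```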
Panel session on cloud vs. HPC, all previous speakers involved
- What about data access rights on cloud-based data? Governments could demand access...
- How do they convince potential users that the privacy of their data will be maintained?
- Need to be able to rent software licenses by the hour along with the CPU
Hot seat sessions: vendor talks
Programming from Multi-core to Multi-server
Ryan Waite, Principal HPC product manager, Microsoft
- Ryan runs the engineering team developing Microsoft's HPC products (been with MS 17 years)
- Optimised Argonne National Lab's MPICH library and gave the improvements back
- Want to make parallel programming available to the masses
- Users still tackling large datasets - PetaByte
- Data parallelism is a natural programming model
- Some users are Lisp-like too (e.g. map-reduce)
- Visual Studio 2010 in beta this month
- Adding a task parallel library
- Releasing "Dryad" this July which is MS's answer to Google/Yahoo
- Said they're going to work more with open source projects in HPC
- See http://www.microsoft.com/hpc
- Also http://www.microsoft.com/concurrency
Intelligent performance and more: Intel delivering leadership HPC technology
Richard Dracott (a Brit), GM HPC, Intel
- Have an extensive suite of software tools & libraries (including compilers, MPI, BLAS/LAPACK etc)
- Nehalem-EX coming soon:
- up to 8 cores
- 24MB of shared, on-chip cache
- 4 QPI inter-processor links
- two integrated memory controllers
- 2.3B transistors (!)
- In production at the end of 2009
- Cores can be independently powered up/down to save energy
- Observed that the Top500 is growing faster than Moore's Law, which he attributed to building larger systems
- Worried that an ExaFLOP system would need more than 100MW
- Power per transistor reduced by 6 orders of magnitude in the last 30 years
- Said solid state disks (SSDs) are an important trend to reduce power consumption and increase density and reliability (and Intel is selling one, so as the Americans say, go figure)
- Just announced a $7B investment in next generation chip fabs
Heterogeneous computing - coping with the energy wall
Donald Grice, chief engineer on RoadRunner project (current Top500 #1 system), IBM
- Utility costs currently around $1M per megawatt per year
- Best anyone does is about 500MFLOPS/watt (2 MW/PetaFLOPS) - worked through in the sketch at the end of this section
- #1 machine in Top500 grows in performance by 1000X every 10-11 years
- Focus on frequency and function optimization
- Pick best frequency for performance and power use trade-off, not just absolute performance
- Only include functionality you really need for target applications
- Energy optimised cores put more pressure on the software
- Programme at a more abstract level
- Scale-out pressure too - more cores rather than faster cores
- Heterogeneous computing caused by optimising cores (like GPUs and Cell today)
- Moving towards OpenCL/OpenMP at the lower levels
- Described PGAS/CAF/X10/UPC as the high level programming abstractions being targeted by IBM
- Expects something in between OpenMP/OpenCL
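
Working through the energy figures quoted above (roughly $1M per megawatt-year of utility cost, and ~500 MFLOPS/W as today's best efficiency) shows why power dominates the ExaScale discussion: at today's efficiency an ExaFLOP machine would draw around 2 GW, so even reaching the 100 MW level the Intel talk worried about needs well over an order of magnitude improvement in FLOPS per watt.

```python
# Working through the quoted energy numbers: ~500 MFLOPS/W and ~$1M per MW-year.

MFLOPS_PER_WATT = 500.0
DOLLARS_PER_MW_YEAR = 1_000_000.0

def power_mw(target_pflops):
    """Power needed at today's best efficiency, in megawatts."""
    watts = target_pflops * 1e9 / MFLOPS_PER_WATT    # 1 PFLOPS = 1e9 MFLOPS
    return watts / 1e6

for label, pflops in [("1 PetaFLOPS", 1), ("1 ExaFLOPS", 1000)]:
    mw = power_mw(pflops)
    print(f"{label}: {mw:,.0f} MW, ~${mw * DOLLARS_PER_MW_YEAR / 1e6:,.0f}M per year")
# 1 PetaFLOPS: 2 MW, ~$2M per year
# 1 ExaFLOPS: 2,000 MW, ~$2,000M per year
```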
Unified scalable file systems
James Reaney, BlueArc
- They have built an NFS file server using lots of custom hardware to get better performance
- Use FPGAs to get performance
- Can get GBytes/s of performance from fewer disks and it's just NFS with all its enterprise class features
- Scales really well in both performance and capacity
- Going to support parallel NFS
- Could provide a big, fast, easy, green file system
Bull's Petascale approach with standard components
Jean-Marc Denis, HPC operations manager at Bull
- Want to try and make large-scale HPC as easy as possible
- Have a new product line called "bullx" based on Intel Xeon and Nvidia Tesla
- Could reach 1 PetaFLOP with 10,000 blades with 3,900 GPU blades
- They're introducing a new Shared Memory Processor (SMP) that would group these 10,000 blades into 800 SMP machines - claim this would be much easier to manage (reasonable)
- Also have a GPU-based blade
- Have water cooling built into their racks
- Have some way to manage transient errors - didn't say how this works
Best practices for managing multiple Petabytes of storage in an HPC environment
Nick Harper, VP, Spectra Logic
- Supplies primarily archive tape drives
- Design in vaulting and disaster recovery from the start
- A BER of 10^-17 means roughly one undetected error per 10 PetaBytes (see the quick check at the end of this section)
- Vendor numbers tend to flatter by 4-10X in e.g. MTBF figures
- Look at total cost, not just initial cost: OPEX is roughly twice CAPEX in HPC systems
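
A quick check of the bit-error-rate arithmetic above, since it is easy to slip a factor of eight between bits and bytes:

```python
# One undetected error per 1/1e-17 bits read, converted to bytes.
ber = 1e-17
petabytes_per_error = (1 / ber) / 8 / 1e15     # bits -> bytes -> decimal PB
print(f"~{petabytes_per_error:.1f} PB read per undetected error")   # ~12.5 PB
```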
HPC achievements and impact - 2009 (excellent talk, the slides are very good!)
Thomas Sterling, Louisiana State University's Centre for Computation and Technology
- This is the first year after PetaFLOPS (1 A.P.)
- PetaFLOP machines starting to work on real problems
- Multi-core now dominating the mainstream
- Heterogeneous computing powering up - e.g. GPUs
- Commodity clusters are ubiquitous
- CPU clock rates are essentially flat in the 2-3GHz range
- Cilk++ is a simple set of C++ extensions for runtime parallelism (MIT?)
- Also Intel's Thread Building Blocks (TBB)
- And Microsoft's "Concert" system
- Sun's "Rock" chip cancelled since their acquisition by Oracle
- GPUs are a major trend (AMD and Nvidia)
- OpenCL as a cross-platform programming model is an important step for heterogeneous parallel programming
- Blades becoming more popular though still expensive so haven't completely displaced "pizza box" 1U servers
- Some of the most power efficient systems use blades
- Continuing tension between InfiniBand (IB) and Ethernet
- ORNL achieved >400 TFLOPS sustained on a superconductor application
- Ran using 150,000 cores
- Also being used for dark matter astrophysics simulations
- First PetaFLOP (peak) machine outside US - JUGENE in Germany
- 222 TFLOP machine now in Saudi Arabia - KAUST
- They have an ambition to reach PetaScale soon and on to ExaFLOP
- UK has 44 systems in the Top500
- A lot in weather simulation and climate modelling
- Earth Simulator 2 now at #22 with 122 TFLOPS (fastest in Japan)
- 180 TFLOP machine in China (Dawning 5000A) - the fastest in the world running Microsoft Windows
- ExaScale - the final frontier?
- 1000X best in world today
- Likely to need 100s of millions of cores
- Multi-billion way task concurrency (a back-of-envelope check is sketched at the end of this section)
- Supercomputing will never achieve ZettaFlops (this was the most contentious claim at the whole conference!)
- Suggests 32-128 ExaFLOPS will be the limit (wow, he called the top of the performance S curve)
- DARPA ExaScale studies: technology, software and resiliency
- Re-emergence of global address space as a programming model (e.g. SGI Ultraviolet)
- Message-driven computation also re-emerging
- Dynamic, adaptive resource management
- Self-aware systems management - power, reliability etc
- Great talk, would recommend getting hold of any of his talks if you can!
- Cray has never had the #1 slot in the Top500 (33 lists so far!)
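
Finally, a back-of-envelope check on the ExaScale bullets above: with clock rates stuck at 2-3GHz and only a few floating point operations per cycle per core, the core counts and concurrency levels quoted on the slide fall out directly. The per-core assumptions here are mine, chosen only to show the orders of magnitude.

```python
# Back-of-envelope ExaScale sizing. The per-core figures are my assumptions.
EXAFLOP = 1e18

clock_hz = 2.5e9          # clock rates flat in the 2-3 GHz range
flops_per_clock = 4       # a few floating point operations per cycle per core
tasks_per_core = 10       # in-flight tasks per core to hide memory/network latency

cores = EXAFLOP / (clock_hz * flops_per_clock)
print(f"~{cores:,.0f} cores")                                  # ~100,000,000
print(f"~{cores * tasks_per_core:,.0f}-way task concurrency")  # ~1,000,000,000
```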