Department of
Computer Science

ISC'09 day 2

International SuperComputing 2009 (ISC '09)

ISC'09 is the second largest supercomputing conference on the calendar, and the largest outside the US.

The following notes are pretty much my live transcription as the conference unfolded, so please forgive any typos, unexplained acronyms etc. I hope you find them useful and/or interesting; please don't hesitate to get in touch if you have any questions!

Conference website.

See also day 1 and day 3 of the conference.

Cloud Computing & HPC - synergy or competition?

A very interesting session to begin the second day of the conference:
  • One of the hottest topics in IT today
  • Cloud computing brings many benefits such as a "pay as you go" business model
  • But HPC is quite different: it focuses on performance and also tends to be much more data intensive

System architecture for cloud computing (slides)

Richard Kaufmann, CTO scalable computing, HP

  • Check out http://trends.google.com
  • The "Cloud" is designed for massive scale and for large failures in the infrastructure
    • E.g. Google, Amazon, Microsoft, Facebook, ...
  • Clouds also designed for co-tenancy
  • HPC nodes tend to be InfiniBand (IB) interconnected
  • HPC also mostly blades these days
    • Rule of thumb: 250W per 2 socket server on a typical workload
  • Cloud nodes tend to be connected by cheapest Ethernet
    • Currently 1 GigE moving to 10 GigE over time
    • Dominated by 1U 2S (socket) servers - moving to aggregated servers
    • Cloud servers tend to be 160W per 2S server
  • HP's designed something very like Google's own internal server
  • Cloud servers can accept an occasional wrong answer, implementing reliability in software
    • E.g. replication rather than RAID etc.
  • Shipping container-based datacentres are becoming mainstream
    • Although apparently they're only shipping about one per month
  • Cloud racks 5-10KW -> 10-14KW/rack in the future
  • HPC racks heading toward 50KW? Water cooled? Probably too high
  • Many-core will continue
  • More efficient power supplies cost more (e.g. extra $200) but save money over 3 years (~$700) so you do come out ahead (~$500)
  • Power supplies are most efficient at around 50% load (this is different from the past)
    • Most other speakers still described power supplies as being most efficient near full load, though
  • Harder to find more power efficiencies
    • UPS already 98%
    • Power supply already 94%
    • Fans already 50W headed towards 10W
  • Non-volatile memory (SSDs) will arrive to help reduce power consumption
    • E.g. a standard hard drive is about 10-15W (5400 rpm) while an SSD is more like 3-5W
    • Though checkpoint/restart is least appropriate for flash-like memory (mostly write, little read)
  • Virtualisation isn't necessarily implied by cloud computing - Google doesn't like virtualisation, for example
  • "Private clouds" - using cloud techniques to manage your own resources
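
The power-supply payback quoted in the list above is simple arithmetic. A minimal sketch follows; the function names are mine, and the break-even estimate assumes the saving accrues evenly over the 3 years, which the talk did not state:

```python
def net_saving(extra_cost, lifetime_saving):
    """Money left over after paying the premium for the efficient supply."""
    return lifetime_saving - extra_cost


def breakeven_months(extra_cost, lifetime_saving, lifetime_months):
    """Months until the premium is repaid.

    Assumption (not from the talk): the saving accrues evenly over the
    lifetime of the server.
    """
    monthly_saving = lifetime_saving / lifetime_months
    return extra_cost / monthly_saving


# Figures quoted in the talk: ~$200 extra, ~$700 saved over 3 years.
print(net_saving(200, 700))            # 500
print(breakeven_months(200, 700, 36))  # a little over 10 months
```

Under that even-accrual assumption the premium pays for itself in well under a year, which is why the more expensive supply comes out ahead.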

System architecture for cloud computing (slides)

Marc Hamilton, VP, Sun

  • Cloud characterised by:
    • Virtualisation?,
    • Multi-tenancy,
    • Real-time user controlled provisioning and
    • Pay per use
  • Still lots of old OS support required - some apps only run on RedHat 3 for example
  • I/O intensive workloads suffer a performance hit in the presence of virtualisation (e.g. IB)
  • SunCloud runs on 10,000 servers in a Sun test bed today
    • Still small compared to Google or Amazon
  • Study in April 2009 IEEE Computer by Ed Walker, TACC, looked at the performance-weighted cost of a server
    • Moore's Law decay in the value of a server
    • Looked at 3 ownership models
    • Showed that for 30,000 cores or above it's cheaper to buy your own system outright
    • But below 300 cores it was more cost effective to go pay per use
    • Was research funded by NSF though! (vested interest in justifying buying large systems)
  • Cloud computing includes:
    • Software as a service (SaaS) - SalesForce
    • Platform as a service (PaaS) - Google app engine
    • Infrastructure as a service (IaaS) - Amazon, Microsoft, Mosso
  • IEEE Computer Dec 2007 Microsoft article said we'd need a national-level PetaScale computing centre
  • Hybrid clouds?

Clouds for HPC: potential & challenges (slides)

Thomas Lippert, Juelich Supercomputing Centre, Germany

  • See PRACE and Gauss as top level of HPC in Europe with grid at the bottom
  • Leadership HPC systems have a lifetime of 3-5 years, time-scale of know-how of 15-30 years
  • Most industries are around 6 years behind leadership-level HPC
  • "Users need to understand machine as a physicist needs to understand math"
    • E.g. assembler, SSE, MPI, scalability
    • This is tough for scientific users! (More a computer science problem)
  • At Juelich more than 50% of staff work as domain scientists, mathematicians and computer scientists in simulation labs
  • SoftComp is the cloud initiative in Germany
    • 125 nodes
    • Heterogeneous system (all x86 though)
    • IB and GigE
  • Most jobs are serial jobs on SoftComp, parallel jobs always less than 50% and sometimes less than 10%
  • HPC Cloud needs to offer
    • Leading edge performance
    • Security
    • Privacy
    • Long-term storage and curation
    • Uninterrupted service for critical applications
    • High levels of support and associated research

HPC clouds (slides)

Dan Reed, scalable & multicore computing strategist, Microsoft Research

  • HPC has been driven by commodity technology by and large
    • Interconnect still the exception (but watch this space with Gnodal - SMS)
  • "Bulk computing is almost free"
    • But applications and power are not
  • Inexpensive sensors are ubiquitous
  • Moving lots of data is still very hard
  • People are very expensive!
  • Application challenges are increasingly complex
  • Moore's Law tends to favour consumer commodities
  • Economics driving towards many-core and accelerators
    • Also driving software as a service and cloud computing
  • These will drive change in technical computing
  • Multicore -> HPC -> Clouds
  • Successful technologies are invisible - they enable but are unobtrusive
  • Maybe drive science from your iPhone with the compute running on a cloud?
  • Microsoft has a product called "Azure" for services platforms
    • Live, .NET, SQL, SharePoint services on top of compute, storage and interconnect
  • Data is exploding: experiments, simulation, archives, literature
    • Many PetaBytes, doubling every two years (I thought it was faster for data?)
  • Been porting a meta genomics app to Azure with Argonne national lab
    • Used map-reduce style parallel BLAST
  • Microsoft datacentres are 10X bigger than a football pitch
  • O(1,000,000) cores per data centre, ~$1B each to build, many companies building several each!
  • Power consumption is a main driver for cloud and HPC datacentres
  • Checkpoint/restart won't scale

The cloud and Google (slides)

Robin Williamson (a Brit!), Engineering Director, Google

  • Mentioned a paper called "Above the clouds: a view from Berkeley", 2009
  • Google aims to organize the world's information and make it universally accessible and useful
    • 100s of TBytes of data just on the web
    • Much of the data is common across many users
    • Bandwidth is slow and expensive in the last mile and improving more slowly than other dimensions
  • Each query has elapsed time 0.25s using >1,000 servers per query, multiple TBytes of RAM, thousands of hard drives! (all applied to just one query for one user)
  • Single threaded performance matters less, many-cores are good
  • Primary concern is cost efficiency
  • Said a core competency is being able to turn off racks and cages as and when possible (power saving)
    • Showed a picture of Rackable systems
  • Have their own Google File System (GFS) - fault tolerant, distributed disk optimised for high-BW read/writes
    • Also MapReduce - programming model and library to simplify large scale computations on large clusters
    • BigTable - a large scale storage system for semi-structured data (database like across thousands of machines)
  • Google App Engine supports Python or Java and gives a free 1GB of storage and up to 5M page views per month
    • Oriented towards web-based apps, sits on top of HTTP requests, uses BigTable underneath
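
To make the MapReduce programming model mentioned above a little more concrete, here is a toy, single-process sketch of its three stages (map, shuffle, reduce) applied to word counting. This is not Google's library or API; all the function names are hypothetical, and a real system distributes each stage across many machines and handles failures:

```python
from collections import defaultdict


def map_phase(documents, mapper):
    """Apply the mapper to every input record, yielding (key, value) pairs."""
    for doc in documents:
        yield from mapper(doc)


def shuffle(pairs):
    """Group all values that share a key, as the framework would between stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(doc):
    """Emit (word, 1) for every word in a document."""
    for word in doc.lower().split():
        yield word, 1


def sum_reducer(key, values):
    """Add up the counts emitted for one word."""
    return sum(values)


def word_count(documents):
    pairs = map_phase(documents, word_mapper)
    return reduce_phase(shuffle(pairs), sum_reducer)


print(word_count(["the cloud", "the grid"]))
```

The appeal of the model is that the user only writes the mapper and the reducer; partitioning, grouping and fault tolerance belong to the framework.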

High Performance cloud computing

Simone Brunozzi, Amazon web services

  • Amazon web services (AWS) not widely used in HPC yet but usage growing fast
  • Aim to be easy to use, flexible, secure, pay per use, no commitment, API and platform neutral
  • AWS using more bandwidth than all the Amazon retail websites combined (wow!)
  • See http://aws.amazon.com/importexport
  • Also http://aws.amazon.com/ElasticMapReduce
  • Can handle physical media being shipped to them such as disk drives, optical media etc
  • EC2 - SLA 99.95%, 10c/h
  • AWS elastic map-reduce uses Hadoop (0.18.3 on Debian)
    • www.datawrangling.com
    • Input copied from Amazon S3
    • Includes NumPy, SciPy, R, BeautifulSoup
  • Partners include Facebook, Microsoft, SalesForce
  • Services tended to get launched in the US first but the gap to launch in Europe is closing (O(1 month))
  • Deepak Singh, Senior bus. dev., deesingh@amazon.com
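
The EC2 figures quoted above (99.95% SLA, 10c/h) translate into downtime and cost as follows. This is a rough sketch: the function names are mine, a 365-day year is assumed, and the 100-instance job is a made-up example rather than anything from the talk:

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def allowed_downtime_hours(sla_percent, hours=HOURS_PER_YEAR):
    """Hours of downtime permitted per period at a given availability."""
    return hours * (1 - sla_percent / 100)


def on_demand_cost(instances, hours, dollars_per_hour=0.10):
    """Pay-per-use cost of running some instances for some hours."""
    return instances * hours * dollars_per_hour


# 99.95% availability leaves roughly 4.4 hours of downtime per year.
print(round(allowed_downtime_hours(99.95), 2))

# Hypothetical job: 100 instances for 24 hours at the quoted 10c/h.
print(round(on_demand_cost(100, 24), 2))
```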

Data management in the Cloud (slides)

Sanjay Radia, senior architect, Hadoop Project, Yahoo!

  • Hadoop has been 90% developed by Yahoo!
  • Claim projects based on Hadoop can move from research to production in days
    • Focus on the problem not the IT
    • Easy to learn
    • More robust, more global
    • Low cost
    • Don't need to find new hardware to try a new experiment
  • 500M+ unique users per month
  • 100s of PetaBytes of storage
  • 10s of datacentres around the world
  • Hadoop is open source
    • High performance (throughput, latency)
    • Multi data centre
    • + N-way replication
    • + Consistency/availability trade-offs
    • Has its own distributed file system (HDFS)
  • Uses unreliable storage/servers
    • Use replication to handle this
  • Slightly restricted file access (focus on sequential access, single writers, no file locking features)
  • Aim to move computation close to the data (can be PetaBytes of data...)
  • Simple enough design for a single engineer to be able to understand Hadoop (impressive)
  • Yahoo's search index is built using Hadoop, Yahoo's spam filters too
  • Have it running on 4,000 nodes and 16 PetaBytes of data
  • Some HPC users now: CMU, Berkeley, UIUC, ...
  • See http://hadoop.apache.org/core/ (also on Wiki)
  • sradia@yahoo-inc.com
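
The idea of using replication to cope with unreliable servers, noted above, can be put into rough numbers. The sketch below is my own back-of-envelope model, not something from the talk: it assumes node failures are independent, which real clusters only approximate (racks and switches fail together), and the 1% failure probability is purely illustrative:

```python
def prob_all_replicas_lost(p_node_unavailable, replicas):
    """Chance that every copy of a block is unavailable at once.

    Assumption: node failures are independent.
    """
    return p_node_unavailable ** replicas


def raw_storage_needed(logical_bytes, replicas):
    """Raw disk consumed when every block is stored `replicas` times."""
    return logical_bytes * replicas


# Illustrative only: if 1% of nodes are down at any moment, 3-way
# replication leaves about a one-in-a-million chance of losing access.
print(prob_all_replicas_lost(0.01, 3))
print(raw_storage_needed(10**15, 3))  # 1 PB of data needs 3 PB of raw disk
```

The trade-off is visible in the two functions: each extra replica multiplies the storage bill but shrinks the loss probability by another factor of the per-node failure rate.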

Panel session on cloud vs. HPC, all previous speakers involved

  • What about data access rights on cloud-based data? Governments could demand access...
  • How do they convince potential users that the privacy of their data will be maintained?
  • Need to be able to rent software licenses by the hour along with the CPU

Hot seat sessions: vendor talks

Programming from Multi-core to Multi-server

Ryan Waite, Principal HPC product manager, Microsoft

  • Ryan runs the engineering team developing Microsoft's HPC products (been with MS 17 years)
  • Optimised Argonne National Lab's MPICH library and gave the improvements back
  • Want to make parallel programming available to the masses
  • Users still tackling large datasets - PetaByte
  • Data parallelism is a natural programming model
    • Some users are Lisp-like too (e.g. map-reduce)
  • Visual Studio 2010 in beta this month
    • Adding a task parallel library
  • Releasing "Dryad" this July which is MS's answer to Google/Yahoo
  • Said they're going to work more with open source projects in HPC
  • See http://www.microsoft.com/hpc
  • Also http://www.microsoft.com/concurrency
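
As an illustration of the data-parallel model mentioned above (split the data, apply the same function to every chunk, combine the results), here is a minimal sketch using Python's standard `concurrent.futures`. It is not Microsoft's task parallel library; the function names are mine. A thread pool is used to keep the example simple; for CPU-bound work in Python a process pool would normally be the better choice:

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(data, n_chunks):
    """Split a list into at most n_chunks contiguous pieces."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_sum_of_squares(chunk):
    """The work applied independently to each chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Map the same function over every chunk, then combine the partial results."""
    chunks = chunked(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum_of_squares, chunks))


print(parallel_sum_of_squares(list(range(10))))  # 285
```

The final `sum` over partial results is the "reduce" half of the map-reduce style the speaker compared to Lisp.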

Intelligent performance and more: Intel delivering leadership HPC technology

Richard Dracott (a Brit), GM HPC, Intel

  • Have an extensive suite of software tools & libraries (including compilers, MPI, BLAS/LAPACK etc)
  • Nehalem-EX coming soon:
    • up to 8 cores
    • 24MB of shared, on-chip cache
    • 4 QPI inter-processor links
    • two integrated memory controllers
    • 2.3B transistors (!)
    • In production at the end of 2009
    • Cores can be independently powered up/down to save energy
  • Observed that the Top500 is growing faster than Moore's Law, which he attributed to building larger systems
  • Worried that an ExaFLOP system would need more than 100MW
  • Power per transistor reduced by 6 orders of magnitude in the last 30 years
  • Said solid state disks (SSDs) are an important trend to reduce power consumption and increase density and reliability (and Intel is selling one, so as the Americans say, go figure)
  • Just announced a $7B investment in next generation chip fabs

Heterogeneous computing - coping with the energy wall

Donald Grice, chief engineer on RoadRunner project (current Top500 #1 system), IBM

  • Utility costs currently around $1M per megawatt per year
  • Best anyone does is about 500MFLOPS/watt (2 MW/PetaFLOPS)
  • #1 machine in Top500 grows in performance by 1000X every 10-11 years
  • Focus on frequency and function optimization
    • Pick best frequency for performance and power use trade-off, not just absolute performance
    • Only include functionality you really need for target applications
  • Energy optimised cores put more pressure on the software
    • Programme at a more abstract level
  • Scale-out pressure too - more cores rather than faster cores
  • Heterogeneous computing caused by optimising cores (like GPUs and Cell today)
  • Moving towards OpenCL/OpenMP at the lower levels
  • Described PGAS/CAF/X10/UPC as the high level programming abstractions being targeted by IBM
  • Expects something in between OpenMP/OpenCL
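
The energy figures quoted above can be checked with a few lines of arithmetic. This is a sketch with function names of my own; the ExaFLOPS extrapolation simply reuses the 500 MFLOPS/watt figure and is not a number from the talk:

```python
def megawatts_needed(flops, mflops_per_watt):
    """Power draw in MW for a given sustained FLOPS rate and efficiency."""
    watts = flops / (mflops_per_watt * 1e6)
    return watts / 1e6


def annual_utility_cost(megawatts, dollars_per_mw_year=1_000_000):
    """Yearly electricity bill at the quoted ~$1M per megawatt per year."""
    return megawatts * dollars_per_mw_year


def annual_growth_factor(total_factor, years):
    """Constant yearly multiplier that compounds to total_factor over `years`."""
    return total_factor ** (1 / years)


PETA = 1e15
EXA = 1e18

print(megawatts_needed(PETA, 500))                       # 2 MW per PetaFLOPS
print(annual_utility_cost(megawatts_needed(PETA, 500)))  # about $2M per year
print(megawatts_needed(EXA, 500))                        # extrapolation: 2,000 MW
print(annual_growth_factor(1000, 10), annual_growth_factor(1000, 11))
```

A 1000X improvement every 10-11 years works out to a little under a doubling each year, and the extrapolated power figure shows why efficiency, not just speed, has to improve before ExaScale is practical.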

Unified scalable file systems

James Reaney, BlueArc

  • They have built an NFS file server using lots of custom hardware to get better performance
  • Use FPGAs to get performance
  • Can get GBytes/s of performance from fewer disks and it's just NFS with all its enterprise class features
  • Scales really well in both performance and capacity
  • Going to support parallel NFS
  • Could provide a big, fast, easy, green file system

Bull's Petascale approach with standard components

Jean-Marc Denis, HPC operations manager at Bull

  • Want to try and make large-scale HPC as easy as possible
  • Have a new product line called "bullx" based on Intel Xeon and Nvidia Tesla
  • Could reach 1 PetaFLOP with 10,000 blades with 3,900 GPU blades
  • They're introducing a new Shared Memory Processor (SMP) that would group these 10,000 blades into 800 SMP machines - claim this would be much easier to manage (reasonable)
  • Also have a GPU-based blade
  • Have water cooling built into their racks
  • Have some way to manage transient errors - didn't say how this works

Best practices for managing multiple Petabytes of storage in an HPC environment

Nick Harper, VP, Spectra Logic

  • Supplies primarily archive tape drives
  • Design in vaulting and disaster recovery from the start
  • A bit error rate (BER) of 10^-17 means roughly one undetected error per 10 PetaBytes
  • Vendor numbers tend to flatter by 4-10X in e.g. MTBF figures
  • Look at total cost, not just initial cost: OPEX is roughly twice CAPEX in HPC systems
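
The bit-error-rate and cost rules of thumb above are easy to verify. A minimal sketch follows; the function names are mine, decimal PetaBytes (10^15 bytes) are assumed, and the $1M CAPEX is a made-up example:

```python
def petabytes_per_bit_error(ber):
    """Average amount of data read per bit error, in decimal PetaBytes."""
    bits_per_error = 1 / ber
    return bits_per_error / 8 / 1e15


def total_cost(capex, opex_to_capex_ratio=2.0):
    """Lifetime cost when OPEX is a fixed multiple of CAPEX (roughly 2x per the talk)."""
    return capex * (1 + opex_to_capex_ratio)


# A BER of 1e-17 is one bad bit per 1e17 bits, i.e. about 12.5 PB,
# consistent with the "roughly 10 PetaBytes" quoted in the talk.
print(petabytes_per_bit_error(1e-17))

# Hypothetical $1M purchase: about $3M over the system's lifetime.
print(total_cost(1_000_000))
```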

HPC achievements and impact - 2009 (excellent talk, the slides are very good!)

Thomas Sterling, Louisiana State University's Centre for Computation and Technology

  • This is the first year after PetaFLOPS (1 A.P.)
  • PetaFLOP machines starting to work on real problems
  • Multi-core now dominating the mainstream
  • Heterogeneous computing powering up - e.g. GPUs
  • Commodity clusters are ubiquitous
  • CPU clock rates are essentially flat in the 2-3GHz range
  • CILK++ is a simple set of C++ extensions for runtime parallelism (MIT?)
    • Also Intel's Thread Building Blocks (TBB)
    • And Microsoft's Concurrency Runtime ("ConcRT", pronounced "concert")
  • Sun's "Rock" chip cancelled since their acquisition by Oracle
  • GPUs are a major trend (AMD and Nvidia)
  • OpenCL as a cross-platform programming model is an important step for heterogeneous parallel programming
  • Blades becoming more popular though still expensive so haven't completely displaced "pizza box" 1U servers
    • Some of the most power efficient systems use blades
  • Continuing tension between InfiniBand (IB) and Ethernet
    • 40Gb IB now emerging
  • ORNL achieved >400 TFLOPS sustained on a superconductor application
    • Ran using 150,000 cores
    • Also being used for dark matter astrophysics simulations
  • First PetaFLOP (peak) machine outside the US - JUGENE in Germany
  • 222 TFLOP machine now in Saudi Arabia - KAUST
    • They have an ambition to reach PetaScale soon and on to ExaFLOP
  • UK has 44 systems in the Top500
    • A lot in weather simulation and climate modelling
  • Earth Simulator 2 now at #22 with 122 TFLOPS (fastest in Japan)
  • 180 TFLOP machine in China (Dawning 5000A) - the fastest in the world running Microsoft Windows
  • ExaScale - the final frontier?
    • 1000X best in world today
    • Likely to need 100s of millions of cores
    • Multi-billion way task concurrency
  • Supercomputing will never achieve ZettaFlops (this was the most contentious claim at the whole conference!)
    • Suggests 32-128 ExaFLOPS will be the limit (wow, he called the top of the performance S curve)
  • DARPA ExaScale studies: technology, software and resiliency
  • Re-emergence of global address space as a programming model (e.g. sgi Ultraviolet)
  • Message-driven computation also re-emerging
  • Dynamic, adaptive resource management
  • Self-aware systems management - power, reliability etc
  • Great talk, would recommend getting hold of any of his talks if you can!
  • Cray has never had the #1 slot in the Top500 (33 lists so far!)

© 2009 University of Bristol  |  Terms and Conditions