ISC'09 day 2
International SuperComputing 2009 (ISC '09)
- ISC'09 is the second largest supercomputing conference on the calendar, and the largest outside the US.
The following notes are pretty much my live transcription as the conference unfolds, so please forgive any typos, unexplained acronyms and so on. I hope you find this useful and/or interesting; please don't hesitate to get in touch if you have any questions!
Conference website.
See also day 1 and day 3 of the conference.
Cloud Computing & HPC - synergy or competition?
A very interesting session to begin the second day of the conference:
- One of the hottest topics in IT today
- Cloud computing brings many benefits such as a "pay as you go" business model
- But HPC is quite different and focuses on performance, also tends to be much more data intensive
System architecture for cloud computing (slides)
Richard Kaufmann, CTO scalable computing, HP
- Check out http://trends.google.com
- The "Cloud" is designed for massive scale and for large failures in the infrastructure
- E.g. Google, Amazon, Microsoft, Facebook, ...
- Clouds also designed for co-tenancy
- HPC nodes tend to be InfiniBand (IB) interconnected
- HPC also mostly blades these days
- Rule of thumb: 250W per 2 socket server on a typical workload
- Cloud nodes tend to be connected by cheapest Ethernet
- Currently 1 GigE moving to 10 GigE over time
- Dominated by 1U 2S (socket) servers - moving to aggregated servers
- Cloud servers tend to be 160W per 2S server
- HP's designed something very like Google's own internal server
- Cloud servers can sometimes tolerate a wrong answer, implementing reliability in software
- E.g. replication rather than RAID etc.
- Shipping container-based datacentres are becoming mainstream
- Although apparently they're only shipping about one per month
- Cloud racks are 5-10 kW/rack today, heading to 10-14 kW/rack in the future (see the density sketch at the end of this section)
- HPC racks heading toward 50 kW? Water cooled? Probably too high
- Many-core will continue
- More efficient power supplies cost more (e.g. extra $200) but save money over 3 years (~$700) so you do come out ahead (~$500)
- Power supplies are most efficient at around 50% load (this is different from the past)
- Most other speakers were still talking about power supplies still being most efficient near full load though
- Harder to find more power efficiencies
- UPS already 98%
- Power supply already 94%
- Fans already 50W headed towards 10W
- Non-volatile memory (SSDs) will come to help reduce power consumption
- E.g. a standard hard drive is about 10-15W (5400 rpm) while an SSD is more like 3-5W
- Though checkpoint/restart is least appropriate for flash-like memory (mostly write, little read)
- Virtualisation isn't necessarily implied by cloud computing - Google doesn't like virtualisation, for example
- "Private clouds" - using cloud techniques to manage your own resources
System architecture for cloud computing (slides)
Marc Hamilton, VP, Sun
- Cloud characterised by:
- Virtualisation?,
- Multi-tenancy,
- Real-time user controlled provisioning and
- Pay per use
- Still lots of old OS support required - some apps only run on RedHat 3 for example
- I/O intensive workloads suffer a performance hit in the presence of virtualisation (e.g. IB)
- SunCloud runs on 10,000 servers in a Sun test bed today
- Still small compared to Google or Amazon
- Study in April 2009 IEEE Computer, Ed Tailor, TACC, looked at performance-weighted cost of a server
- Moore's Law decay in the value of a server
- Looked at 3 ownership models
- Showed that for 30,000 cores or above it's cheaper to buy your own system outright
- But below 300 cores it was more cost effective to go pay per use (a toy version of this comparison is sketched at the end of this section)
- Was research funded by NSF though! (vested interest in justifying buying large systems)
- Cloud computing includes:
- Software as a service (SaaS) - SalesForce
- Platform as a service (PaaS) - Google app engine
- Infrastructure as a service (IaaS) - Amazon, Microsoft, Mosso
- IEEE Computer Dec 2007 Microsoft article said we'd need national level PetaScale computing centre
- Hybrid clouds?
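
To show the shape of the buy-versus-rent comparison from the TACC study mentioned above, here is a deliberately toy sketch. Every number in it (hardware price per core, fixed facility and staffing cost, cloud price per core-hour, utilisation) is an illustrative assumption of mine rather than a figure from the talk or the study, so the crossover it prints is not the study's 300/30,000-core result, only the general argument: owning carries a fixed cost to amortise, so it only wins once the machine is large and busy.

```python
# Toy own-vs-rent comparison. All prices and utilisation figures below are
# illustrative assumptions, not numbers from the talk or the IEEE Computer study.

HOURS_PER_YEAR = 8760

def cost_owned(cores, capex_per_core=200.0, opex_ratio=2.0, fixed_cost=500_000.0):
    """Buy outright: fixed facility/staffing cost plus per-core hardware,
    with lifetime OPEX roughly twice CAPEX (a rule of thumb quoted later today)."""
    return fixed_cost + cores * capex_per_core * (1.0 + opex_ratio)

def cost_rented(cores, years=3, utilisation=0.6, price_per_core_hour=0.05):
    """Pay per use for the core-hours actually consumed over the same period."""
    return cores * utilisation * years * HOURS_PER_YEAR * price_per_core_hour

for cores in (100, 1_000, 10_000, 100_000):
    own, rent = cost_owned(cores), cost_rented(cores)
    print(f"{cores:>7} cores: buy ${own:>12,.0f}  rent ${rent:>12,.0f}  "
          f"-> {'buy' if own < rent else 'rent'}")
```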
Clouds for HPC: potential & challenges (slides)
Thomas Lippert, Juelich Supercomputing Centre, Germany
- See PRACE and Gauss as top level of HPC in Europe with grid at the bottom
- Leadership HPC systems have a lifetime of 3-5 years, time-scale of know-how of 15-30 years
- Most industries are around 6 years behind leadership-level HPC
- "Users need to understand machine as a physicist needs to understand math"
- E.g. assembler, SSE, MPI, scalability
- This is tough for scientific users! (More a computer science problem)
- At Juelich more than 50% of staff work as domain scientists, mathematicians and computer scientists in simulation labs
- SoftComp is the cloud initiative in Germany
- 125 nodes
- Heterogeneous system (all x86 though)
- IB and GigE
- Most jobs are serial jobs on SoftComp, parallel jobs always less than 50% and sometimes less than 10%
- HPC Cloud needs to offer
- Leading edge performance
- Security
- Privacy
- Long-term storage and curation
- Uninterrupted service for critical applications
- High levels of support and associated research
HPC clouds (slides)
Dan Reed, scalable & multicore computing strategist, Microsoft Research
- HPC has been driven by commodity technology by and large
- Interconnect still the exception (but watch this space with Gnodal - SMS)
- "Bulk computing is almost free"
- But applications and power are not
- Inexpensive sensors are ubiquitous
- Moving lots of data is still very hard
- People are very expensive!
- Application challenges are increasingly complex
- Moore's Law tends to favour consumer commodities
- Economics driving towards many-core and accelerators
- Also driving software as a service and cloud computing
- These will drive change in technical computing
- Multicore -> HPC -> Clouds
- Successful technologies are invisible - they enable but are unobtrusive
- Maybe drive science from your iPhone with the compute running on a cloud?
- Microsoft has a product called "Azure" for services platforms
- Live, .NET, SQL, SharePoint services on top of compute, storage and interconnect
- Data is exploding: experiments, simulation, archives, literature
- Many PetaBytes, doubling every two years (I thought it was faster for data?)
- Been porting a meta genomics app to Azure with Argonne national lab
- Used map-reduce style parallel BLAST
- Microsoft datacentres are 10X bigger than a football pitch
- O(1,000,000) cores per data centre, ~$1B each to build, and many companies are building several each!
- Power consumption is a main driver for cloud and HPC datacentres
- Checkpoint/restart won't scale
The cloud and Google (slides)
Robin Williamson (a Brit!), Engineering Director, Google
- Mentioned the 2009 paper "Above the Clouds: A Berkeley View of Cloud Computing"
- Google aims to organize the world's information and make it universally accessible and useful
- 100s of TBytes of data just on the web
- Much of the data is common across many users
- Bandwidth is slow and expensive in the last mile and improving more slowly than other dimensions
- Each query has elapsed time 0.25s using >1,000 servers per query, multiple TBytes of RAM, thousands of hard drives! (all applied to just one query for one user)
- Single threaded performance matters less, many-cores are good
- Primary concern is cost efficiency
- Said a core competency is being able to turn off racks and cages as and when possible (power saving)
- Showed a picture of Rackable systems
- Have their own Google File System (GFS) - fault tolerant, distributed disk optimised for high-BW read/writes
- Also MapReduce - a programming model and library to simplify large scale computations on large clusters (see the word-count sketch at the end of this section)
- BigTable - a large scale storage system for semi-structured data (database like across thousands of machines)
- Google App Engine supports Python or Java and gives a free 1GB of storage and up to 5M page views per month
- Oriented towards web-based apps, sits on top of http requests, uses BigTable underneath
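
MapReduce is easiest to see with the canonical word-count example, so here is a minimal single-process sketch of the programming model. The real library runs the same two user-supplied functions across thousands of machines, shuffling the intermediate (key, value) pairs between them and handling failures; none of that is shown here.

```python
# Minimal, single-process illustration of the MapReduce programming model.
from collections import defaultdict

def map_fn(doc_name, text):
    """Map: emit (word, 1) for every word in a document."""
    for word in text.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    """Reduce: sum all the counts emitted for one word."""
    return word, sum(counts)

def mapreduce(inputs, map_fn, reduce_fn):
    intermediate = defaultdict(list)
    for key, value in inputs:
        for k, v in map_fn(key, value):     # map phase
            intermediate[k].append(v)       # shuffle: group values by key
    return [reduce_fn(k, vs) for k, vs in intermediate.items()]   # reduce phase

docs = [("a.txt", "the cloud and the grid"), ("b.txt", "the cloud")]
print(sorted(mapreduce(docs, map_fn, reduce_fn)))
# [('and', 1), ('cloud', 2), ('grid', 1), ('the', 3)]
```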
High Performance cloud computing
Simone Brunozzi, Amazon web services
- Amazon web services (AWS) not widely used in HPC yet but usage growing fast
- Aim to be easy to use, flexible, secure, pay per use, no commitment, API and platform neutral
- AWS using more bandwidth than all the Amazon retail websites combined (wow!)
- See http://aws.amazon.com/importexport
- Also http://aws.amazon.com/ElasticMapReduce
- Can handle physical media being shipped to them such as disk drives, optical media etc
- EC2 - SLA 99.95%, 10c/h
- AWS elastic map-reduce uses Hadoop (0.18.3 on Debian) - see the streaming-job sketch at the end of this section
- www.datawrangling.com
- Input copied from Amazon S3
- Includes NumPy, SciPy, R, BeautifulSoup
- Partners include Facebook, Microsoft, SalesForce
- Services tended to get launched in the US first but the gap to launch in Europe is closing (O(1 month))
- Deepak Singh, Senior bus. dev., deesingh@amazon.com
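
Elastic MapReduce streaming jobs boil down to a pair of ordinary programs: the mapper reads input lines on stdin and writes "key<TAB>value" lines on stdout, and the reducer receives those lines sorted by key. Here is a word-count sketch of that contract; where the input and output live in S3 is left out, and the two-roles-in-one-file layout and file name are just my own choices for brevity.

```python
#!/usr/bin/env python
# Sketch of a Hadoop-streaming style word count, the kind of job Elastic
# MapReduce runs: mapper and reducer are plain stdin/stdout programs, and the
# framework sorts the mapper's "key<TAB>value" output by key before the reduce.
import sys

def mapper(stream):
    for line in stream:
        for word in line.split():
            print(f"{word.lower()}\t1")

def reducer(stream):
    current, total = None, 0
    for line in stream:                          # lines arrive sorted by key
        word, count = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    # Run as "wordcount.py map" or "wordcount.py reduce" (hypothetical filename).
    {"map": mapper, "reduce": reducer}[sys.argv[1]](sys.stdin)
```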
Data management in the Cloud (slides)
Sanjay Radia, senior architect, Hadoop Project, Yahoo!
- Hadoop has been 90% developed by Yahoo!
- Claim projects based on Hadoop can move from research to production in days
- Focus on the problem not the IT
- Easy to learn
- More robust, more global
- Low cost
- Don't need to find new hardware to try a new experiment
- 500M+ unique users per month
- 100s of PetaBytes of storage
- 10s of datacentres around the world
- Hadoop is open source
- High performance (throughput, latency)
- Multi data centre
- + N-way replication
- + Consistency/availability trade-offs
- Has its own distributed file system (HDFS)
- Uses unreliable storage/servers
- Uses replication to handle this (see the toy sketch at the end of this section)
- Slightly restricted file access (focus on sequential access, single writers, no file locking features)
- Aim to move computation close to the data (can be PetaBytes of data...)
- Simple enough design for a single engineer to be able to understand Hadoop (impressive)
- Yahoo's search index is built using Hadoop, Yahoo's spam filters too
- Have it running on 4,000 nodes and 16 PetaBytes of data
- Some HPC users now: CMU, Berkeley, UIUC, ...
- See http://hadoop.apache.org/core/ (also on Wiki)
- sradia@yahoo-inc.com
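
To see the replication idea in miniature: split a file into blocks, keep N copies of each block on different nodes, and the block stays readable as long as any one replica is on a live node. Real HDFS placement is rack-aware rather than random, and all the sizes and node counts below are simplifications of mine.

```python
# Toy illustration of N-way block replication over unreliable nodes.
# Real HDFS uses a rack-aware placement policy; this just picks distinct nodes.
import random

def place_blocks(file_size_mb, block_mb=64, replication=3, nodes=range(20)):
    """Return {block_index: [nodes holding a replica]} for one file."""
    num_blocks = -(-file_size_mb // block_mb)        # ceiling division
    return {b: random.sample(list(nodes), replication) for b in range(num_blocks)}

def readable_blocks(placements, dead_nodes):
    """A block is still readable if at least one replica is on a live node."""
    return [b for b, replicas in placements.items()
            if any(n not in dead_nodes for n in replicas)]

layout = place_blocks(file_size_mb=1000)             # 16 blocks of 64 MB
ok = readable_blocks(layout, dead_nodes={3, 7})
print(f"{len(ok)} of {len(layout)} blocks still readable after losing two nodes")
```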
Panel session on cloud vs. HPC, all previous speakers involved
- What about data access rights on cloud-based data? Governments could demand access...
- How do they convince potential users that the privacy of their data will be maintained?
- Need to be able to rent software licenses by the hour along with the CPU
Hot seat sessions: vendor talks
Programming from Multi-core to Multi-server
Ryan Waite, Principal HPC product manager, Microsoft
- Ryan runs the engineering team developing Microsoft's HPC products (been with MS 17 years)
- Optimised Argonne National Lab's MPICH library and gave the improvements back
- Want to make parallel programming available to the masses
- Users still tackling large datasets - PetaByte
- Data parallelism is a natural programming model
- Some users are Lisp-like too (e.g. map-reduce)
- Visual Studio 2010 in beta this month
- Adding a task parallel library
- Releasing "Dryad" this July which is MS's answer to Google/Yahoo
- Said they're going to work more with open source projects in HPC
- See http://www.microsoft.com/hpc
- Also http://www.microsoft.com/concurrency
Intelligent performance and more: Intel delivering leadership HPC technology
Richard Dracott (a Brit), GM HPC, Intel
- Have an extensive suite of software tools & libraries (including compilers, MPI, BLAS/LAPACK etc)
- Nehalem-EX coming soon:
- up to 8 cores
- 24MB of shared, on-chip cache
- 4 QPI inter-processor links
- two integrated memory controllers
- 2.3B transistors (!)
- In production at the end of 2009
- Cores can be independently powered up/down to save energy
- Observed that the Top500 is growing faster than Moore's Law, which he attributed to building larger systems
- Worried that an ExaFLOP system would need more than 100MW
- Power per transistor reduced by 6 orders of magnitude in the last 30 years
- Said solid state disks (SSDs) are an important trend to reduce power consumption and increase density and reliability (and Intel is selling one, so as the Americans say, go figure)
- Just announced a $7B investment in next generation chip fabs
Heterogeneous computing - coping with the energy wall
Donald Grice, chief engineer on RoadRunner project (current Top500 #1 system), IBM
- Utility costs currently around $1M per megawatt per year
- Best anyone does is about 500MFLOPS/watt (2 MW/PetaFLOPS) - worked through in the sketch at the end of this section
- #1 machine in Top500 grows in performance by 1000X every 10-11 years
- Focus on frequency and function optimization
- Pick best frequency for performance and power use trade-off, not just absolute performance
- Only include functionality you really need for target applications
- Energy optimised cores put more pressure on the software
- Programme at a more abstract level
- Scale-out pressure too - more cores rather than faster cores
- Heterogeneous computing caused by optimising cores (like GPUs and Cell today)
- Moving towards OpenCL/OpenMP at the lower levels
- Described PGAS/CAF/X10/UPC as the high level programming abstractions being targeted by IBM
- Expects something in between OpenMP/OpenCL
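
Working through the energy figures quoted above (roughly $1M per megawatt-year of utility cost, and ~500 MFLOPS/W as today's best efficiency) shows why power dominates the ExaScale discussion: at today's efficiency an ExaFLOP machine would draw around 2 GW, so even reaching the 100 MW level the Intel talk worried about needs well over an order of magnitude improvement in FLOPS per watt.

```python
# Working through the quoted energy numbers: ~500 MFLOPS/W and ~$1M per MW-year.

MFLOPS_PER_WATT = 500.0
DOLLARS_PER_MW_YEAR = 1_000_000.0

def power_mw(target_pflops):
    """Power needed at today's best efficiency, in megawatts."""
    watts = target_pflops * 1e9 / MFLOPS_PER_WATT    # 1 PFLOPS = 1e9 MFLOPS
    return watts / 1e6

for label, pflops in [("1 PetaFLOPS", 1), ("1 ExaFLOPS", 1000)]:
    mw = power_mw(pflops)
    print(f"{label}: {mw:,.0f} MW, ~${mw * DOLLARS_PER_MW_YEAR / 1e6:,.0f}M per year")
# 1 PetaFLOPS: 2 MW, ~$2M per year
# 1 ExaFLOPS: 2,000 MW, ~$2,000M per year
```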
Unified scalable file systems
James Reaney, BlueArc
- They have built an NFS file server using lots of custom hardware to get better performance
- Use FPGAs to get performance
- Can get GBytes/s of performance from fewer disks and it's just NFS with all its enterprise class features
- Scales really well in both performance and capacity
- Going to support parallel NFS
- Could provide a big, fast, easy, green file system
Bull's Petascale approach with standard components
Jean-Marc Denis, HPC operations manager at Bull
- Want to try and make large-scale HPC as easy as possible
- Have a new product line called "bullx" based on Intel Xeon and Nvidia Tesla
- Could reach 1 PetaFLOP with 10,000 blades with 3,900 GPU blades
- They're introducing a new Shared Memory Processor (SMP) that would group these 10,000 blades into 800 SMP machines - claim this would be much easier to manage (reasonable)
- Also have a GPU-based blade
- Have water cooling built into their racks
- Have some way to manage transient errors - didn't say how this works
Best practices for managing multiple Petabytes of storage in an HPC environment
Nick Harper, VP, Spectra Logic
- Supplies primarily archive tape drives
- Design in vaulting and disaster recovery from the start
- A BER of 10^-17 means roughly one undetected error per 10 PetaBytes (see the quick check at the end of this section)
- Vendor numbers tend to flatter by 4-10X in e.g. MTBF figures
- Look at total cost, not just initial cost: OPEX is roughly twice CAPEX in HPC systems
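
A quick check of the bit-error-rate arithmetic above, since it is easy to slip a factor of eight between bits and bytes:

```python
# One undetected error per 1/1e-17 bits read, converted to bytes.
ber = 1e-17
petabytes_per_error = (1 / ber) / 8 / 1e15     # bits -> bytes -> decimal PB
print(f"~{petabytes_per_error:.1f} PB read per undetected error")   # ~12.5 PB
```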
HPC achievements and impact - 2009 (excellent talk, the slides are very good!)
Thomas Sterling, Louisiana State University's Centre for Computation and Technology
- This is the first year after PetaFLOPS (1 A.P.)
- PetaFLOP machines starting to work on real problems
- Multi-core now dominating the mainstream
- Heterogeneous computing powering up - e.g. GPUs
- Commodity clusters are ubiquitous
- CPU clock rates are essentially flat in the 2-3GHz range
- Cilk++ is a simple set of C++ extensions for runtime parallelism (MIT?)
- Also Intel's Thread Building Blocks (TBB)
- And Microsoft's "Concert" system
- Sun's "Rock" chip cancelled since their acquisition by Oracle
- GPUs are a major trend (AMD and Nvidia)
- OpenCL as a cross-platform programming model is an important step for heterogeneous parallel programming
- Blades becoming more popular though still expensive so haven't completely displaced "pizza box" 1U servers
- Some of the most power efficient systems use blades
- Continuing tension between InfiniBand (IB) and Ethernet
- ORNL achieved >400 TFLOPS sustained on a superconductor application
- Ran using 150,000 cores
- Also being used for dark matter astrophysics simulations
- First PetaFLOP (peak) machine outside US - JUGENE in Germany
- 222 TFLOP machine now in Saudi Arabia - KAUST
- They have an ambition to reach PetaScale soon and on to ExaFLOP
- UK has 44 systems in the Top500
- A lot in weather simulation and climate modelling
- Earth Simulator 2 now at #22 with 122 TFLOPS (fastest in Japan)
- 180 TFLOP machine in China (Dawning 5000A) - the fastest in the world running Microsoft Windows
- ExaScale - the final frontier?
- 1000X best in world today
- Likely to need 100s of millions of cores
- Multi-billion way task concurrency (a back-of-envelope check is sketched at the end of this section)
- Supercomputing will never achieve ZettaFlops (this was the most contentious claim at the whole conference!)
- Suggests 32-128 ExaFLOPS will be the limit (wow, he called the top of the performance S curve)
- DARPA ExaScale studies: technology, software and resiliency
- Re-emergence of global address space as a programming model (e.g. SGI Ultraviolet)
- Message-driven computation also re-emerging
- Dynamic, adaptive resource management
- Self-aware systems management - power, reliability etc
- Great talk, would recommend getting hold of any of his talks if you can!
- Cray has never had the #1 slot in the Top500 (33 lists so far!)
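
Finally, a back-of-envelope check on the ExaScale bullets above: with clock rates stuck at 2-3GHz and only a few floating point operations per cycle per core, the core counts and concurrency levels quoted on the slide fall out directly. The per-core assumptions here are mine, chosen only to show the orders of magnitude.

```python
# Back-of-envelope ExaScale sizing. The per-core figures are my assumptions.
EXAFLOP = 1e18

clock_hz = 2.5e9          # clock rates flat in the 2-3 GHz range
flops_per_clock = 4       # a few floating point operations per cycle per core
tasks_per_core = 10       # in-flight tasks per core to hide memory/network latency

cores = EXAFLOP / (clock_hz * flops_per_clock)
print(f"~{cores:,.0f} cores")                                  # ~100,000,000
print(f"~{cores * tasks_per_core:,.0f}-way task concurrency")  # ~1,000,000,000
```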