Dr Leroy Hood, genomics pioneer
o Medicine will become an information science over the next decade
o Very cross-disciplinary now
o Computation, technology, biology (the driver)
o Need:
o Global analyses (all genes etc)
o Integrate multiscalar data types
o Dynamic analyses
o Hypothesis driven
o Predictive and actionable models (explain emergent behaviour)
o Discovery approaches are key
o Quantification and parallelisation of the generation of biological information
o Big issue: signal to noise challenges
o Disease arises from disease-perturbed networks
o Studying a prion disease infection in mice
o ~50 million data points generated a few years ago from one experiment
o Most of this was noise
o Used a variety of genetically modified mice to narrow this down to 300 relevant genes
o Mapped these 300 genes into four types of biological networks
o Found new genes relevant to the pathogenic networks
o Could then modify these four disease-perturbed networks with drugs to prevent the onset of the full-blown disease
o Blood diagnostics is going to be huge
o Can track the onset and development of disease by the changes in proteins in the blood
o Can use this to:
¤ Detect early
¤ Stratify disease (breast cancer is four or five separate diseases)
¤ Observe disease progression
¤ Follow therapy
o Working with Complete Genomics Inc.
o Family genome sequencing helps create much more accurate sequences – Mendelian inheritance constrains the possible variant calls (see the sketch after this list)
o Making significant use of sequencing microfluidics (nanotech based)
o Just needs a pin-prick of blood sample
o In vitro molecular diagnostics
o Six routine analyses on individual patients within 5-10 years:
o Complete individual genome sequences (ideally at family level)
o Complete individual cell genome sequences – targeting cancer
o Sequencing 1000 transcriptomes from single cancer cells simultaneously in one DNA sequencing run
o 2500 organ-specific blood proteins – measured twice per year as a wellness assessment
o Analyze 10000 B and T cells for the functional regions of their immune receptors
o Analyze individual stem cells from each individual
o Predicts that all this can significantly reduce the cost of healthcare
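¤ Not from the talk – a minimal sketch (Python) of why family/trio sequencing improves accuracy: a child's alleles at each site must come one from each parent, so calls that violate Mendelian inheritance can be flagged as likely sequencing errors; the sites and genotypes below are invented for illustration.

    # Hedged sketch (not Hood's method): trio-based Mendelian consistency check.
    # A child's genotype must contain one allele from each parent; violations
    # are candidate sequencing errors. Example sites/genotypes are hypothetical.

    def mendelian_consistent(child, mother, father):
        """Genotypes are 2-tuples of alleles, e.g. ('A', 'G')."""
        a, b = child
        return (a in mother and b in father) or (b in mother and a in father)

    trio_calls = {
        # site: (child, mother, father) genotypes -- invented example data
        "chr1:12345": (("A", "G"), ("A", "A"), ("G", "G")),  # consistent
        "chr2:67890": (("T", "T"), ("C", "C"), ("C", "T")),  # likely error
    }

    for site, (child, mother, father) in trio_calls.items():
        ok = mendelian_consistent(child, mother, father)
        print(site, "consistent" if ok else "possible sequencing error")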
David Randall (Colorado State University)
o Atmosphere, ocean (and sea ice) and land surface (and vegetation)
o Partial differential equations, spherical grids
o Time steps of a few minutes (seconds at exascale resolutions)
o Need very fast computers
o GCM – general circulation model / global climate model
o First 4 models: GFDL, UCLA, NCAR, Livermore
o First global models came from the UK
o Current challenges:
o Global cloud-resolving models (grid resolution <= 1km)
o Simulation of ice-age cycles
o Interactive (land and ocean) biology
o Greater machine parallelism is useful, though it doesn't help speed up models of a fixed size (e.g. modelling ice ages)
o "The small stuff strongly affects the big stuff"
o Dust particle ~10^-6 m, storm ~10^4 m in size
o Parameterizations required to account for dust, individual rain drops etc
o Multiscale modelling growing in popularity
o Have scaled to 1km resolution on 82,000 cores on Jaguar (160,000 cores imminently)
o Taking 17 seconds per time step
o Want (back-of-envelope arithmetic sketched after this list):
o 1km grid with 5s time step
o 256 layers up to 80km above surface
o 128 billion cells in total
o 32 time stepped variables per grid cell
o 32TB storage for time stepped fields
o 100 TB of storage all together
o Will require ~500 TFLOPS sustained to run in real time (one simulated day per wall-clock day)
o Could run this at 2000 simulated days per day (5 years per day) on a sustained Exaflop machine (3 weeks to simulate a century)
o Could generate 10 PBytes/day of output
o Parameterization won't go away
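¤ My own back-of-envelope check (Python) of the numbers above, not Randall's; it assumes 8-byte (double precision) variables and that 500 TFLOPS is the sustained rate needed to run in real time.

    # Back-of-envelope check of the 1 km global model targets (my arithmetic).
    # Assumptions: 8-byte variables; 500 TFLOPS sustained = real time
    # (one simulated day per wall-clock day).

    cells = 128e9             # ~128 billion grid cells (1 km grid, 256 layers)
    vars_per_cell = 32        # time-stepped variables per cell
    bytes_per_var = 8         # assumption: double precision

    state_tb = cells * vars_per_cell * bytes_per_var / 1e12
    print(f"time-stepped state: ~{state_tb:.0f} TB")                      # ~33 TB, matches "32 TB"

    realtime_flops = 500e12   # sustained rate assumed to give real time
    speedup = 1e18 / realtime_flops                                       # on a sustained exaflop
    print(f"simulated days per wall-clock day: {speedup:.0f}")            # 2000
    print(f"simulated years per wall-clock day: {speedup / 365:.1f}")     # ~5.5
    print(f"weeks to simulate a century: {100 * 365 / speedup / 7:.1f}")  # ~2.6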
Green Flash: Exascale Computing for Ultra-High Resolution Climate Modeling
Michael Wehner, LBNL
o Building a model of a global cloud resolving model (!)
o Dynamics, fast physics, slow physics and multi-grid solver
o Want to simulate 1000 times faster than real-time
o 176M vertices in a 12-division grid (1.75 km resolution)
o Considering application-specific computer designs for climate modelling
o Climate codes typically run at 5% of peak FLOPS or less (but what about bandwidth)
o Going to make extensive use of auto-tuning of software and hardware
o Also using consumer-electronics components
o Aiming for 100X energy efficiency improvement over mainstream HPC approach
o Reduce power consumption by reducing waste
o E.g. speculative execution, stalls, useless work
o Considering a machine with (rough arithmetic checked in the sketch at the end of this section):
o 21M cores on ~163,000 CPUs, each with 128 cores
o Sustained speed of 12 PFLOPS
o Needs ~600 MFLOPS sustained per core
o BW 78MB/s per processor
o Estimate the whole system would consume about 5MW
¤ I think they were being very naïve about this implementation, but the core idea is sound
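¤ My own sanity check (Python) of the Green Flash figures as noted above, not the speakers' arithmetic; the vertex-count formula 10*4^n + 2 is an assumption that they use a bisected-icosahedral grid.

    # Sanity check of the Green Flash figures as noted above (my arithmetic).
    # Assumption: a bisected-icosahedral grid with 10*4**n + 2 vertices.

    vertices = 10 * 4**12 + 2
    print(f"12-division grid: ~{vertices / 1e6:.0f}M vertices")      # ~168M, near the ~176M noted

    cpus = 163_000
    cores_per_cpu = 128
    cores = cpus * cores_per_cpu
    print(f"total cores: ~{cores / 1e6:.1f}M")                       # ~20.9M, i.e. the "21M cores"

    sustained = 12e15                                                # 12 PFLOPS sustained target
    print(f"per-core rate: ~{sustained / cores / 1e6:.0f} MFLOPS")   # ~575, consistent with ~600 MFLOPS per core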