Swiss Deploy World's Fastest GPU-Powered Supercomputer

June 19, 2017

By: Michael Feldman

The recent upgrade to the Piz Daint supercomputer at the Swiss National Supercomputing Centre (CSCS) has thrust the machine into the limelight here at the ISC High Performance conference on opening day. The Cray XC50 system turned in a Linpack result of 19.6 petaflops, good enough to capture the number three spot on the latest TOP500 list and displace Titan, the former GPU supercomputing champ, at Oak Ridge National Laboratory.

The Piz Daint upgrade, financed by a CHF 40 million investment from ETH Zurich, was part of a build-out that began in the fourth quarter of 2016. The principal goal of this effort was to replace the older NVIDIA Tesla K20X graphics chips with the new P100 GPUs. The K20X has a peak double-precision performance of 1.3 teraflops, while the P100 (the PCIe version) used for the upgrade tops out at 4.6 teraflops. The CPUs were also swapped out for newer 12-core Intel “Haswell” processors. As a result, CSCS more than tripled the peak performance of Piz Daint, from 7.8 petaflops to 25.3 petaflops.
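A quick back-of-the-envelope check, using only the peak figures quoted above, shows how the per-GPU gain compares with the system-level gain. This is a minimal sketch; the constant and function names are illustrative, not taken from any CSCS or Cray tool.

```python
# Double-precision peak figures as quoted in the article.
K20X_PEAK_TFLOPS = 1.3
P100_PCIE_PEAK_TFLOPS = 4.6
SYSTEM_PEAK_BEFORE_PFLOPS = 7.8
SYSTEM_PEAK_AFTER_PFLOPS = 25.3


def speedup(before: float, after: float) -> float:
    """Return how many times larger `after` is than `before`."""
    return after / before


gpu_gain = speedup(K20X_PEAK_TFLOPS, P100_PCIE_PEAK_TFLOPS)
system_gain = speedup(SYSTEM_PEAK_BEFORE_PFLOPS, SYSTEM_PEAK_AFTER_PFLOPS)

print(f"Per-GPU peak gain: {gpu_gain:.2f}x")    # about 3.5x
print(f"System peak gain:  {system_gain:.2f}x")  # about 3.2x
```

Each GPU slot gained roughly 3.5x in peak throughput, while the machine as a whole gained a bit over 3.2x.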

The upgrade was only partially implemented in November 2016, when the previous TOP500 list was compiled. At that point, CSCS had installed enough of the newer P100 GPUs to elevate the system’s peak performance to nearly 16 petaflops and submit a Linpack result of 9.8 petaflops. They completed the upgrade shortly thereafter, and went into full production by December.

In its current configuration, Piz Daint has 5,320 XC50 nodes, each equipped with two Intel Xeon E5-2690 v3 processors and a single NVIDIA Tesla P100. The system also includes 1,431 GPU-less XC40 nodes, each of which is outfitted with two Intel Xeon E5-2695 v4 (“Broadwell”) processors. CSCS apparently did not use this portion of the system for its Linpack run; including it would have added another petaflop or so to the result.
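The "another petaflop or so" estimate can be sanity-checked against the XC40 partition's theoretical peak. The node and socket counts below come from the article; the per-socket figures are assumptions based on Intel's published E5-2695 v4 specifications (18 cores, 2.1 GHz base clock) plus 16 double-precision flops per cycle per core from two 256-bit AVX2 FMA units. Real AVX clocks run lower than the base clock, so treat the result as an upper bound.

```python
# Rough theoretical-peak estimate for the GPU-less XC40 partition.
XC40_NODES = 1431        # from the article
SOCKETS_PER_NODE = 2     # from the article
CORES_PER_SOCKET = 18    # assumption: E5-2695 v4 core count
BASE_CLOCK_GHZ = 2.1     # assumption: nominal (non-AVX) base clock
DP_FLOPS_PER_CYCLE = 16  # assumption: 2 AVX2 FMA units x 4 doubles x 2 ops


def peak_pflops(nodes, sockets, cores, ghz, flops_per_cycle):
    """Theoretical double-precision peak in petaflops."""
    gflops = nodes * sockets * cores * ghz * flops_per_cycle
    return gflops / 1e6  # 1 petaflop = 1e6 gigaflops


peak = peak_pflops(XC40_NODES, SOCKETS_PER_NODE, CORES_PER_SOCKET,
                   BASE_CLOCK_GHZ, DP_FLOPS_PER_CYCLE)
print(f"XC40 partition peak: {peak:.2f} petaflops")
```

Because Linpack usually sustains only a fraction of theoretical peak, a peak of roughly 1.7 petaflops is in line with the article's estimate.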

Besides being one of the more powerful supercomputers in the world, Piz Daint is also one of the greenest, thanks largely to the highly energy-efficient P100 silicon. Drawing just 2.3 MW of power while running Linpack at full tilt, the system logs an energy efficiency of 8.6 gigaflops/watt. That rises to 10.4 gigaflops/watt for a more power-optimized run. Even if it’s not at the tippy top of the latest Green500 list, its energy efficiency is exceptional for a production machine of this size.
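The efficiency figure is simply Linpack performance divided by power draw. Since 1 petaflop/s is 10^6 gigaflop/s and 1 MW is 10^6 W, petaflops per megawatt equals gigaflops per watt directly. A minimal sketch using the rounded numbers quoted above:

```python
def gflops_per_watt(linpack_pflops: float, power_mw: float) -> float:
    """Energy efficiency in gigaflops per watt.

    The 1e6 factors (PF -> GF and MW -> W) cancel, so PF/MW == GF/W.
    """
    return (linpack_pflops * 1e6) / (power_mw * 1e6)


# Rounded figures quoted in the article.
efficiency = gflops_per_watt(19.6, 2.3)
print(f"{efficiency:.2f} GF/W")
```

The result, about 8.5 GF/W, sits slightly below the reported 8.6; the gap is presumably due to rounding in the quoted power and performance figures.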

The upgrade wasn’t just about performance and efficiency though. Along with the new GPUs and CPUs, CSCS also integrated Cray’s DataWarp “burst buffer” technology into the I/O subsystem. This addition quadrupled the bandwidth to and from storage, which greatly sped up analysis involving small files. On a practical level, that means it’s a lot easier now for users to perform scientific simulations and data analysis on those simulations in parallel, instead of serially.

According to CSCS director Thomas Schulthess, those simulations run the gamut, from materials and life sciences, to physics, geophysics, computational chemistry, climate science and weather prediction. Many of the most popular codes in these domains have been ported to GPUs, and are running on Piz Daint today. “All of our extreme-scale work is actually using accelerators,” Schulthess told TOP500 News.

Unlike most European supercomputing centers, CSCS jumped on the GPU bandwagon rather early. According to Schulthess, they were developing accelerated codes as far back as 2010, before they even had a really large GPU-powered machine. They invested heavily in application development teams that ported the codes to GPUs, while also ensuring that the underlying libraries included versions optimized for more conventional multicore processors.

Schulthess says they looked at the Xeon Phi, including the most recent-generation “Knights Landing” processors, but they were sold on the idea of heterogeneity and the efficient performance delivered by GPUs. Some of CSCS’s 1,000 or so users like the hybrid computing model, while others are perfectly content to run their codes on multicore processors. The CPU-GPU combo on Piz Daint gives them that choice.

If CSCS has a parallel in the US, it’s Oak Ridge National Lab. ORNL adopted GPUs in a big way in 2012 with Titan, a Cray XK7 system powered by NVIDIA K20X GPUs. Ever since then, the lab has been the hub of GPU supercomputing for the US Department of Energy.

On paper, Titan, with its 27.1 peak petaflops, can still outrun Piz Daint. But since newer GPUs like the P100 get better Linpack yield (not to mention better application performance yield in general) than the older K20X, Piz Daint was able to usurp Titan’s place on the TOP500 list. Linpack performance for the Oak Ridge machine has held constant at 17.6 petaflops since it was installed five years ago.
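"Linpack yield" here means the fraction of theoretical peak (Rpeak) a system actually sustains on the benchmark (Rmax). Plugging in the figures quoted in this article makes the difference concrete; this is a minimal illustrative sketch, not TOP500 tooling.

```python
# (Rmax, Rpeak) in petaflops, as quoted in the article.
systems = {
    "Piz Daint (P100)": (19.6, 25.3),
    "Titan (K20X)": (17.6, 27.1),
}


def linpack_yield(rmax: float, rpeak: float) -> float:
    """Fraction of theoretical peak achieved on Linpack (Rmax / Rpeak)."""
    return rmax / rpeak


for name, (rmax, rpeak) in systems.items():
    print(f"{name}: {linpack_yield(rmax, rpeak):.1%} of peak")
```

Piz Daint converts roughly 77 percent of its peak into Linpack performance versus about 65 percent for Titan, which is how the machine with the lower peak ends up ranked higher.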

With Titan’s demotion to fourth place on the TOP500, this is the first time in over two decades that the US has been shut out of a top-three showing on the list. The last time it happened was in 1996, when three Japanese supercomputers captured the gold, silver, and bronze in the Linpack contest that year.

In any case, Piz Daint’s moment in the sun is likely to be short-lived. With systems of 100 petaflops and above, such as Summit (Oak Ridge National Laboratory), Sierra (Lawrence Livermore National Laboratory), and Tianhe-2A (National Super Computer Center in Guangzhou), poised for delivery in the latter half of 2017, Piz Daint is apt to slip a few notches in the next list. Both Summit and Sierra are due to get the latest NVIDIA V100 GPUs plus IBM Power9 processors, while Tianhe-2A looks like it will be the recipient of Phytium FT2000/64 ARM processors and Matrix2000 GPDSP accelerators.

Schulthess is unsure if CSCS is going to buy any V100 GPUs. The chip’s performance for double-precision floating point operations (the kind most commonly used in HPC simulation codes) is 7.5 teraflops. That’s a decent jump, roughly 60 percent above the 4.6 teraflops of the PCIe P100 used in Piz Daint, but it may not be enough to warrant another upgrade or new system in such a short timeframe. Undoubtedly, NVIDIA’s next-generation GPU is already on the drawing board and should be ready to go in a couple of years, or perhaps sooner. And given that the powers that be at CSCS seem committed to heterogeneous supercomputing for the foreseeable future, Piz Daint will almost certainly not be the last GPU supercomputer at the Swiss center.