High Performance Computing in 2017: Hits and Misses

Dec. 19, 2017

By: Michael Feldman

The past 12 months encompassed a number of new developments in HPC, as well as an intensification of existing trends. TOP500 News takes a look at the top eight hits and misses of 2017.

Hit: Machine learning, the killer app for HPC

Machine learning, and the broader category of AI, continued to spread its influence across the HPC landscape in 2017. Web-based applications in search, ad-serving, language translation and image recognition continued to get smarter this year, as more sophisticated neural network models were developed. What’s new this year is the beginning of a trend that inserts this technology into a broad range of traditional HPC workflows.

In applications as distinct as weather modeling, financial risk analysis, astrophysics simulations, and diagnostic medicine, developers used machine learning software to improve the accuracy of their models and speed time-to-result. At the same time, conventional supercomputing platforms are also being used for machine learning R&D. In one of the most impressive computing demonstrations of the year, a poker-playing AI known as Libratus trained itself on the Bridges supercomputer at the Pittsburgh Supercomputing Center, and went on to crush four of the best professional players in the world. As more powerful GPUs make their way into supercomputers (see below), we should see a lot more cutting-edge machine learning research being performed on these machines.

Hit: NVIDIA makes Volta GPU a deep learning monster

NVIDIA intensified its dominance in the AI space, with the launch of its Volta V100 GPU in May. With special circuitry for tensor processing, the V100 put unprecedented amounts of deep learning processing power – 120 teraflops per chip – into the hands of anyone with a spare PCIe port. Amazon and Microsoft will be the earliest adopters of the technology, followed soon thereafter by Baidu.

In addition to its deep learning prowess, the V100 GPU also delivers 7 double precision teraflops, making it eminently suitable for conventional HPC setups. The devices are already being deployed in the Department of Energy’s two most powerful supercomputers, Summit and Sierra, both of which are expected to come online in the first half of 2018. Those systems promise to be in high demand for both traditional HPC simulations and machine learning applications.

Miss: Intel fumbles pre-exascale deployment, drops Knights Hill

In October, the Department of Energy reported that its 180-petaflop Aurora supercomputer, which was slated to be installed at Argonne National Lab next year, was canceled. The system was to be powered by Knights Hill, Intel’s next-generation Xeon Phi processors. Instead, Aurora will be remade into a one-exaflop system to be deployed in the 2020-2021 timeframe.

The rationale for the change in plans was not made clear, and as we wrote at the time, “something apparently went wrong with the Aurora work, and the Knights Hill chip looks like the prime suspect.” In November, Intel revealed it had dumped the Knights Hill product, without specifying any alternate roadmap for the Xeon Phi line.

Hit and Miss: AMD offers alternatives to Intel and NVIDIA silicon

In June, AMD launched EPYC, the chipmaker’s first credible alternative to Intel’s Xeon product line since the original Opteron processors. The EPYC 7000 series processors have more cores, more I/O connectivity, and better memory bandwidth than Intel’s “Skylake” Xeon CPUs, which were launched in July. Although AMD initially missed the opportunity to talk about the EPYC processors during ISC 2017, subsequent third-party testing and a more concerted effort by AMD at SC17 revealed that the EPYC processors had advantages for at least some HPC workloads. Nonetheless, Intel will prove difficult to dislodge from its position atop the datacenter food chain.

At SC17, AMD also talked up its Radeon Instinct GPUs (initially announced in December 2016), the chipmaker’s first serious foray into the machine learning datacenter space. These processors have plenty of flops to offer, but nothing approaching the performance of the V100 for deep learning, since the Radeon hardware lacks the specialized arithmetic units that NVIDIA added for neural net acceleration. AMD is counting on its more open approach to GPU software to lure CUDA customers away from NVIDIA’s clutches.

Hit: Cavium becomes the center of gravity for ARM-powered HPC

Cavium’s second-generation ThunderX2 ARM server SoC was soft-launched way back in May 2016, but it wasn’t until this year that the chip got some attention from users and vendors. The processor offers decent performance, superior memory bandwidth, and an abundance of external connectivity to distinguish it from other ARM chip vendors taking aim at the datacenter.

In January, the EU’s Mont-Blanc project selected the ThunderX2 for its phase three pre-exascale prototype, which will be constructed by Atos/Bull. The French computer-maker intends to productize the ARM-based Mont-Blanc design as an option on its Sequana supercomputer line. In November, Cray followed suit with a ThunderX2-powered XC50 blade, which will become the basis of the Isambard supercomputer in the UK. HPE, Gigabyte Technology, and Ingrasys also came up with their own versions of ThunderX2-based servers. With the ARM software ecosystem for the datacenter also starting to fill out, 2018 could be a breakout year for the architecture in high performance computing and elsewhere.

Hit: Microsoft inches its way back into HPC

Between its acquisition of Cycle Computing and the next upgrade of its FPGA-accelerated Azure cloud, Microsoft looks like it’s becoming a bigger HPC player, at least in terms of technology prowess. Although the company still offers plenty of NVIDIA GPUs in Azure for cloud customers interested in accelerating HPC, data analytics, and deep learning workloads, the long-term strategy appears to be moving toward an FPGA approach. If it manages to pull this off, Microsoft could drive a lot more interest in reconfigurable computing from performance-minded users, while simultaneously becoming a technology leader in this area.

Hit: Quantum computing on the cusp

Perhaps the fastest-moving HPC technology of 2017 was quantum computing, which, in a fairly short space of time, grew from an obscure set of research projects into a technology battle between some of the biggest names in the industry. The most visible of these are IBM and Google, both of which built increasingly capable quantum computers over the past 12 months. IBM currently has a 20-qubit system available for early users, with a 50-qubit prototype waiting in the wings. The company even managed to collect a handful of paying customers for this early hardware. Meanwhile, Google is fiddling with a 22-qubit system, with a 49-qubit machine promised before the end of the year.

In October, Intel made its own quantum intentions known, with the revelation of a 17-qubit processor. For its part, Microsoft is working on a topological quantum computer, and while it has yet to field a working prototype, the company has come up with a software toolkit for the technology, complete with its own quantum computing programming language (Q#). In a similar vein, Atos/Bull launched a 40-qubit quantum simulator this year, softening the ground for the eventual hardware that everyone expects is right around the corner. 2018 is shaping up to be an even more exciting year for qubit followers.

Miss: Exascale computing fatigue

While exascale projects around the world made a lot of news in 2016, with the different players jockeying for position, this year the news has been a lot more subdued. Maybe that’s because the various efforts in China, Japan, Europe, and the US are now pretty well set in place, and are just methodically moving forward at their own pace. But with the rise of AI and machine learning, and more generally, data analytics, the artificial milestone of reaching an exaflop of double precision floating point performance seems a lot less relevant.

Consider that the DOE’s 200-petaflop Summit supercomputer will deliver three peak exaflops of deep learning performance, and drawing on that capability with large-scale neural networks may dwarf any advances made with the first “true” exascale machines used for traditional modeling. In a Moor Insights and Strategy white paper, senior analyst Karl Freund writes: “It is becoming clear that the next big advances in HPC may not have to wait for exascale-class systems, but are being realized today using Machine Learning methodologies. In fact, the convergence of HPC and [machine learning] could potentially redefine what an exascale system should even look like.”

In a world where machine learning can outperform oncologists, poker-players, and hedge fund analysts, it’s hard to argue with that assessment.