By: Michael Feldman
A number of established trends in HPC continued to gather momentum in 2018. But there were also a few surprises. Let’s take a closer look at the more notable developments.
Hit: Exascale Programs Gather Speed
With the first exascale supercomputers slated to come into production in the next two to five years, all the major efforts in the various geographies reported significant progress in 2018. To start with, China got all three of its exascale platforms in gear by installing prototype machines, the details of which were partially revealed in November at SC18. Meanwhile, Fujitsu unveiled the A64FX, the Arm chip that is destined to power Japan’s first exascale system. In July, the European Processor Initiative (EPI) got underway, with its EU mandate to develop two domestically-produced processors for its exascale and pre-exascale supercomputers. In April, the US took a big step forward toward its upcoming exascale machinery with a Department of Energy RFP that would put systems at Oak Ridge, Argonne, and Lawrence Livermore national labs. And in September, Hyperion Research provided a nice summary of where the various exascale efforts stood and how much is being spent by the different players.
Hit and Miss: Intel Lays Out a Somewhat Confusing Set of Roadmaps
Intel spelled out its plans for its future datacenter processors, including its mainstream Xeon CPUs and its AI chip portfolio. (At Intel though, every chip does AI to one extent or another.)
In May at the company’s first AI DevCon event, AI Products Group chief Naveen Rao outlined their multi-pronged approach to developing artificial intelligence platforms. If you’d rather not delve into our coverage of the event, here’s the Reader’s Digest version: Xeon CPUs for generic inferencing work and even some training; FPGAs for more specialized inferencing cases; the Neural Network Processor (NNP) for the most demanding training scenarios. By the way, the precursor to NNP is Lake Crest, which has been relegated to a test chip for software development only. Knights Mill, the Xeon Phi variant aimed at machine learning, was never mentioned. Oh, and Intel is also working on a discrete inferencing accelerator.
The latest Xeon roadmap is a bit more straightforward, although it also takes a few interesting turns. Basically, the next three Xeon processors, Cascade Lake, Cooper Lake and Ice Lake, will be rolling out of Intel fabs in quick succession. Cascade Lake, a 14nm CPU, will incorporate some Spectre and Meltdown fixes, a new set of deep learning instructions, and support for Optane memory. It was supposed to start shipping in the fourth quarter of the year, but apparently that has slipped into Q1 of 2019. Cooper Lake, also on 14nm, will extend the deep learning support by adding the bfloat16 format. It’s supposed to debut in 2019 as well. The last one, Ice Lake, will be manufactured with Intel’s much-delayed 10nm process, and is scheduled for delivery in 2020.
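For readers unfamiliar with bfloat16, the format keeps the sign bit and all eight exponent bits of a 32-bit IEEE float but only the top seven mantissa bits, so a bfloat16 value is simply the upper half of a float32. The Python sketch below illustrates that layout. It is not a description of how Cooper Lake implements the format: the function names are hypothetical, and it truncates the low bits for simplicity, whereas real hardware typically rounds.

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Return a 16-bit bfloat16 pattern for x (hypothetical helper).

    Assumption: the low 16 bits are truncated, not rounded.
    """
    # Reinterpret x as an IEEE 754 float32 and read it as an unsigned integer.
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    # bfloat16 is the upper 16 bits: 1 sign, 8 exponent, 7 mantissa bits.
    return bits32 >> 16


def bfloat16_bits_to_float(bits16: int) -> float:
    """Expand a bfloat16 bit pattern back to a Python float."""
    # Pad the dropped mantissa bits with zeros and reinterpret as float32.
    (value,) = struct.unpack("<f", struct.pack("<I", bits16 << 16))
    return value


def round_trip(x: float) -> float:
    """Show what value survives after storing x as bfloat16."""
    return bfloat16_bits_to_float(float32_to_bfloat16_bits(x))


if __name__ == "__main__":
    for sample in (1.0, 3.14159, 1e38):
        print(sample, "->", round_trip(sample))
```

The trade-off is visible in the round trip: a very large value such as 1e38 stays in range because the exponent width matches float32, while only two to three decimal digits of precision survive.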
More specific to HPC setups is a new line of Xeon Advanced Performance (AP) processors with better floating point performance and a lot more memory bandwidth. The first of these will be Cascade Lake AP, which will be equipped with 48 cores and 12 DDR4 memory channels. The AP variant is supposed to be released at the same time as its mainstream sibling, Cascade Lake SP, in other words, early 2019. Although Intel hasn’t publicly revealed a second-generation AP product, it was mentioned in a leaked roadmap that found its way into the press in July.
Hit: AMD Looks to Capitalize on Intel Missteps
AMD built a good deal of credibility with its original EPYC CPU that was introduced last year. The processor is even working its way into a handful of HPC clusters, with Cray, HPE, Dell and other vendors now supporting the chip. But it’s the second-generation EPYC, code-named Rome, that is poised to make a much bigger splash in high performance computing.
Although Rome won’t be released until next year, AMD is already building its case that this new 64-core processor will be able to outrun Intel’s best and brightest. Unlike the original EPYC, Rome is aimed more directly at performance-minded customers, promising four times the flops per socket of its predecessor and up to 400 GB/second of memory bandwidth. That may be fast enough to outrun Intel’s most performant offering in 2019, the aforementioned 48-core Cascade Lake AP processor.
AMD’s advantage might come down to the fact that Rome will be built on TSMC’s 7nm process technology, while Intel will be stuck on its 14nm technology in 2019. With Intel’s 10nm plans in disarray, for the first time AMD has an opportunity to exploit a fundamental advantage in building chips with smaller transistors than its primary rival. If that comes to pass, it will force Intel to play catch up, something that hasn’t occurred since AMD leap-frogged Intel with its Opteron processors more than a decade ago.
Hit: NVIDIA Consolidates Its Dominance as the Go-To Accelerator
This year NVIDIA saw its AI-loving V100 GPU become the key component in some of the most powerful supercomputers on the planet. The most visible of these are the US Department of Energy’s Summit and Sierra machines, which are now the two most powerful systems in the world according to the November TOP500 rankings. Summit has the top spot with a Linpack mark of 143.5 petaflops, with Sierra close behind at 94.6 petaflops. Although these systems are outfitted with IBM Power9 CPUs as the host processor, the V100 accelerators supply the vast majority of the flops in these machines.
Even though both supercomputers will be primarily used to run conventional HPC simulations, it’s telling that shortly after Summit was switched on, its keepers announced the system had executed the world’s first “exascale” application using the reduced-precision capabilities of the V100’s Tensor Cores. These same Tensor Cores are expected to get plenty of exercise running more traditional machine learning workloads on both Summit and Sierra.
V100 hardware was the basis for Japan’s new AI Bridging Cloud Infrastructure (ABCI) system, which is currently the seventh most powerful supercomputer in the world, Linpack-wise. Unlike Summit and Sierra, it was purposely designed to run machine learning applications, although it too will host more traditional HPC.
The V100 also provided the computational muscle behind the DGX-2, NVIDIA’s second-generation “AI supercomputer” that delivers two petaflops of Tensor Core number crunching. It boasts 16 V100 GPUs glued together by NVIDIA’s custom NVSwitch fabric. Although primarily aimed at AI research shops, four DOE labs installed DGX-2 systems this year, mainly to do machine learning work, but in some cases to mix scientific simulations with neural network processing.
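The two-petaflop figure follows directly from the GPU count. As a back-of-the-envelope check, the sketch below multiplies the 16 GPUs by a per-GPU Tensor Core peak of 125 teraflops. That per-GPU number is an assumption taken from NVIDIA's published V100 peak rating rather than from this article, and the function name is purely illustrative.

```python
# Assumption: per-GPU Tensor Core peak (mixed precision), not stated in the article.
V100_TENSOR_TFLOPS = 125.0
# From the article: a DGX-2 contains 16 V100 GPUs.
GPUS_PER_DGX2 = 16


def aggregate_petaflops(gpu_count: int, tflops_per_gpu: float) -> float:
    """Sum the peak rates of identical GPUs and convert teraflops to petaflops."""
    return gpu_count * tflops_per_gpu / 1000.0


dgx2_pflops = aggregate_petaflops(GPUS_PER_DGX2, V100_TENSOR_TFLOPS)

if __name__ == "__main__":
    print(f"DGX-2 peak Tensor Core throughput: {dgx2_pflops} petaflops")
```

This is a peak, reduced-precision number, so it is not comparable to the double-precision Linpack results quoted for Summit and Sierra.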
If that weren’t enough, the V100 added to its cloud presence in 2018, finding its way into Azure, the Google Cloud, the Oracle Cloud, and IBM’s bare metal cloud offering. This followed previous V100 deployments in 2017 by Amazon Web Services and the big Chinese cloud providers.
Hit: A Breakthrough Year for Arm
For Arm enthusiasts in the HPC community, 2018 will be seen as a watershed year. In the latter half of 2018, Sandia National Laboratories installed Astra, the world’s first petascale supercomputer powered by Arm microprocessors. In the November TOP500 list it was ranked at number 204, turning in a Linpack result of 1.5 petaflops.
A smaller Arm-powered supercomputer, Isambard, was also installed in 2018, in this case at the University of Bristol. A similar machine is headed to the French Atomic Energy Commission (CEA) and was scheduled for completion before the end of the year.
More encouraging is the fact that you can now get an Arm-powered super from a variety of system vendors. Astra was supplied by HPE, Isambard is a Cray machine, and the CEA system comes from Atos. The Arm processor behind all these systems is the ThunderX2 chip, a technology now owned by Marvell.
Miss: Silicon Security
The year started off with a bad omen with the revelation of the Spectre and Meltdown vulnerabilities that were found to be widespread across most CPU platforms. Briefly, it was revealed that commonly used processor features -- speculative execution behavior, in the case of Spectre, and the treatment of race conditions, in the case of Meltdown -- could make every system using these CPUs vulnerable to security breaches.
Chip vendors, as well as cloud companies, OS providers, and web browser developers, raced to plug the vulnerabilities with software and firmware fixes. Some of those fixes inevitably impacted performance, which HPC users found particularly troubling. And despite the quick response, everyone ended up feeling a little queasy about what other bugs might be lurking on their silicon.
Hit: Quantum Leaps
2018 saw its share of “breakthroughs” in the quantum computing realm, with numerous announcements of hardware and software advancements. Looking at the big players, Intel revealed a 49-qubit chip known as Tangle Lake, Google announced it had constructed a 72-qubit processor in its bid to achieve quantum supremacy, Fujitsu introduced a quantum computing service based on its own quantum annealing processing, and IBM continued to gather users to its quantum cloud service. In the startup arena, Rigetti kicked off its cloud service based on its own 16-qubit and 32-qubit processors.
Just this month, IonQ announced it had developed a device with at least 79 usable qubits. Unlike all the other solutions mentioned here, IonQ’s qubits are based on trapped ions, which the company says offers significant advantages in stability compared to competing technologies.
We’ll give honorable mentions to neuromorphic computing, optical computing, and cloud computing, all of which would have shown up in the Hits column this year. And of course, AI/machine learning technology continued to work its way into HPC centers large and small in 2018. There are just too many examples to mention.