Cavium Notches Another ARM Win in HPC

Nov. 21, 2017

By: Michael Feldman

The US Department of Energy’s (DOE) Argonne National Laboratory announced it will install a ThunderX2-powered cluster built by Hewlett Packard Enterprise (HPE) as part of a collaboration to expand the ARM ecosystem for high performance computing.

As we recently reported, HPE and a number of other vendors have added ThunderX2 servers to their HPC offerings, which suggests that ARM technology has matured enough to be commercially viable. For the time being, Cavium appears to be the ARM vendor-of-choice for high performance computing. ThunderX2, Cavium’s second-generation ARM server SoC, doesn’t have quite the floating point muscle of high-end x86 chips, but can deliver better overall memory performance and capacity, at least on paper.

In this case, the lab will use the 32-node system, known as Comanche Wave, as a testbed of sorts for evaluating the suitability of the technology for different types of HPC workloads and for providing a development platform for broadening the ARM software ecosystem. According to Nic Dubé, HPE’s Chief Strategist for HPC and Technical Lead for the Advanced Development Team, the Comanche program represents “one of the largest customer-driven prototyping efforts focused on the enablement of the HPC software stack for ARM.”

The work may also include the development of an open source compiler for ARM, based on LLVM, an open source compiler project that targets some of the most popular processor architectures. It just so happens that Argonne is an active contributor to the project, and was instrumental in developing the backends for the PowerPC and IBM Blue Gene/Q platforms. An actively supported LLVM-based compiler would undoubtedly be a valuable addition to the ARM ecosystem.

In addition, Argonne researchers will run various applications on the system and provide feedback to HPE and other partnering vendors on what kind of performance it can deliver. “Argonne is interested in evaluating the ARM ecosystem as a cost-effective and power-effective alternative to x86 architectures based on Intel CPUs, which currently dominate the high-performance computing market,” states the press release.

Like a number of the other DOE labs, Argonne often appraises alternative HPC technologies to look for more performant solutions, as well as to help promote architectural choice. “Inducing competition is a critical part of our mission and our ability to meet our users’ needs,” said Rick Stevens, associate laboratory director for Argonne’s Computing, Environment and Life Sciences Directorate. That is really a polite way of saying that they don’t want to rely on Intel as the basis for all of their supercomputing hardware. As it stands today, Intel chips are installed in more than 90 percent of all HPC systems.

Also left unsaid here is the DOE CORAL debacle, in which Intel failed to make good on the 180-petaflop Aurora supercomputer contract, which was originally slated for delivery to Argonne next year. That system, which was to be powered by “Knights Hill” Xeon Phi processors, was canceled. Instead, Intel will be allowed to build an exascale system for the lab in 2021 based on yet-to-be-announced x86 processor technology.

In any case, Argonne and the other DOE labs are certainly motivated to lessen their reliance on Intel and, to a lesser extent, the x86 architecture, for the main processing unit. ARM is shaping up to be one of the major alternatives, OpenPower being the other. It remains to be seen if these three processor platforms can coexist with any kind of parity, but for the first time in many years, there is the real possibility of architectural diversity in HPC.