Cray has announced an ARM blade option for its XC50 supercomputer, based on Cavium’s second-generation ThunderX2 processor.
The announcement comes almost exactly two years after Cray first announced it was collaborating with Cavium to explore the use of ARM for HPC and data analytics work. At the time, the collaboration was focused on Cavium’s first-generation 64-bit ARM product, the 48-core ThunderX. Some of the initial funding for this work came from FastForward 2, a US Department of Energy (DOE) R&D program that was designed to accelerate potential exascale technologies.
No commercial Cray systems based on the ThunderX chips ever emerged, but with the more powerful ThunderX2 chips, the supercomputer-maker determined there would be enough interest from HPC users to warrant support for the platform. “We’re seeing signals within the broader market that there is interest in and around ARM,” said Fred Kohout, Cray’s Chief Marketing Officer and VP of Products.
This was a substantial investment for Cray, since it had to build an XC50 system software stack to support the ThunderX2 SoC, including ARMv8 compilers and other development tools, along with runtime libraries in math, science, and communications. Cray claims its ARM compiler demonstrated better performance in two-thirds of 135 benchmarks, and much better performance – 20 percent or more – in one-third of them, compared to open source ARM compilers from LLVM and GNU.
The Cray ThunderX2 blades can be mixed with other XC50 blades outfitted with Intel Xeon-SP or Xeon Phi processors and NVIDIA Tesla GPUs. Both air-cooled and liquid-cooled options are available.
Cray already has one customer lined up for the ThunderX2-powered XC50: the Great Western 4 (GW4) Alliance, a research consortium of four UK universities (Bristol, Bath, Cardiff and Exeter). In January 2017, the alliance announced it had contracted Cray to build "Isambard," a 10,000-core ARM-based supercomputer, which will provide a Tier 2 HPC service. The UK’s Met Office was also involved in the deal, since it was interested in seeing how its weather and climate codes would run on such a machine. The system will be paid for out of a £3 million award from the Engineering and Physical Sciences Research Council (EPSRC). It’s scheduled to be fully deployed by the end of this year.
At the time of the GW4 announcement, Cray had not revealed whether the machine was going to be a commercial product or a custom design. In fact, the initial hardware used the CS400 cluster platform for early development. The type of ARM processor to be used in the system was also not revealed.
According to Kohout, there are also DOE customers interested in the ThunderX2-powered XC50, which is not surprising considering that the agency used some of its FastForward 2 money to fund Cray’s initial work in this area. Although most of the main processing units in pre-exascale and exascale machines in the US are likely to be based on either x86 or Power, ARM gives the DOE a third processor architecture for building next-generation supercomputers.
Beyond these earliest adopters, Kohout thinks there are other potential users for ARM-based supercomputers, although he didn’t say who he had in mind. “I think it’s safe to say we see enough interest in the technology besides some specific customers,” he said.
Of course, if the market develops to any extent, Cray will have to deal with some competition. Atos (Bull) has already committed to an ARM-powered supercomputer based on these same Cavium processors and the Bull sequana platform. Atos’s initial customer is the Mont-Blanc Project, which will spend 7.9 million euros to build an exascale prototype based on the ThunderX2-based sequana. Lenovo, Dell, Penguin Computing and a few others have also built ARM-based HPC servers for this market, and are all essentially waiting to see how the market shapes up.
Cray’s ThunderX2 XC50 blades will be generally available in the second quarter of 2018.