By: Michael Feldman
Cray has introduced Shasta, its next-generation supercomputer platform that will serve as the company's entry into the realm of exascale computing. The architecture will offer a flexible design that supports a wide array of processors, coprocessors, node configurations, and system interconnects, including one developed by Cray itself.
Flexibility is indeed the key theme of Shasta’s design, so much so that it will enable the company to unify its supercomputers, clusters, and analytics systems under a single product line. Presumably that means there will be no XC, CS and Urika-GX products (or the equivalent) in the Shasta portfolio, although the specific branding of the new platform is still to be determined. Shasta will also support integrated storage servers (hard disk and flash) that can co-mingle with the compute nodes on the same network. Cray has been telegraphing this kind of consolidation for some time, so it shouldn’t come as a big surprise that they made good on this unified-platform vision.
“We listened closely to our customers and dug into the future needs of AI and HPC applications as we designed Shasta,” said Steve Scott, senior vice president and CTO of Cray. “Customers wanted leading-edge, scalable performance, but with lots of flexibility and easy upgradeability over time. I’m happy to say we’ve nailed this with Shasta.”
One area where Shasta customers will certainly have plenty of options is processors and coprocessors. For HPC more broadly, chip options have expanded significantly over the last few years as a result of the acceptance of Arm as an alternative to x86 silicon, plus the return of AMD CPUs to the HPC space. At the same time, GPU coprocessors for traditional computing and deep learning acceleration have become a mainstream offering on HPC systems, thanks largely to NVIDIA. FPGAs also seem to be poised for a comeback, courtesy of improved FPGA-based SoC offerings from Intel and Xilinx and the more mature software componentry now available for reconfigurable computing.
Shasta will be capable of supporting all these processor types (and possibly even more specialized chips at some point) in a mix-and-match fashion. Supposedly, chips that dissipate as much as 500 watts can be accommodated by the design, which should offer plenty of thermal headroom for anything on the drawing boards at Intel, AMD, NVIDIA, or any Arm chipmaker. (The hottest datacenter GPUs from NVIDIA currently top out at about 300 watts.) Customers will be able to configure the number of processors and coprocessors per node based on their specific application needs, enabling a more customized approach for building these systems.
The promise of chip diversity certainly makes a lot of sense when you consider modern HPC workflows. A common scenario is one in which a customer performs a digital simulation whose results are fed into an analytics engine for data reduction and extraction, and then into a visualization component. Along the way, there might be some machine learning involved to refine parameters on subsequent simulations. Getting these different workflow components to run on the processors best suited to them will require some intelligent orchestration, of course, but much of the heavy lifting will be provided by system software that knows how to map those applications to the relevant hardware.
This unified-platform approach has some obvious advantages for customers. Instead of needing two or more machines to run different types of workloads, a heterogeneous Shasta system will be able to accommodate the entire workflow. That doesn’t mean Cray has removed the complexity of managing the workflow across different kinds of hardware, but it does remove the annoyance of buying and maintaining multiple systems and the performance penalty of shunting data across potentially disparate networks and storage subsystems.
That’s an attractive value proposition, especially if the system software is smart enough to lessen the burden of heterogeneity. This is especially true as supercomputers enter exascale territory and start bumping up against the power budget of even the largest datacenters at national labs, not to mention bumping up against the procurement budgets of those organizations.
One critical aspect in which Shasta breaks new ground, at least as far as Cray machinery goes, is the support of multiple interconnects. The choices include the usual suspects, namely Mellanox InfiniBand and Intel Omni-Path. One can assume that the initial Shasta offering will support the 200 Gbps versions of these technologies: HDR InfiniBand, in the case of Mellanox, and second-generation Omni-Path, in the case of Intel. Both products are due to hit the streets in 2019.
But the real news here is that Cray will offer a custom HPC interconnect by the name of Slingshot. It can be thought of as the follow-on to Aries, a Cray-built technology that the company sold off to Intel in 2012. At the time, it looked like Cray was going to get out of the custom interconnect business for good, figuring that InfiniBand, Ethernet, and whatever Intel had in mind would between them serve the vast majority of the HPC market. But somewhere along the way, Cray decided that the differentiation they had with Aries (and its predecessors) would be key to building the kinds of supercomputers they expect to sell in the exascale era.
According to Steve Scott, Slingshot will offer a raft of features that will support not only traditional HPC, but the adjacent application sets of high performance data analytics and AI. Like its predecessors, Slingshot is based on the dragonfly topology, which Cray originally came up with in 2008. In this latest implementation, the technology enables very large networks (over a quarter of a million endpoints, for example) to be built with just three network hops. (The Aries technology used in the current XC supercomputer line needs five hops.) And unlike a fat-tree setup, only one of those hops requires an expensive optical cable. All of which should cut latency and power usage, reduce cabling costs, and improve reliability.
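The quarter-million figure is roughly what the published dragonfly sizing formulas give for a 64-port switch. The sketch below is illustrative only. It assumes the textbook "balanced" dragonfly, in which each switch has p endpoint ports, a - 1 links to the other switches in its group, and h global links, with a = 2p = 2h. Cray has not said that Slingshot uses exactly this split, and the function name is hypothetical.

```python
def balanced_dragonfly(radix):
    """Size a balanced dragonfly (a = 2p = 2h) for a given switch radix.

    Assumed textbook model, not Cray's published Slingshot configuration.
    """
    # Ports used per switch: p + (a - 1) + h = 4p - 1, which must fit in the radix.
    p = (radix + 1) // 4          # endpoints attached to each switch
    h = p                         # global (inter-group) links per switch
    a = 2 * p                     # switches per group
    groups = a * h + 1            # every group pair shares one global link
    endpoints = p * a * groups    # total attachable endpoints
    return {
        "endpoints_per_switch": p,
        "switches_per_group": a,
        "global_links_per_switch": h,
        "ports_used": p + (a - 1) + h,
        "groups": groups,
        "endpoints": endpoints,
    }


if __name__ == "__main__":
    for key, value in balanced_dragonfly(64).items():
        print(f"{key}: {value}")
```

Under these assumptions a 64-port switch yields 513 groups of 32 switches and 262,656 endpoints. A minimal route crosses at most one local link, one global link, and one more local link, which is where the three-hop figure comes from.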
One key feature of Slingshot is an advanced adaptive routing capability, which Scott says will make it very adept at avoiding congestion, even when the network is under extreme load. In a blog post on the subject, Scott writes that Slingshot will deliver utilization of 90 percent or better (for well-behaved workloads), even for large-scale systems. And since it will also provide performance isolation between workloads, it will be able to maintain high network bandwidth and low latency across multiple applications running simultaneously. Some of this magic happens in hardware, which has the wherewithal to track every packet in the network individually.
Another important feature of the new interconnect is its interoperability with Ethernet, which means Slingshot-infused Shasta machines can be easily hooked up to external storage systems and local area networks elsewhere in the datacenter. And although it’s compatible with standard Ethernet, Slingshot implements a high performance variant of the protocol for internal purposes that is faster and more robust than the standard one.
The first Slingshot switch, codenamed Rosetta, delivers some impressive performance numbers: 12.8 terabits/second (in each direction) per switch, provided by 64 ports running at 200 Gbps each. Cray is estimating that they can hold latency to about 300 ns per hop, but more importantly, maintain a tight distribution of latency across various types of network traffic. That’s especially important when you’re dealing with the variety of workloads that are expected to be running on these systems.
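As a quick sanity check on those figures, the per-switch bandwidth is the port count multiplied by the per-port rate, and a rough worst-case switch latency is the per-hop estimate multiplied by the three-hop maximum mentioned earlier. The snippet below is back-of-the-envelope arithmetic with hypothetical function names. It ignores cable, NIC, and software overheads, so it is not a prediction of end-to-end latency.

```python
PORTS = 64              # ports per Rosetta switch, from the article
PORT_GBPS = 200         # per-port rate in Gbps, each direction
HOP_LATENCY_NS = 300    # Cray's estimated latency per switch hop
MAX_HOPS = 3            # maximum switch-to-switch hops quoted for Slingshot


def switch_bandwidth_tbps(ports=PORTS, port_gbps=PORT_GBPS):
    """Aggregate one-direction bandwidth of a switch, in terabits/second."""
    return ports * port_gbps / 1000


def path_latency_ns(hops=MAX_HOPS, per_hop_ns=HOP_LATENCY_NS):
    """Cumulative per-hop latency along a path (switch hops only)."""
    return hops * per_hop_ns


if __name__ == "__main__":
    print(f"Per-switch bandwidth: {switch_bandwidth_tbps()} Tbps per direction")
    print(f"Worst-case hop latency: {path_latency_ns()} ns")
```

With the article's numbers this gives 12.8 Tbps per direction and about 900 ns of accumulated hop latency across a three-hop path.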
Shasta is offered in two rack configurations: a standard datacenter rack and a high-density variant that can hold up to 64 compute blades, each of which can be outfitted with multiple processors. The standard rack comes with air or liquid cooling, while the denser rack will rely exclusively on liquid cooling. In both cases, systems can scale to over 100 cabinets.
Cray is planning to make Shasta commercially available in the fourth quarter of 2019. That year-long lead time should give the company plenty of time to educate potential customers on the system's capabilities and build a sales pipeline. Cray has already booked a big Shasta win, which was announced in conjunction with the unveiling of the platform. In 2020, the National Energy Research Scientific Computing Center (NERSC) will deploy “Perlmutter,” a Shasta pre-exascale supercomputer previously referred to as NERSC-9. We’ll cover that announcement in more detail in a separate TOP500 News report.
Cray will be talking a lot more about its new platform at the SC18 conference, taking place in Dallas, Texas from November 11 to 16. If you're one of the Shasta-curious, be sure to stop by their booth.