Cray Storms Machine Learning Market with New Accelerated Clusters

Cray has launched two variants of its CS-Storm platform, designed principally for the fast-moving AI and data analytics markets. The new systems can be configured with up to 10 GPUs or FPGAs per node, making them some of the most computationally dense accelerator solutions on the market today.

According to Cray, target applications for the systems include “deep learning, machine learning, signal processing, reservoir simulation, geospatial intelligence, portfolio and trading algorithm optimization, pattern recognition and in-memory databases.” The press release summarizes the value proposition as follows:

“The Cray CS-Storm systems provide up to 187 TOPS (tera operations per second) per node, 2,618 TOPS per rack for machine learning application performance, and up to 658 double precision TFLOPS per rack for HPC application performance. Delivered as a fully integrated cluster supercomputer, the Cray CS-Storm systems include the Cray Programming Environment, Cray Sonexion® scale out storage, and full cluster systems management.”
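As a quick sanity check (this calculation is ours, not Cray's), the quoted per-rack numbers are simple multiples of the per-node figure, implying 14 nodes per rack:

```python
# Back-of-envelope check of the figures quoted in the press release.
tops_per_node = 187        # machine learning performance per node (TOPS)
tops_per_rack = 2618       # machine learning performance per rack (TOPS)
dp_tflops_per_rack = 658   # double-precision HPC performance per rack (TFLOPS)

nodes_per_rack = tops_per_rack / tops_per_node
print(nodes_per_rack)                        # 14.0
print(dp_tflops_per_rack / nodes_per_rack)   # 47.0 DP TFLOPS per node
```

The implied 47 double-precision TFLOPS per node is roughly consistent with eight Tesla P100 SXM2 modules at about 5.3 DP TFLOPS each, plus the host Xeons.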

The more conventional system is the CS-Storm 500NX, which is powered by the NVIDIA Tesla P100 GPU, the most popular new accelerator for training neural networks. Each node can be outfitted with up to eight P100 SXM2 modules, along with two Intel “Broadwell” Xeon processors. The P100s talk with one another via NVLink, NVIDIA’s high-performance interconnect for GPU-to-GPU communication. Each node also supports up to twelve 2.5-inch disk drives, up to four of which can be NVMe. All this hardware requires a fairly large enclosure, in this case a standard 4U rackmount chassis. Multi-node clusters can be built with InfiniBand or Intel Omni-Path.

The main target audience for the 500NX is customers training large deep learning networks. The use of NVLink to speed GPU-to-GPU communication is key here, since these larger networks must be spread across multiple GPUs, which demands the fastest inter-processor data rates possible.

The CS-Storm 500GT is a somewhat different animal. It can be outfitted with up to 10 accelerators, in this case non-NVLink P100 or P40 GPUs, or Nallatech FPGA accelerators. Current Nallatech offerings in this area include the 510FT product, a 2.8-teraflop card composed of two Arria 10 1150 GX FPGAs, and the 385A product, a 1.5-teraflop card with a single Arria 10 1150 GX FPGA.
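For a rough sense of scale (our extrapolation, not a vendor figure), a fully loaded 500GT node with 10 identical FPGA cards would deliver the following aggregate accelerator throughput:

```python
# Illustrative per-node aggregate throughput for a CS-Storm 500GT,
# assuming a full complement of 10 identical accelerator cards.
cards_per_node = 10
card_tflops = {
    "Nallatech 510FT (2x Arria 10 1150 GX)": 2.8,  # per-card TFLOPS, from the article
    "Nallatech 385A (1x Arria 10 1150 GX)": 1.5,   # per-card TFLOPS, from the article
}

for card, tflops in card_tflops.items():
    print(f"{card}: {cards_per_node * tflops:.1f} TFLOPS per node")
```

Real deployments will mix card types and counts, so these ceilings are illustrative only.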

Each 500GT node will be equipped with two Intel “Skylake” Xeon processors, which are slated to become available in the second half of 2017. Up to 16 hot-swap 2.5-inch drives (up to eight NVMe drives) can be supported. The 500GT node is enclosed in either a 3U or 4U rackmount chassis, depending on what gets stuffed into it, and like the 500NX, can be clustered via InfiniBand or Intel Omni-Path.

The 500GT is a more flexible and potentially more energy-efficient machine, which suggests its target applications are oriented toward more scaled-out work: deep learning inferencing (or a mix of training and inferencing), high-end analytics, and even traditional HPC simulations. The support for FPGAs also suggests a wider application palette for the 500GT compared to the 500NX.

Pricing and general availability dates were not disclosed, nor was future support for the recently announced NVIDIA Volta GPUs.
