AMD Demos Petaflop-in-a-Rack Supercomputer

Aug. 1, 2017

By: Michael Feldman

AMD has demonstrated a supercomputer based on its latest AMD EPYC CPUs and Radeon Instinct GPUs that can deliver one petaflop of single precision floating point performance in a single rack.

The demo was presented at AMD’s Capsaicin event, which took place in conjunction with the SIGGRAPH conference, an annual get-together for the content creation community.



AMD CEO Lisa Su introduced the system, dubbed Project 47, as a platform suitable for both deep learning workloads and image rendering applications. Su said the machine is powered by 20 EPYC CPUs and 80 Radeon Instinct GPUs, and contains 10 TB of DDR4 memory.

Project 47 is a collaboration between AMD, Inventec, Samsung, and Mellanox. Inventec, a Taiwan-based original design manufacturer (ODM), built and integrated the system, based on its P-series 2U server platform. Samsung supplied the high bandwidth memory (HBM2) for the Radeon Instinct cards, as well as DDR4 main memory modules and NVMe SSD storage devices, while Mellanox contributed its EDR (100 Gbps) adapters, cabling, and switches to hook together the 20 servers.

In this case, each server contains a single EPYC 7601 CPU hooked to four Radeon Instinct MI25 GPUs. The advantage of the single-socket EPYC server is that it is able to support plenty of memory, GPU cards, and other PCIe devices, without having to resort to a second CPU installed solely to hook in more componentry. That saves not only the extra expense of the CPU, but power as well, something AMD has touted as one of the major advantages of its EPYC design.

As is always the case for such accelerator-based architectures, the majority of the flops are supplied by the GPUs. In this case, each MI25 delivers 12.3 teraflops of single precision floating point (FP32) or 24.6 teraflops of half precision (FP16). Together they account for more than 95 percent of the system’s floating point computational power.
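For readers who want to check the arithmetic, the short Python sketch below multiplies out the figures quoted above: 20 servers, four MI25 cards per server, and the per-card FP32 and FP16 ratings. The variable names are illustrative. The CPU bound is only an inference that takes the "more than 95 percent" GPU share at face value; AMD did not quote a CPU flops figure in this presentation.

```python
# Figures quoted in the article
SERVERS = 20
GPUS_PER_SERVER = 4
MI25_FP32_TFLOPS = 12.3
MI25_FP16_TFLOPS = 24.6

gpu_count = SERVERS * GPUS_PER_SERVER            # 80 GPUs in the rack
gpu_fp32_tflops = gpu_count * MI25_FP32_TFLOPS   # aggregate GPU FP32 peak
gpu_fp16_tflops = gpu_count * MI25_FP16_TFLOPS   # aggregate GPU FP16 peak

# Assumption: if the GPUs supply at least 95% of total FP32 flops,
# the CPUs together can supply at most this much.
GPU_SHARE = 0.95
max_cpu_fp32_tflops = gpu_fp32_tflops * (1 - GPU_SHARE) / GPU_SHARE

print(f"GPUs in rack:        {gpu_count}")
print(f"GPU FP32 peak:       {gpu_fp32_tflops:.0f} teraflops")
print(f"GPU FP16 peak:       {gpu_fp16_tflops:.0f} teraflops")
print(f"Implied CPU ceiling: {max_cpu_fp32_tflops:.1f} teraflops")
```

On these numbers the GPUs alone come to roughly 984 teraflops of FP32, just short of the headline petaflop, with the 20 EPYC CPUs making up the remainder.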

The new Radeon Instinct products were designed primarily for deep learning work, but the demonstration played to the local SIGGRAPH crowd, running a variety of image rendering applications. At this point, Su handed the presentation off to Raja Koduri, senior vice president and chief architect of the AMD Radeon Technologies Group.

In the first demo, the rack was used as a virtualized resource for four different rendering applications launched from thin clients. Koduri explained that the Radeon GPUs include hardware virtualization support for the kind of remote execution that they were using, noting that each GPU can support up to 16 users. That means the 80-GPU rack could theoretically serve 1,280 users simultaneously. The second demo used all 80 GPUs for a single graphics application, in this case, to illustrate how photorealistic rendering could be accomplished in real time. “We’re redefining high performance computing for the content creation community,” said Koduri.

Although the SIGGRAPH demonstration only illustrated the system’s graphics prowess, AMD is hoping the machine will attract AI users looking to run their deep learning codes. Theoretically, the system could also be used for more traditional HPC work for applications that can tolerate 32 bits of precision.

One of the main advantages of the Project 47 machine is that it is able to deliver a lot of floating point horsepower within a relatively small power envelope. AMD is claiming the system delivers 30 gigaflops per watt of FP32 operations, which would put it at or near the top of the Green500 list if somehow those FP32 operations could be transformed to FP64. Alas, these latest Radeon parts have little 64-bit capability, making the comparison somewhat irrelevant. The current Green500 champ is TSUBAME 3.0, which turned in a power efficiency of 14.1 gigaflops per watt, based on (FP64) Linpack.
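As a rough sanity check on the efficiency claim, the sketch below works out the power draw implied by one petaflop at 30 gigaflops per watt, and the nominal ratio to the TSUBAME 3.0 figure. It assumes the 30 gigaflops-per-watt claim refers to the one-petaflop FP32 peak, which AMD did not spell out, and it treats the 14.1 figure as gigaflops per watt, the metric the Green500 ranks by. The variable names are illustrative.

```python
# Figures quoted in the article
FP32_PEAK_GFLOPS = 1_000_000        # one petaflop, expressed in gigaflops
P47_GFLOPS_PER_WATT = 30.0          # AMD's FP32 efficiency claim
TSUBAME3_GFLOPS_PER_WATT = 14.1     # Green500 result, measured FP64 Linpack

# Assumption: the efficiency claim is peak FP32 divided by system power.
implied_power_kw = FP32_PEAK_GFLOPS / P47_GFLOPS_PER_WATT / 1000

# Nominal ratio only: FP32 peak and FP64 Linpack are not like-for-like.
nominal_ratio = P47_GFLOPS_PER_WATT / TSUBAME3_GFLOPS_PER_WATT

print(f"Implied rack power: {implied_power_kw:.1f} kW")
print(f"Nominal efficiency ratio vs TSUBAME 3.0: {nominal_ratio:.2f}x")
```

Under that assumption the rack would draw about 33 kW, and the nominal efficiency edge is a little over 2x, though the mismatch in precision and in peak versus measured performance means the ratio says little about how the two machines would compare on the same benchmark.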

Project 47 systems are expected to be available from Inventec and its principal distributor, AMAX, in Q4 of this year. Pricing was not announced.