Fujitsu Boosts RIKEN AI Supercomputer to 54 Petaflops

April 23, 2018

By: Michael Feldman

Fujitsu has performed a massive upgrade to RIKEN’s RAIDEN supercomputer using NVIDIA DGX-1 servers outfitted with the latest V100 Tesla GPUs.

RAIDEN, which stands for Riken AIp Deep learning ENvironment, is the flagship supercomputer for the RIKEN Center for Advanced Intelligence Project, the organization’s research arm that specializes in AI. According to the announcement, the upgrade was undertaken to meet the increased computational volume that researchers need to handle more complex modeling of neural networks and the increasing volumes of deep learning training data.

RAIDEN was originally deployed in 2017 using 24 of NVIDIA’s first-generation DGX-1 servers, each of which was powered by eight P100 GPUs. Together, that set of servers delivered four half-precision petaflops for deep learning applications. With the upgrade, those original servers were replaced with 54 DGX-1 boxes using the newest V100 GPUs. Since the V100 has special Tensor Core circuitry designed specifically for neural network processing (125 teraflops of mixed-precision floating point operations per device), the upgraded system will offer a whopping 54 petaflops of deep learning performance.
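For readers who want to check the math, the headline number follows directly from the server count and the per-GPU peak. The short Python sketch below is only a back-of-the-envelope calculation, not anything published by Fujitsu or RIKEN. It assumes the V100-based DGX-1 carries eight GPUs, as the first-generation box did, and the function names are made up for illustration. These are theoretical peak figures, not measured performance.

```python
# Back-of-the-envelope check of the peak figures quoted above.
# Assumption: each V100-based DGX-1 holds eight GPUs, like the P100 version.

TFLOPS_PER_PFLOPS = 1000


def aggregate_petaflops(servers: int, gpus_per_server: int, tflops_per_gpu: float) -> float:
    """Theoretical peak of a cluster, in petaflops."""
    return servers * gpus_per_server * tflops_per_gpu / TFLOPS_PER_PFLOPS


def implied_tflops_per_gpu(total_petaflops: float, servers: int, gpus_per_server: int) -> float:
    """Per-GPU peak implied by a quoted cluster total, in teraflops."""
    return total_petaflops * TFLOPS_PER_PFLOPS / (servers * gpus_per_server)


# Upgraded system: 54 servers, 8 V100s each, 125 mixed-precision teraflops per GPU.
upgraded = aggregate_petaflops(servers=54, gpus_per_server=8, tflops_per_gpu=125.0)

# Original system: 4 half-precision petaflops spread over 24 servers of 8 P100s.
original_per_gpu = implied_tflops_per_gpu(total_petaflops=4.0, servers=24, gpus_per_server=8)

print(f"Upgraded RAIDEN peak: {upgraded:.1f} petaflops")
print(f"Implied P100 peak in the 2017 system: {original_per_gpu:.1f} teraflops per GPU")
```

The first result reproduces the 54-petaflop figure. The second works backward from the rounded four-petaflop total, so it is only an approximation of the per-GPU half-precision rate in the 2017 system.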

Fujitsu also added 64 Fujitsu Server PRIMERGY CX2550 M4 servers plus one PRIMERGY RX4770 M4 unit to the original 32 PRIMERGY RX2530 servers in the 2017 machine. All are x86-only boxes and are there to perform general-purpose computing.

The enhanced RAIDEN machine represents one of the first major installations of the V100s – probably the largest such deployment to date for a supercomputer. That distinction will likely be short-lived. Japan’s AI Bridging Cloud Infrastructure (ABCI) supercomputer, which will also employ NVIDIA V100 GPUs, is expected to deliver 550 deep learning petaflops when it comes online later this year. The ABCI machine will be operated by Japan’s National Institute of Advanced Industrial Science and Technology (AIST).

However, even the ABCI system will be dwarfed by the deep learning capacity of Summit and Sierra, two Department of Energy (DOE) supercomputers scheduled to come online later this year. Summit is expected to deliver about three exaflops of deep learning performance, while Sierra will provide something on the order of 1.8 exaflops. Unlike the RAIDEN system, the V100 GPUs in the DOE machines are expected to accelerate both deep learning applications and more mainstream HPC workloads.

Image source: Fujitsu