Japanese Universities Order 25 Petaflop Supercomputer from Fujitsu

By: Michael Feldman, Managing Editor

Fujitsu is building what will be Japan’s most powerful supercomputer to date. On Tuesday, the company announced it had received an order from the University of Tokyo and the University of Tsukuba for a 25-petaflop system, which will be used for scientific research and engineering. The supercomputer will be housed at the Joint Center for Advanced High-Performance Computing (JCAHPC) and is scheduled to be operational in December 2016.

The new machine, known as Oakforest-PACS, will be made up of 8,208 PRIMERGY nodes equipped with Intel Xeon Phi processors (“Knights Landing”). Some simple arithmetic (25 petaflops spread across 8,208 nodes is about 3 teraflops per node) reveals that each node contains a single Phi chip with no accompanying Xeon CPU host. Unlike GPU accelerators, the new Knights Landing chips can operate as standalone processors, avoiding the complication of the host-coprocessor model. (In fact, the only Xeon CPUs in the system are in 51 additional PRIMERGY units, including 20 login nodes, devoted to non-computational functions.) Aggregate memory capacity is listed at 900 TB, which works out to roughly 110 GB per node. It’s not clear if that includes any “near” memory, the 3D-stacked high-bandwidth memory that is integrated in the Xeon Phi package.
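The per-node figures in the paragraph above follow from dividing the announced system totals by the node count. The sketch below makes that arithmetic explicit; it assumes decimal units (1 PF = 10^15 FLOPS, 1 TB = 10^12 bytes) and an even split of peak performance and memory across the 8,208 compute nodes, neither of which Fujitsu has spelled out.

```python
# Back-of-the-envelope per-node figures for Oakforest-PACS, derived only from
# the system totals quoted in the article.
# Assumptions: decimal units, and totals split evenly across compute nodes.

PEAK_SYSTEM_FLOPS = 25e15      # 25 petaflops (peak), as announced
COMPUTE_NODES = 8_208          # PRIMERGY compute nodes
TOTAL_MEMORY_BYTES = 900e12    # 900 TB aggregate memory


def per_node(total: float, nodes: int) -> float:
    """Divide a system-wide total evenly across the compute nodes."""
    return total / nodes


flops_per_node = per_node(PEAK_SYSTEM_FLOPS, COMPUTE_NODES)
memory_per_node = per_node(TOTAL_MEMORY_BYTES, COMPUTE_NODES)

print(f"Peak per node:   {flops_per_node / 1e12:.2f} TFLOPS")
print(f"Memory per node: {memory_per_node / 1e9:.0f} GB")
```

The roughly 3 teraflops per node is in the range of a single Knights Landing part's double-precision peak, which is what points to a one-chip-per-node design.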

The computational nodes are hooked together with the new 100 Gbps Omni-Path interconnect, Intel’s answer to EDR InfiniBand. Omni-Path, an architecture derived from QLogic’s TrueScale InfiniBand and Cray’s Aries interconnects, is the technology Intel intends to carry forward into exascale designs. It matches EDR InfiniBand speeds, bit for bit, and offers a number of advanced features that promise to improve application performance.

Oakforest-PACS will be built from 2U chassis building blocks, each containing eight nodes (so more than 24 double-precision teraflops per chassis). Given the density of the system, Fujitsu is using hot water cooling technology to remove the heat load. Such cooling technology is quickly becoming standard fare on supercomputers these days as the computational density of servers ratchets up.
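The density figure above can be reproduced from the node count and the eight-nodes-per-2U packaging. This is a rough sketch assuming that all 8,208 compute nodes use the same chassis and share the 25 PF peak evenly; the chassis count and per-rack-unit number are derived here, not published specifications.

```python
# Hypothetical density estimate for Oakforest-PACS compute chassis.
# Assumptions: every compute node sits in a 2U, eight-node chassis,
# and the 25 PF peak is shared evenly across nodes (decimal units).
import math

PEAK_SYSTEM_TFLOPS = 25_000.0  # 25 PF expressed in TFLOPS
COMPUTE_NODES = 8_208
NODES_PER_CHASSIS = 8
RACK_UNITS_PER_CHASSIS = 2

# Round up in case the node count did not divide evenly into chassis.
chassis = math.ceil(COMPUTE_NODES / NODES_PER_CHASSIS)
tflops_per_node = PEAK_SYSTEM_TFLOPS / COMPUTE_NODES
tflops_per_chassis = tflops_per_node * NODES_PER_CHASSIS
tflops_per_rack_unit = tflops_per_chassis / RACK_UNITS_PER_CHASSIS

print(f"Chassis needed:      {chassis}")
print(f"Peak per 2U chassis: {tflops_per_chassis:.1f} TFLOPS")
print(f"Peak per rack unit:  {tflops_per_rack_unit:.1f} TFLOPS")
```

Packing around 12 teraflops into each rack unit is the kind of heat density that pushes designs toward liquid cooling.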

The system will also include a 26 PB Lustre disk-based file system, accelerated with a 940 TB file cache employing SSDs. The storage will be supplied by DataDirect Networks (DDN).

When it goes online, Oakforest-PACS will certainly be a top 10 supercomputer based on the TOP500 Linpack rankings. If it were operational today, it would probably be the number 2 system, behind the 55-petaflop (peak) Tianhe-2 world champ, but by December there is likely to be some additional competition at the top of the list. Compared to Japan’s current leader, the K computer housed at RIKEN’s Advanced Institute for Computational Science, Oakforest-PACS will offer more than twice the peak FLOPS.
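The "more than twice" comparison is a statement about peak FLOPS. The sketch below computes the ratios. The Oakforest-PACS and Tianhe-2 values come from the paragraph above. The K computer's figure of about 11.28 PF peak is taken from its TOP500 listing and is not stated in the article.

```python
# Peak-FLOPS comparison (not Linpack). Oakforest-PACS and Tianhe-2 values are
# the article's figures; the K computer's ~11.28 PF peak is from its TOP500 entry.
PEAK_PF = {
    "Oakforest-PACS": 25.0,
    "Tianhe-2": 55.0,       # article's rounded peak figure
    "K computer": 11.28,
}


def peak_ratio(a: str, b: str) -> float:
    """Ratio of the peak FLOPS of system a to that of system b."""
    return PEAK_PF[a] / PEAK_PF[b]


print(f"Oakforest-PACS vs K computer: {peak_ratio('Oakforest-PACS', 'K computer'):.2f}x")
print(f"Oakforest-PACS vs Tianhe-2:   {peak_ratio('Oakforest-PACS', 'Tianhe-2'):.2f}x")
```

TOP500 positions are set by measured Linpack results, which typically fall below peak by different margins on different architectures. These peak ratios are therefore only a rough guide to where the system will actually land on the list.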

Oakforest-PACS has been on the drawing board since 2014, and was originally scheduled to be deployed in 2015. It is described as a “Post T2K System,” a supercomputer specification based on Intel’s Xeon Phi architecture. The original T2K spec was developed for pre-petascale machinery and was based on a four-socket quad-core AMD Opteron node. Since then, the standard has evolved toward a manycore design. Although T2K was intended to be an “open” specification, it was developed and adopted by just three institutions: the University of Tokyo, the University of Tsukuba, and Kyoto University.

As one might imagine from its academic buyers, Oakforest-PACS will be devoted to research work, including computational science and engineering, and will be available to researchers across Japan and their international collaborators. The system will also be used as a training and education resource for computer science students, especially those interested in high performance computing.

Although this particular system is all Intel, the universities have not put all their supercomputing eggs into the same architectural basket. The University of Tokyo, for example, operates three other supercomputers: a Power7-based Hitachi system (54.9 teraflops), and two SPARC64 IXfx-based Fujitsu PRIMEHPC FX10 supercomputers (1.1 petaflops and 136.2 teraflops). The University of Tsukuba operates three Cray systems, one with Xeon and Xeon Phi processors, and two others with Xeons and NVIDIA GPUs.

Oakforest-PACS is intended to operate for more than five years, until at least 2022, at which point it will be replaced by a 100-plus petaflop system. The University of Tokyo will also be deploying a number of its own supercomputers over the next decade. Starting in July 2016, the university will begin installing a 540-node, Xeon-based SGI cluster, 120 nodes of which will be equipped with NVIDIA’s latest P100 GPUs. That system is expected to deliver between 1.81 and 1.93 petaflops and will boot up in March 2017. In the middle of 2018, the university will deploy a 50-plus petaflop supercomputer, which in 2024 will be replaced by a machine that will top out at more than 200 petaflops. Of course, by then it will presumably be operating in the shadow of the first exascale machines running in Japan, China, Europe, and the US.