China is developing a new supercomputer designed to be “a prototype of an exascale computer.” Although the country is not expected to field its first exascale machine until 2020, the prototype is scheduled to boot up before the end of this year. Most likely, the system in question is the infamous Tianhe-2A supercomputer.
The disclosure of the prototype comes courtesy of a Xinhua news report, which quotes Zhang Ting, an application engineer at the National Supercomputer Center in Tianjin. The report doesn’t offer any details on the nature of the upcoming system, but considering that Zhang works for the same organization that hosts the Tianhe (Milky Way) supercomputers, it stands to reason he’s talking about Tianhe-2A, the long-overdue sequel to Tianhe-2.
Tianhe-2 is a 55-petaflop (peak) system that captured the number one spot on the TOP500 list in 2013 and currently resides at number two. It relies on Intel’s “Knights Corner” Xeon Phi coprocessors for the majority of those flops, and the original plan was to upgrade the system with the more powerful “Knights Landing” Xeon Phi chips once they became available. The goal was to roughly double the floppage and field the world’s first 100-petaflop-plus supercomputer in 2016.
In the interim though, the US government slapped an export embargo on these Xeon Phi chips for such purposes. The Tianhe line is developed by the National University of Defense Technology (NUDT) and the US suspects that its systems are being used to research and develop nuclear weaponry.
All of which led to the acceleration of China’s indigenous chip-making efforts. The most visible payoff of those efforts is TaihuLight, the current top-ranked system on the TOP500. TaihuLight is a 125-petaflop (peak) supercomputer that is powered exclusively by domestically produced ShenWei 26010 processors.
Tianhe-2A will more than likely not use ShenWei processors. A presentation at ISC High Performance 2015 from NUDT professor Yutong Lu revealed that the next Tianhe system will use general-purpose DSP coprocessors, developed in China, to provide much of its compute power. The coprocessor, known as the Matrix2000 GPDSP, is projected to deliver about 2.4 teraflops of double precision performance (4.8 teraflops, single precision) per chip. That would put it roughly in the same league, performance-wise, as an Intel Knights Landing processor. It will also include support for high bandwidth memory of some sort, although the particulars weren’t specified.
Image: Tianhe-2A status July 2015, Dr. Yutong Lu, National University of Defense Technology
However, unlike the Knights Landing device, which can act as a standalone processor, the Matrix2000 GPDSP is strictly a PCIe-based coprocessor, requiring a host CPU to drive it. According to Lu’s ISC 2015 presentation, the Tianhe-2A will retain the original Intel Xeon E5-2692 CPUs to act in this role.
Or maybe not. A July 2016 tweet by James Lin, Vice Director of the HPC Center at Shanghai Jiao Tong University, claimed that the NUDT pre-exascale system to be hosted at the National Supercomputing Center of Tianjin would employ an ARM processor. And there just so happens to be a 64-bit ARM implementation under development in China: the Phytium. The latest iteration of this design is the Phytium FT2000/64 processor, which we reported on back in August when it was announced at Hot Chips 2016.
In a nutshell, the FT2000/64 is a 64-core ARM CPU that is purported to deliver 512 gigaflops running full tilt. Conveniently, the chip provides a couple of PCIe x16 interfaces if one needed to, for example, hook up a high-performance accelerator or two. In fact, a dual-socket FT2000/64 server equipped with two Matrix2000 GPDSPs would provide roughly the 6 teraflops of performance specified for a Tianhe-2A node.
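The node arithmetic is easy to check with a minimal sketch. It uses only the per-chip figures quoted in this article, and it assumes that the 512-gigaflop FT2000/64 number is a double precision peak comparable to the Matrix2000's 2.4 teraflops, which the source material does not confirm. The function and constant names are purely illustrative.

```python
# Peak per-chip figures quoted in the article, in teraflops.
# Assumption: both are double precision peaks and simply add up.
FT2000_64_TFLOPS = 0.512   # 512 gigaflops per Phytium FT2000/64 CPU
MATRIX2000_TFLOPS = 2.4    # projected per Matrix2000 GPDSP


def node_peak_tflops(cpus: int = 2, accelerators: int = 2) -> float:
    """Sum the peak teraflops of a hypothetical node configuration."""
    return cpus * FT2000_64_TFLOPS + accelerators * MATRIX2000_TFLOPS


if __name__ == "__main__":
    # Dual-socket host with two coprocessors, as described above.
    print(f"{node_peak_tflops():.3f} teraflops per node")
```

Under those assumptions, the two CPUs contribute about 1 teraflop and the two coprocessors 4.8, which lands just under the 6-teraflop node figure.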
In the US, all the known pre-exascale systems, exemplified by the Department of Energy’s three CORAL supercomputers, will be powered by x86, Power, and GPU silicon. The Summit and Sierra systems will be equipped with IBM Power9 CPUs and NVIDIA Volta GPUs, while the Aurora system will be based on the next generation Intel Xeon Phi processors. Summit and Sierra are scheduled to be installed before the end of the year, while Aurora is slated for deployment in 2018. All are expected to top 100 petaflops. In fact, Summit could reach as high as 300 petaflops and Aurora could hit 450 petaflops.
That may motivate the Chinese to transform Tianhe-2A into a much larger machine than originally planned, especially given the government’s evident desire to retain its number one status on the TOP500 list. (The last time the top-ranked system was not a Chinese supercomputer was 2012.) If so, the next 12 months of top tier deployments should prove to be an interesting competition, not just to see who can eke out the most petaflops, but also to see which technologies are going to shape the next era of supercomputing.