The latest rankings of supercomputers based on the High Performance Conjugate Gradients (HPCG) benchmark were released last week at SC16. The K computer, installed at RIKEN in Japan, captured the number one spot on the HPCG rankings, with a mark of about 603 teraflops.
The number two HPCG system is Tianhe-2 (580 teraflops), followed by Oakforest-PACS (386 teraflops), TaihuLight (371 teraflops), and then Cori (356 teraflops). All of these, including the K computer, are top 10 systems on the latest TOP500 list, although their ordering here is rather different since HPCG stresses parts of the system that HPL does not.
The difference arises because Linpack (HPL) is a relatively simple measure of floating-point performance, based on solving a dense system of linear equations. HPCG, on the other hand, combines several computationally intensive operations. It incorporates sparse matrix-vector multiplication, global collectives, and vector updates, which are said to more closely represent the mix of operations in many supercomputing codes. Overall, it's a much tougher metric than HPL; no system reached a single petaflop running HPCG.
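To make that mix of operations concrete, here is a minimal sketch of a plain conjugate gradient solver in Python. It is not the HPCG reference code: it assumes a tiny one-dimensional Laplacian as the test matrix, applies no preconditioner, and runs on a single process. The helper names (`spmv`, `dot`, `axpy`, `laplacian_1d`) are illustrative only. Still, each iteration shows the three kinds of work mentioned above: a sparse matrix-vector product, dot products (which become global collectives on a distributed machine), and vector updates.

```python
import math


def spmv(rows, x):
    """Sparse matrix-vector product; rows[i] is a list of (column, value) pairs."""
    return [sum(v * x[j] for j, v in row) for row in rows]


def dot(a, b):
    """Dot product; on a distributed machine this requires a global reduction."""
    return sum(ai * bi for ai, bi in zip(a, b))


def axpy(alpha, x, y):
    """Vector update: returns alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def conjugate_gradient(rows, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for a symmetric positive definite sparse matrix."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the initial guess x = 0
    p = list(r)              # first search direction
    rr = dot(r, r)
    for iteration in range(max_iter):
        if math.sqrt(rr) < tol:
            return x, iteration
        Ap = spmv(rows, p)               # sparse matrix-vector multiplication
        alpha = rr / dot(p, Ap)          # dot product (global collective)
        x = axpy(alpha, p, x)            # vector update
        r = axpy(-alpha, Ap, r)          # vector update
        rr_new = dot(r, r)               # dot product (global collective)
        p = axpy(rr_new / rr, p, r)      # new search direction
        rr = rr_new
    return x, max_iter


def laplacian_1d(n):
    """Hypothetical test matrix: tridiagonal (-1, 2, -1), stored row by row."""
    rows = []
    for i in range(n):
        row = [(i, 2.0)]
        if i > 0:
            row.append((i - 1, -1.0))
        if i < n - 1:
            row.append((i + 1, -1.0))
        rows.append(row)
    return rows


A = laplacian_1d(50)
b = [1.0] * 50
x, iterations = conjugate_gradient(A, b)
# Largest entry of the true residual b - A x, as a sanity check on the solve.
residual = max(abs(bi - yi) for bi, yi in zip(b, spmv(A, x)))
```

Every one of these kernels does only a few floating-point operations per value loaded from memory, which is why a benchmark built around them leans so heavily on memory bandwidth rather than raw arithmetic throughput.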
The lower numbers stem mainly from the fact that HPCG exercises data movement to a much greater extent than HPL, and is therefore much more challenging to the memory subsystem. And given that memory is often the bottleneck on modern supercomputers, HPCG can be much more indicative of real application performance. If you glance through the HPCG rankings, the large difference between HPL and HPCG performance is immediately apparent.
Using peak performance as a base, most TOP500 systems demonstrate HPL efficiencies of between 50 and 90 percent. By contrast, most systems that ran HPCG yielded between 1 and 10 percent of peak. The K computer achieved 5.3 percent efficiency, but the top HPL system, TaihuLight, managed just 0.3 percent efficiency with HPCG.
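The efficiency figures here are simply achieved performance divided by theoretical peak. As a rough illustration, the short Python sketch below applies that arithmetic to the numbers quoted in this article. The function names are hypothetical, and because the inputs are rounded (about 603 teraflops and 5.3 percent for the K computer, 371 teraflops and 0.3 percent for TaihuLight), the implied peak values are only approximations, not official specifications.

```python
def efficiency_percent(achieved_tflops, peak_tflops):
    """Achieved performance as a percentage of theoretical peak."""
    return 100.0 * achieved_tflops / peak_tflops


def implied_peak_tflops(achieved_tflops, efficiency_pct):
    """Peak performance implied by an achieved figure and an efficiency."""
    return achieved_tflops / (efficiency_pct / 100.0)


# Figures as quoted in the article; both are rounded, so results are approximate.
k_peak = implied_peak_tflops(603.0, 5.3)
taihulight_peak = implied_peak_tflops(371.0, 0.3)

print(f"K computer implied peak: about {k_peak / 1000:.1f} petaflops")
print(f"TaihuLight implied peak: about {taihulight_peak / 1000:.0f} petaflops")
```

The contrast is the point: TaihuLight's implied peak is roughly ten times that of the K computer, yet it delivers fewer HPCG teraflops.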
The most efficient systems on the list were NEC vector supercomputers, specifically the NEC SX-ACE machines, which all delivered between 10 and 12 percent of peak performance using HPCG. The SX architecture is well known for its superior memory performance, so the better efficiencies on the NEC machines make sense. But even with this platform, a lot of peak flops are being left on the table.
HPCG seems to be gaining appeal among users as a metric for supercomputer performance. On the latest list, 101 submissions were recorded. In November 2015, there were just 64 submissions, while in 2014, the year HPCG was conceived, there were 25 submissions. According to its developers, its intent is not to replace the more popular HPL as a defining benchmark, but to provide a complementary metric. If it can build its base over time as HPL has done, it will provide a richer perspective on supercomputing capability and perhaps even encourage system designers to create more balanced machines.