TOP500 Expands Exaflops Capacity Amidst Low Turnover

FRANKFURT, Germany; BERKELEY, Calif.; and KNOXVILLE, Tenn.—The 56th edition of the TOP500 saw the Japanese Fugaku supercomputer solidify its number one status in a list that reflects a flattening performance growth curve. Although two new systems managed to make it into the top 10, the full list recorded the smallest number of new entries since the project began in 1993.

The entry level to the list moved up to 1.32 petaflops on the High Performance Linpack (HPL) benchmark, a small increase from 1.23 petaflops recorded in the June 2020 rankings. In a similar vein, the aggregate performance of all 500 systems grew from 2.22 exaflops in June to just 2.43 exaflops on the latest list. Likewise, average concurrency per system barely increased at all, growing from 145,363 cores six months ago to 145,465 cores in the current list.
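
To put the flattening in relative terms, the three June-to-November changes quoted above can be converted to percentage growth. The following is a minimal sketch that uses only the figures in the preceding paragraph; the helper name `pct_growth` and the dictionary labels are illustrative and are not part of any TOP500 tooling.

```python
def pct_growth(old: float, new: float) -> float:
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100.0


# (June 2020 value, November 2020 value), as quoted in the text above.
changes = {
    "entry level (petaflops)": (1.23, 1.32),
    "aggregate performance (exaflops)": (2.22, 2.43),
    "average cores per system": (145_363, 145_465),
}

for name, (old, new) in changes.items():
    print(f"{name}: {pct_growth(old, new):+.2f}%")
```

By this measure, the entry level rose by roughly 7 percent and aggregate performance by roughly 9 percent, while average concurrency grew by less than 0.1 percent.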

There were, however, a few notable developments in the top 10, including two new systems, as well as a new high-water mark set by the top-ranked Fugaku supercomputer. Thanks to additional hardware, Fugaku grew its HPL performance to 442 petaflops, a modest increase from the 416 petaflops the system achieved when it debuted in June 2020. More significantly, Fugaku increased its performance on the new mixed-precision HPL-AI benchmark to 2.0 exaflops, besting its 1.4 exaflops mark recorded six months ago. These represent the first benchmark measurements above one exaflop for any precision on any type of hardware.

Here is a brief rundown of current top 10 systems:

Fugaku remains at the top spot, growing its Arm A64FX capacity from 7,299,072 cores to 7,630,848 cores. The additional hardware enabled its new world record 442 petaflops result on HPL. This puts it three times ahead of the number two system in the list. Fugaku was constructed by Fujitsu and is installed at the RIKEN Center for Computational Science (R-CCS) in Kobe, Japan.
Summit, an IBM-built system at the Oak Ridge National Laboratory (ORNL) in Tennessee, remains the fastest system in the US with a performance of 148.8 petaflops. Summit has 4,356 nodes, each one housing two 22-core Power9 CPUs and six NVIDIA Tesla V100 GPUs.
Sierra, a system at the Lawrence Livermore National Laboratory in California, is ranked third with an HPL mark of 94.6 petaflops. Its architecture is very similar to that of Summit, with each of its 4,320 nodes equipped with two Power9 CPUs and four NVIDIA Tesla V100 GPUs.
Sunway TaihuLight, a system developed by China’s National Research Center of Parallel Computer Engineering & Technology (NRCPC) and installed at the National Supercomputing Center in Wuxi, is listed at number four. It is powered exclusively by Sunway SW26010 processors and achieves 93 petaflops on HPL.
At number five is Selene, an NVIDIA DGX A100 SuperPOD installed in-house at NVIDIA Corp. It was listed as number seven in June but has doubled in size, allowing it to move up the list by two positions. The system is based on AMD EPYC processors with NVIDIA’s new A100 GPUs for acceleration. Selene achieved 63.4 petaflops on HPL as a result of the upgrade.
Tianhe-2A (Milky Way-2A), a system developed by China’s National University of Defense Technology (NUDT) and deployed at the National Supercomputer Center in Guangzhou, is ranked 6th. It is powered by Intel Xeon CPUs and NUDT’s Matrix-2000 DSP accelerators and achieves 61.4 petaflops on HPL.
A new supercomputer, known as the JUWELS Booster Module, debuts at number seven on the list. The Atos-built BullSequana machine was recently installed at the Forschungszentrum Jülich (FZJ) in Germany. It is part of a modular system architecture, and a second, Xeon-based JUWELS Module is listed separately on the TOP500 at position 44. These modules are integrated using the ParTec Modulo Cluster Software Suite. The Booster Module uses AMD EPYC processors with NVIDIA A100 GPUs for acceleration, similar to the number five Selene system. Running by itself, the JUWELS Booster Module was able to achieve 44.1 petaflops on HPL, which makes it the most powerful system in Europe.
HPC5, a Dell PowerEdge system installed by the Italian company Eni S.p.A., is ranked 8th. It achieves a performance of 35.5 petaflops using Intel Xeon Gold CPUs and NVIDIA Tesla V100 GPUs. It is the most powerful system in the list used for commercial purposes at a customer site.
Frontera, a Dell C6420 system that was installed at the Texas Advanced Computing Center of the University of Texas last year, is now listed at number nine. It achieves 23.5 petaflops using 448,448 of its Intel Platinum Xeon cores.
The second new system at the top of the list is Dammam-7, which is ranked 10th. It is installed at Saudi Aramco in Saudi Arabia and is the second commercial supercomputer in the current top 10. The HPE Cray CS-Storm system uses Intel Gold Xeon CPUs and NVIDIA Tesla V100 GPUs. It reached 22.4 petaflops on the HPL benchmark.

Other TOP500 highlights

A total of 149 systems on the list are using accelerator/co-processor technology, up from 146 six months ago. Of these, 140 use NVIDIA chips.

Intel continues to dominate in TOP500 processor share with over 90 percent of systems equipped with Xeon or Xeon Phi chips. Despite the recent rise of alternative processor architectures in high performance computing, AMD processors (including the Hygon chip) represent only 21 systems on the current list, along with ten Power-based systems and just five Arm-based systems. However, the number of systems with AMD-based processors doubled from what it was six months ago.

The breakdown in system interconnects is largely unchanged from recent lists, with Ethernet used in about half the systems (254), InfiniBand in about a third of systems (182), OmniPath in about one-tenth of systems (47), and Myrinet in one system; the remainder use custom interconnects (38) and proprietary networks (6). InfiniBand-connected systems continue to dominate in aggregate capacity with more than an exaflop of performance. Since Fugaku uses the proprietary Tofu D interconnect, the aggregate performance of the six proprietary-network systems (472.9 petaflops) is nearly equal to that of the 254 Ethernet-based systems (477.7 petaflops).

China continues to lead in system share with 212 machines on the list, handily beating out the US with 113 systems and Japan with 34. However, despite the smaller number of systems, the US continues to lead the list in aggregate performance with 668.7 petaflops to China’s 564.0 petaflops. Thanks mainly to the number one Fugaku system, Japan’s aggregate performance of 593.7 petaflops edges out that of China.

Green500 results

The most energy-efficient system on the Green500 is the new NVIDIA DGX SuperPOD in the US. It achieved a power efficiency of 26.2 gigaflops/watt during its 2.4 petaflops HPL performance run and is listed at position 170 in the TOP500.

Next on the list is the previous Green500 champ, MN-3. Although it improved its score from 21.1 to 26.0 gigaflops/watt, it slips into the number two position. The system uses the MN-Core chip, an accelerator optimized for matrix arithmetic. It is ranked number 330 in the TOP500.

In the number three position on the Green500 is the Atos-built JUWELS Booster Module installed at Forschungszentrum Jülich (FZJ) in Germany. It achieves 25.0 gigaflops/watt and is ranked seventh in the TOP500.

In fourth position is Spartan-2, another Atos-built machine. It achieves 24.3 gigaflops/watt on HPL and is ranked at position 148 on the TOP500 list.

The fifth-ranked system on the Green500 is Selene, with an efficiency of 24.0 gigaflops/watt. It also occupies the number five spot on the TOP500.

With the exception of MN-3, all of the top five Green500 systems use the new NVIDIA A100 GPU as an accelerator. All four of these systems use AMD EPYC as their main CPU.

Of the top 40 systems on the Green500, 37 leverage accelerators, two use A64FX vector processors, and one (TaihuLight) uses a Sunway many-core processor.

Extrapolating the power efficiency value of 26.2 gigaflops/watt of the NVIDIA DGX SuperPOD out linearly to an exaflop would result in a power consumption of 38 MW (ignoring additional hardware needed for scaling).
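
The arithmetic behind that estimate is a single division: the target rate in flops divided by the efficiency in flops per watt. The snippet below is a minimal sketch of the calculation; the function name `power_for_target_mw` is illustrative, and the model assumes perfectly linear scaling, as the text does.

```python
GIGA = 1e9   # flops per gigaflop
EXA = 1e18   # flops per exaflop
MEGA = 1e6   # watts per megawatt


def power_for_target_mw(efficiency_gflops_per_watt: float,
                        target_flops: float = EXA) -> float:
    """Power in megawatts needed to sustain target_flops at a fixed
    efficiency, assuming linear scaling (no interconnect, storage,
    or cooling overhead)."""
    watts = target_flops / (efficiency_gflops_per_watt * GIGA)
    return watts / MEGA


# Green500 leader's efficiency, as quoted above.
superpod_mw = power_for_target_mw(26.2)
print(f"{superpod_mw:.1f} MW")  # about 38 MW
```

Because the model is linear, any other efficiency figure from the Green500 list can be substituted directly to obtain the corresponding one-exaflop power estimate.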

HPCG Results

The TOP500 list has incorporated the High-Performance Conjugate Gradient (HPCG) benchmark results, which provide an alternative metric for assessing supercomputer performance and are meant to complement the HPL measurement.

The list-leading Fugaku expanded its HPCG result with a record 16.0 HPCG-petaflops. The two US Department of Energy systems, Summit at ORNL and Sierra at LLNL, are second and third, respectively, on the HPCG benchmark. Summit achieved 2.93 HPCG-petaflops and Sierra 1.80 HPCG-petaflops. The only other systems to break the petaflops barrier on HPCG are the upgraded Selene system at 1.62 petaflops and the new JUWELS Booster Module at 1.28 petaflops.

HPL-AI Results

The HPL-AI benchmark seeks to highlight the convergence of HPC and artificial intelligence (AI) workloads based on machine learning and deep learning by solving a system of linear equations using novel, mixed-precision algorithms that exploit modern hardware.

The top-ranked system for this benchmark is RIKEN’s Fugaku system, which achieved 2.0 exaflops of mixed-precision computation. At number two is ORNL’s Summit supercomputer, which achieved 0.55 exaflops, followed by NVIDIA’s Selene, which turned in an HPL-AI result of 0.25 exaflops.

About the TOP500 List

The first version of what became today’s TOP500 list started as an exercise for a small conference in Germany in June 1993. Out of curiosity, the authors decided to revisit the list in November 1993 to see how things had changed. About that time, they realized they might be onto something and decided to continue compiling the list, which is now a much-anticipated, much-watched and much-debated twice-yearly event.

The TOP500 list is compiled by Erich Strohmaier and Horst Simon of Lawrence Berkeley National Laboratory; Jack Dongarra of the University of Tennessee, Knoxville; and Martin Meuer of ISC Group, Germany.
