Highlights - June 2021

This is the 57th edition of the TOP500.

The only new entry in the Top10 is the Perlmutter system at NERSC at the DOE Lawrence Berkeley National Laboratory. It is based on the HPE Cray “Shasta” platform and is a heterogeneous system with both GPU-accelerated and CPU-only nodes. Perlmutter achieved 64.6 Pflop/s, which put it at No. 5 in the new list.

Supercomputer Fugaku, a system based on Fujitsu’s custom ARM A64FX processor, remains No. 1. It is installed at the RIKEN Center for Computational Science (R-CCS) in Kobe, Japan, the location of the former K computer. It was co-developed in close partnership by RIKEN and Fujitsu and uses Fujitsu’s Tofu D interconnect to transfer data between nodes. Its HPL benchmark score of 442 Pflop/s exceeds that of the No. 2 system, Summit, by about 3x. In single or further reduced precision, which is often used in machine learning and AI applications, its peak performance is actually above 1,000 Pflop/s (= 1 Exaflop/s), and because of this it is often introduced as the first ‘Exascale’ supercomputer. Fugaku has already demonstrated this new level of performance on the new HPL-AI benchmark with 2 Exaflops! https://www.r-ccs.riken.jp/en/
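As a rough sanity check on these figures, the sketch below recomputes Fugaku's double-precision peak (Rpeak) as cores × clock × flops per cycle, doubles it for single precision, and takes the ratio of the two HPL scores. The figure of 32 double-precision flops per core per cycle (two 512-bit SVE FMA pipelines) and the 2x factor for FP32 are assumptions about the A64FX that this text does not state; the core count, clock and Rmax values come from the Top10 table.

```python
# Sketch: reconstruct Fugaku's theoretical peak from its core count and clock.
# ASSUMPTION (not stated in the text): each A64FX core retires 32 double-precision
# flops per cycle = 2 SVE FMA pipelines x 8 doubles per 512-bit vector x 2 flops per FMA.

CORES = 7_630_848                 # from the TOP500 table
CLOCK_GHZ = 2.2                   # "A64FX 48C 2.2GHz" in the table
FP64_FLOPS_PER_CYCLE = 2 * 8 * 2  # assumed, see above


def peak_pflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in Pflop/s: cores x clock (GHz) x flops per cycle."""
    gflops = cores * clock_ghz * flops_per_cycle  # result in Gflop/s
    return gflops / 1e6                           # 1 Pflop/s = 1e6 Gflop/s


rpeak_fp64 = peak_pflops(CORES, CLOCK_GHZ, FP64_FLOPS_PER_CYCLE)
# ASSUMPTION: halving the operand width doubles the values per SIMD vector.
rpeak_fp32 = 2 * rpeak_fp64

rmax_fugaku, rmax_summit = 442.01, 148.60  # HPL Rmax in Pflop/s, from the table

print(f"FP64 Rpeak: {rpeak_fp64:.2f} Pflop/s")  # FP64 Rpeak: 537.21 Pflop/s
print(f"FP32 peak:  {rpeak_fp32:.0f} Pflop/s")  # FP32 peak:  1074 Pflop/s
print(f"Fugaku/Summit HPL ratio: {rmax_fugaku / rmax_summit:.2f}x")  # 2.97x
```

Under these assumptions the computed FP64 peak matches the table's Rpeak of 537.21 Pflop/s, and doubling it lands just above 1,000 Pflop/s, consistent with the single-precision claim above.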

Here is a brief summary of the systems in the Top10:

  • Fugaku remains the No. 1 system. It has 7,630,848 cores, which allowed it to attain an HPL benchmark score of 442 Pflop/s. This puts it 3x ahead of the No. 2 system in the list.

  • Summit, an IBM-built system at the Oak Ridge National Laboratory (ORNL) in Tennessee, USA, remains the fastest system in the U.S. at the No. 2 spot worldwide with a performance of 148.6 Pflop/s on the HPL benchmark, which is used to rank the TOP500 list. Summit has 4,608 nodes, each housing two Power9 CPUs with 22 cores each and six NVIDIA Tesla V100 GPUs, each with 80 streaming multiprocessors (SMs). The nodes are linked together with a Mellanox dual-rail EDR InfiniBand network.

  • Sierra, a system at the Lawrence Livermore National Laboratory, CA, USA, is at No. 3. Its architecture is very similar to that of the No. 2 system, Summit. It is built with 4,320 nodes, each with two Power9 CPUs and four NVIDIA Tesla V100 GPUs. Sierra achieved 94.6 Pflop/s.

  • Sunway TaihuLight, a system developed by China’s National Research Center of Parallel Computer Engineering & Technology (NRCPC) and installed at the National Supercomputing Center in Wuxi, in China's Jiangsu province, is listed at the No. 4 position with 93 Pflop/s.

  • Perlmutter at No. 5 is new in the TOP10. It is based on the HPE Cray “Shasta” platform and is a heterogeneous system with AMD EPYC-based nodes and 1,536 NVIDIA A100-accelerated nodes. Perlmutter achieved 64.6 Pflop/s.

  • Selene, now at No. 6, is an NVIDIA DGX A100 SuperPOD installed in-house at NVIDIA in the USA. The system is based on AMD EPYC processors with NVIDIA A100 GPUs for acceleration and Mellanox HDR InfiniBand as the network, and it achieved 63.5 Pflop/s.

  • Tianhe-2A (Milky Way-2A), a system developed by China’s National University of Defense Technology (NUDT) and deployed at the National Supercomputer Center in Guangzhou, China, is now listed as the No. 7 system with 61.4 Pflop/s.

  • A system called “JUWELS Booster Module” is No. 8. The BullSequana system, built by Atos, is installed at the Forschungszentrum Juelich (FZJ) in Germany. Like Selene, it uses AMD EPYC processors with NVIDIA A100 GPUs for acceleration and Mellanox HDR InfiniBand as the network. With 44.1 Pflop/s, it is the most powerful system in Europe.

  • HPC5 at No. 9 is a PowerEdge system built by Dell and installed by the Italian company Eni S.p.A. It achieves 35.5 Pflop/s using NVIDIA Tesla V100 accelerators and Mellanox HDR InfiniBand as the network.

  • Frontera, a Dell C6420 system, is installed at the Texas Advanced Computing Center of the University of Texas and is now listed at No. 10. It achieved 23.5 Pflop/s using 448,448 of its Intel Xeon cores.

| Rank | Site, Country | System | Cores | Rmax (PFlop/s) | Rpeak (PFlop/s) | Power (kW) |
|---|---|---|---|---|---|---|
| 1 | RIKEN Center for Computational Science, Japan | Supercomputer Fugaku - Supercomputer Fugaku, A64FX 48C 2.2GHz, Tofu interconnect D | 7,630,848 | 442.01 | 537.21 | 29,899 |
| 2 | DOE/SC/Oak Ridge National Laboratory, United States | Summit - IBM Power System AC922, IBM POWER9 22C 3.07GHz, NVIDIA Volta GV100, Dual-rail Mellanox EDR Infiniband | 2,414,592 | 148.60 | 200.79 | 10,096 |
| 3 | DOE/NNSA/LLNL, United States | Sierra - IBM Power System AC922, IBM POWER9 22C 3.1GHz, NVIDIA Volta GV100, Dual-rail Mellanox EDR Infiniband (IBM / NVIDIA / Mellanox) | 1,572,480 | 94.64 | 125.71 | 7,438 |
| 4 | National Supercomputing Center in Wuxi, China | Sunway TaihuLight - Sunway MPP, Sunway SW26010 260C 1.45GHz, Sunway | 10,649,600 | 93.01 | 125.44 | 15,371 |
| 5 | DOE/SC/LBNL/NERSC, United States | Perlmutter - HPE Cray EX235n, AMD EPYC 7763 64C 2.45GHz, NVIDIA A100 SXM4 40 GB, Slingshot-10 | 706,304 | 64.59 | 89.79 | 2,528 |
| 6 | NVIDIA Corporation, United States | Selene - NVIDIA DGX A100, AMD EPYC 7742 64C 2.25GHz, NVIDIA A100, Mellanox HDR Infiniband | 555,520 | 63.46 | 79.22 | 2,646 |
| 7 | National Super Computer Center in Guangzhou, China | Tianhe-2A - TH-IVB-FEP Cluster, Intel Xeon E5-2692v2 12C 2.2GHz, TH Express-2, Matrix-2000 | 4,981,760 | 61.44 | 100.68 | 18,482 |
| 8 | Forschungszentrum Juelich (FZJ), Germany | JUWELS Booster Module - Bull Sequana XH2000, AMD EPYC 7402 24C 2.8GHz, NVIDIA A100, Mellanox HDR InfiniBand/ParTec ParaStation ClusterSuite | 449,280 | 44.12 | 70.98 | 1,764 |
| 9 | Eni S.p.A., Italy | HPC5 - PowerEdge C4140, Xeon Gold 6252 24C 2.1GHz, NVIDIA Tesla V100, Mellanox HDR Infiniband | 669,760 | 35.45 | 51.72 | 2,252 |
| 10 | Texas Advanced Computing Center/Univ. of Texas, United States | Frontera - Dell C6420, Xeon Platinum 8280 28C 2.7GHz, Mellanox InfiniBand HDR | 448,448 | 23.52 | 38.75 | n/a |
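The Top10 table supports two derived metrics that are often quoted alongside Rmax: HPL efficiency (Rmax divided by Rpeak) and energy efficiency in Gflop/s per watt. The sketch below computes both from the table values. It also checks Sierra's core count under an assumption this text does not state, namely that each GPU streaming multiprocessor is counted as one core; under that assumption, 4,320 nodes with two 22-core POWER9 CPUs and four 80-SM V100 GPUs each reproduce the table's 1,572,480 cores. The helper names are hypothetical.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    name: str
    rmax: float                # HPL Rmax, Pflop/s
    rpeak: float               # theoretical peak Rpeak, Pflop/s
    power_kw: Optional[float]  # None where the table lists no power figure


# Rmax, Rpeak and power copied from the June 2021 Top10 table.
TOP10 = [
    Entry("Fugaku", 442.01, 537.21, 29_899),
    Entry("Summit", 148.60, 200.79, 10_096),
    Entry("Sierra", 94.64, 125.71, 7_438),
    Entry("Sunway TaihuLight", 93.01, 125.44, 15_371),
    Entry("Perlmutter", 64.59, 89.79, 2_528),
    Entry("Selene", 63.46, 79.22, 2_646),
    Entry("Tianhe-2A", 61.44, 100.68, 18_482),
    Entry("JUWELS Booster", 44.12, 70.98, 1_764),
    Entry("HPC5", 35.45, 51.72, 2_252),
    Entry("Frontera", 23.52, 38.75, None),
]


def hpl_efficiency(e: Entry) -> float:
    """Fraction of theoretical peak achieved on HPL (Rmax / Rpeak)."""
    return e.rmax / e.rpeak


def gflops_per_watt(e: Entry) -> Optional[float]:
    """Energy efficiency: Rmax in Gflop/s divided by power in W."""
    if e.power_kw is None:
        return None
    return (e.rmax * 1e6) / (e.power_kw * 1e3)  # Pflop/s -> Gflop/s, kW -> W


def hybrid_cores(nodes: int, cpus: int, cores_per_cpu: int,
                 gpus: int, sms_per_gpu: int) -> int:
    """ASSUMPTION: each GPU streaming multiprocessor (SM) counts as one core."""
    return nodes * (cpus * cores_per_cpu + gpus * sms_per_gpu)


# Sierra: 4,320 nodes x (2 POWER9 x 22 cores + 4 V100 x 80 SMs)
sierra_cores = hybrid_cores(4_320, 2, 22, 4, 80)  # 1,572,480, the table's core count

for e in TOP10:
    gfw = gflops_per_watt(e)
    gfw_text = f"{gfw:6.2f} Gflop/s/W" if gfw is not None else "   n/a"
    print(f"{e.name:<18} HPL efficiency {hpl_efficiency(e):6.1%}   {gfw_text}")
```

With these figures, Fugaku has the highest HPL efficiency (about 82%), Frontera the lowest, and Perlmutter the best Gflop/s per watt among the nine systems that report power.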

Highlights from the List

  • A total of 147 systems on the list are using accelerator/co-processor technology, the same number as six months ago. 26 of these use NVIDIA Ampere chips, 97 use NVIDIA Volta, and 0 systems with 18.

  • Intel continues to provide the processors for the largest share (86.40%) of TOP500 systems, down from 91.80% six months ago. 48 (9.60%) of the systems in the current list use AMD processors, up from 4.20% six months ago.

  • Supercomputer Fugaku maintains the leadership in HPCG performance, followed by the two top DOE systems, Summit and Sierra, in the #2 and #3 spots.

  • The entry level to the list moved up to the 1.51 Pflop/s mark on the Linpack benchmark.

  • The last system on the newest list was listed at position 452 in the previous TOP500.

  • The total combined performance of all 500 systems exceeds the exaflop barrier, now at 2.79 exaflop/s (Eflop/s), up from 2.43 Eflop/s six months ago.

  • The entry point for the TOP100 increased to 4.12 Pflop/s.

  • The average concurrency level in the TOP500 is 153,852 cores per system, up from 144,932 six months ago.
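The list-wide figures above mix percentage shares, system counts and totals. The sketch below puts them on a common footing: it converts vendor shares into system counts (every edition ranks exactly 500 systems) and turns the six-month comparisons into relative growth rates. Rounding shares to whole systems is a choice made for this illustration, and the helper names are hypothetical.

```python
LIST_SIZE = 500  # every TOP500 edition ranks exactly 500 systems


def share_to_count(percent: float, total: int = LIST_SIZE) -> int:
    """Number of systems corresponding to a percentage share of the list."""
    return round(percent / 100 * total)


def growth_pct(new: float, old: float) -> float:
    """Relative change from old to new, in percent."""
    return (new / old - 1) * 100


intel_now = share_to_count(86.40)   # Intel share on this list
amd_now = share_to_count(9.60)      # AMD share on this list (the text states 48 systems)
amd_before = share_to_count(4.20)   # AMD share six months ago

perf_growth = growth_pct(2.79, 2.43)        # combined Rmax of all 500 systems, Eflop/s
core_growth = growth_pct(153_852, 144_932)  # average cores per system

print(f"Intel: {intel_now} systems; AMD: {amd_before} -> {amd_now} systems")
print(f"Combined performance growth: {perf_growth:.1f}%")
print(f"Average concurrency growth: {core_growth:.1f}%")
```

With these inputs, Intel's share corresponds to 432 systems, AMD grew from 21 to 48 systems, combined performance rose by about 14.8%, and average concurrency by about 6.2%.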

General Trends

Installations by countries/regions:

HPC manufacturer:

Interconnect Technologies:

Processor Technologies:


HPCG Results

About the TOP500 List

The first version of what became today’s TOP500 list started as an exercise for a small conference in Germany in June 1993. Out of curiosity, the authors decided to revisit the list in November 1993 to see how things had changed. About that time they realized they might be onto something and decided to continue compiling the list, which is now a much-anticipated, much-watched and much-debated twice-yearly event.