June 2021

The 57th edition of the TOP500 saw little change in the Top10. The only new entry in the Top10 is the Perlmutter system at NERSC at the DOE Lawrence Berkeley National Laboratory. The machine is based on the HPE Cray "Shasta" platform and is a heterogeneous system with both GPU-accelerated and CPU-only nodes. Perlmutter achieved 64.6 Pflop/s, putting the supercomputer at No. 5 in the new list.

The Japanese supercomputer Fugaku held onto the top spot on the list. A system co-developed by RIKEN and Fujitsu, Fugaku has an HPL benchmark score of 442 Pflop/s, roughly 3x the performance of the No. 2 system, Summit. The machine is based on Fujitsu's custom ARM A64FX processor. What's more, in single or further reduced precision, which is often used in machine learning and AI, Fugaku's peak performance is above an exaflop. This has led some to describe the machine as the first "Exascale" supercomputer. Fugaku has already demonstrated this new level of performance on the new HPL-AI benchmark with 2 Eflop/s.

Beyond this, several Microsoft Azure and Amazon EC2 cloud systems placed fairly high on the list. Pioneer-EUS at No. 24 and Pioneer-WUS2 at No. 27 both rely on Azure. The Amazon EC2 Instance Cluster at No. 41 utilizes Amazon EC2.

Here is a summary of the systems in the Top10:

  • Fugaku remains the No. 1 system. It has 7,630,848 cores, which allowed it to achieve an HPL benchmark score of 442 Pflop/s. This is roughly 3x the score of the No. 2 system in the list.
  • Summit, an IBM-built system at the Oak Ridge National Laboratory (ORNL) in Tennessee, USA, remains the fastest system in the U.S. and holds the No. 2 spot worldwide with a performance of 148.6 Pflop/s on the HPL benchmark, which is used to rank the TOP500 list. Summit has 4,356 nodes, each housing two Power9 CPUs with 22 cores each and six NVIDIA Tesla V100 GPUs, each with 80 streaming multiprocessors (SM). The nodes are linked together with a Mellanox dual-rail EDR InfiniBand network.
  • Sierra, a system at the Lawrence Livermore National Laboratory, CA, USA, is at No. 3. Its architecture is very similar to that of the No. 2 system, Summit. It is built with 4,320 nodes, each with two Power9 CPUs and four NVIDIA Tesla V100 GPUs. Sierra achieved 94.6 Pflop/s.
  • Sunway TaihuLight, a system developed by China's National Research Center of Parallel Computer Engineering & Technology (NRCPC) and installed at the National Supercomputing Center in Wuxi, which is in China's Jiangsu province, is listed at the No. 4 position with 93 Pflop/s.
  • Perlmutter at No. 5 is new in the Top10. It is based on the HPE Cray "Shasta" platform and is a heterogeneous system with AMD EPYC-based nodes and 1,536 NVIDIA A100-accelerated nodes. Perlmutter achieved 64.6 Pflop/s.
  • Selene, now at No. 6, is an NVIDIA DGX A100 SuperPOD installed in-house at NVIDIA in the USA. The system is based on AMD EPYC processors with NVIDIA A100 GPUs for acceleration and a Mellanox HDR InfiniBand network, and achieved 63.4 Pflop/s.
  • Tianhe-2A (Milky Way-2A), a system developed by China's National University of Defense Technology (NUDT) and deployed at the National Supercomputer Center in Guangzhou, China, is now listed as the No. 7 system with 61.4 Pflop/s.
  • A system called "JUWELS Booster Module" is No. 8. The BullSequana system built by Atos is installed at the Forschungszentrum Juelich (FZJ) in Germany. Like Selene, it uses AMD EPYC processors with NVIDIA A100 GPUs for acceleration and a Mellanox HDR InfiniBand network. It is the most powerful system in Europe, with 44.1 Pflop/s.
  • HPC5 at No. 9 is a PowerEdge system built by Dell and installed by the Italian company Eni S.p.A. It achieves a performance of 35.5 Pflop/s using NVIDIA Tesla V100 GPUs as accelerators and a Mellanox HDR InfiniBand network.
  • Frontera, a Dell C6420 system, is installed at the Texas Advanced Computing Center of the University of Texas and is now listed at No. 10. It achieved 23.5 Pflop/s using 448,448 of its Intel Xeon cores.

| Rank | System | Site | Country | Cores | Rmax (TFlop/s) | Rpeak (TFlop/s) | Power (kW) |
|---|---|---|---|---|---|---|---|
| 1 | Supercomputer Fugaku - Supercomputer Fugaku, A64FX 48C 2.2GHz, Tofu interconnect D, Fujitsu | RIKEN Center for Computational Science | Japan | 7,630,848 | 442,010.0 | 537,212.0 | 29,899 |
| 2 | Summit - IBM Power System AC922, IBM POWER9 22C 3.07GHz, NVIDIA Volta GV100, Dual-rail Mellanox EDR Infiniband, IBM | DOE/SC/Oak Ridge National Laboratory | United States | 2,414,592 | 148,600.0 | 200,794.9 | 10,096 |
| 3 | Sierra - IBM Power System AC922, IBM POWER9 22C 3.1GHz, NVIDIA Volta GV100, Dual-rail Mellanox EDR Infiniband, IBM / NVIDIA / Mellanox | DOE/NNSA/LLNL | United States | 1,572,480 | 94,640.0 | 125,712.0 | 7,438 |
| 4 | Sunway TaihuLight - Sunway MPP, Sunway SW26010 260C 1.45GHz, Sunway, NRCPC | National Supercomputing Center in Wuxi | China | 10,649,600 | 93,014.6 | 125,435.9 | 15,371 |
| 5 | Perlmutter - HPE Cray EX235n, AMD EPYC 7763 64C 2.45GHz, NVIDIA A100 SXM4 40 GB, Slingshot-10, HPE | DOE/SC/LBNL/NERSC | United States | 706,304 | 64,590.0 | 89,794.5 | 2,528 |
| 6 | Selene - NVIDIA DGX A100, AMD EPYC 7742 64C 2.25GHz, NVIDIA A100, Mellanox HDR Infiniband, Nvidia | NVIDIA Corporation | United States | 555,520 | 63,460.0 | 79,215.0 | 2,646 |
| 7 | Tianhe-2A - TH-IVB-FEP Cluster, Intel Xeon E5-2692v2 12C 2.2GHz, TH Express-2, Matrix-2000, NUDT | National Super Computer Center in Guangzhou | China | 4,981,760 | 61,444.5 | 100,678.7 | 18,482 |
| 8 | JUWELS Booster Module - Bull Sequana XH2000, AMD EPYC 7402 24C 2.8GHz, NVIDIA A100, Mellanox HDR InfiniBand/ParTec ParaStation ClusterSuite, Atos | Forschungszentrum Juelich (FZJ) | Germany | 449,280 | 44,120.0 | 70,980.0 | 1,764 |
| 9 | HPC5 - PowerEdge C4140, Xeon Gold 6252 24C 2.1GHz, NVIDIA Tesla V100, Mellanox HDR Infiniband, Dell EMC | Eni S.p.A. | Italy | 669,760 | 35,450.0 | 51,720.8 | 2,252 |
| 10 | Frontera - Dell C6420, Xeon Platinum 8280 28C 2.7GHz, Mellanox InfiniBand HDR, Dell EMC | Texas Advanced Computing Center/Univ. of Texas | United States | 448,448 | 23,516.4 | 38,745.9 | |
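
The figures in the table lend themselves to a few derived comparisons, such as the "3x" gap between No. 1 and No. 2. The following is a minimal Python sketch, not part of the TOP500 methodology as published here. The helper names (`hpl_efficiency`, `gflops_per_watt`, `speedup`) are hypothetical and chosen for illustration, and the data are copied from the Rmax, Rpeak, and Power columns above. Frontera has no power figure in the table, so it is stored as `None` rather than guessed.

```python
# Derived metrics from the Top10 table. All helper names are illustrative
# assumptions; the numbers are copied from the table's columns.

# (name, Rmax in TFlop/s, Rpeak in TFlop/s, power in kW or None if not listed)
SYSTEMS = [
    ("Fugaku", 442010.0, 537212.0, 29899),
    ("Summit", 148600.0, 200794.9, 10096),
    ("Sierra", 94640.0, 125712.0, 7438),
    ("Sunway TaihuLight", 93014.6, 125435.9, 15371),
    ("Perlmutter", 64590.0, 89794.5, 2528),
    ("Selene", 63460.0, 79215.0, 2646),
    ("Tianhe-2A", 61444.5, 100678.7, 18482),
    ("JUWELS Booster Module", 44120.0, 70980.0, 1764),
    ("HPC5", 35450.0, 51720.8, 2252),
    ("Frontera", 23516.4, 38745.9, None),  # power not given in the table
]


def hpl_efficiency(rmax_tflops, rpeak_tflops):
    """Fraction of theoretical peak (Rpeak) achieved on HPL (Rmax)."""
    return rmax_tflops / rpeak_tflops


def gflops_per_watt(rmax_tflops, power_kw):
    """Energy efficiency in GFlop/s per watt, or None if power is unknown.

    TFlop/s -> GFlop/s and kW -> W are both factors of 1000, so they cancel.
    """
    if power_kw is None:
        return None
    return rmax_tflops / power_kw


def speedup(rmax_a, rmax_b):
    """How many times faster system A is than system B on HPL."""
    return rmax_a / rmax_b


if __name__ == "__main__":
    for name, rmax, rpeak, power in SYSTEMS:
        eff = hpl_efficiency(rmax, rpeak)
        gpw = gflops_per_watt(rmax, power)
        gpw_text = f"{gpw:.2f} GFlop/s/W" if gpw is not None else "n/a"
        print(f"{name:24s} efficiency={eff:.1%}  energy={gpw_text}")

    fugaku, summit = SYSTEMS[0], SYSTEMS[1]
    print(f"Fugaku vs Summit: {speedup(fugaku[1], summit[1]):.2f}x")
```

With these inputs, the Fugaku-to-Summit ratio comes out just under 3, which is consistent with the rounded "3x" quoted in the text.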