Jack Dongarra on TOP500: Past, Present, and Future

As the TOP500 project works its way through the third decade of its life, there is talk of expanding the effort to complement Linpack, the benchmark that has defined the supercomputer list for the past 22 years, with new metrics. There is no one better to discuss this topic than veteran TOP500 co-author Jack Dongarra, who was present at the birth of the list in 1992.

Dongarra, a Distinguished Professor at the University of Tennessee, is the director of the University’s Innovative Computing Laboratory. He also doubles as a researcher at the Department of Energy’s Oak Ridge National Laboratory, home of Titan, the second most powerful supercomputer in the world. His research there focuses on algorithm research and development, notably including Linpack.

We asked Dongarra to offer his perspective on the TOP500 and what might be in store for the project in the future.

How did you get involved in the TOP500 project?

Dongarra: In the early 1990s, a new definition of supercomputer was needed to produce meaningful statistics. In 1992, after experimenting with metrics based on processor count, Hans Meuer at the University of Mannheim came up with the idea of using a detailed listing of installed systems as the basis for supercomputer rankings. In early 1993, he persuaded me to join the project with the Linpack benchmark. A first test version was produced in May 1993, partially based on data available on the Internet, including the "List of the World's Most Powerful Computing Sites" maintained by Gunter Ahrendt and a 1992 paper titled "Kahaner Report on Supercomputer in Japan" by David Kahaner, the director of the Asian Technology Information Program (ATIP).

The information from those sources was used for the first two TOP500 lists. Since June 1993, the TOP500 has been produced twice a year, based solely on site and vendor submissions.

It has been more than two decades since the TOP500 project started. What transformations has it experienced and how has it transformed the nature of high performance computing?

Dongarra: A project such as the TOP500, which collects data for a broad analysis of the HPC market and long-term technological trends, is best served by building upon an understandable performance measure embodied in a broadly applicable benchmark. That benchmark also has to be flexible enough to avoid handicapping otherwise well-designed systems, yet demanding enough to penalize architectures not suited to support large sections of the scientific computing market. And while we acknowledge that there is reason and room for improvement, which we actively work on, we also believe that the TOP500 can continue to provide important data about architectural trends, anchoring related discussions in actual data instead of marketing talk. In doing so, we hopefully provide some much-needed guidance for the interesting times ahead.

The TOP500 collection has enjoyed incredible success as a metric for the high performance computing community. The trends it exposes, the focused optimization efforts it inspires, and the publicity it brings to our community are very important. As we enter a market with growing diversity and differentiation of architectures, a careful selection of appropriate metrics and benchmarks that match the needs of our applications is more necessary than ever.

High Performance Linpack (HPL) encapsulates some aspects of real applications, such as system reliability and stability, floating-point performance, and, to some extent, network performance. But it no longer tests memory performance adequately. Alternative benchmarks, as a complement to HPL, could provide corrections to individual rankings and improve our understanding of systems, but are much less likely to change the magnitude of observed technological trends.

The Linpack benchmark is often criticized for not being relevant for a growing number of HPC workloads, and yet it is still the de facto metric of supercomputer performance. What are the benefits and weaknesses of this benchmark?

Dongarra: The Linpack benchmark is, in some sense, an accident. It was originally designed to assist users of the Linpack package by providing information on execution times required to solve a system of linear equations. The first Linpack benchmark report appeared as an appendix in the Linpack Users' Guide in 1979. The appendix comprised data for one commonly used path in Linpack for a matrix problem of size 100, on a collection of widely used computers (23 in all), so users could estimate the time required to solve their matrix problem. Over the years other data was added, more as a hobby than anything else, and today the collection includes hundreds of different computer systems.
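To make that kind of estimate concrete, here is a minimal Python sketch of the arithmetic a user could do with a Linpack result. It assumes the conventional operation count of about 2n^3/3 + 2n^2 floating-point operations for factoring and solving an n-by-n dense system, and it assumes (optimistically) that the rate measured on the benchmark problem carries over to other problem sizes. The function names and the 0.5 ms timing are hypothetical, chosen only for illustration.

```python
def linpack_flops(n: int) -> float:
    """Conventional operation count for solving an n-by-n dense system:
    about 2n^3/3 for the LU factorization plus 2n^2 for the triangular solves."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


def rate_mflops(n: int, seconds: float) -> float:
    """Rate in Mflop/s implied by solving an n-by-n system in `seconds`."""
    return linpack_flops(n) / seconds / 1e6


def estimate_seconds(n: int, measured_mflops: float) -> float:
    """Estimated time for an n-by-n solve, assuming (optimistically) that the
    machine sustains the rate measured on the benchmark problem."""
    return linpack_flops(n) / (measured_mflops * 1e6)


if __name__ == "__main__":
    # Hypothetical measurement: a 100x100 solve that took 0.5 ms.
    r = rate_mflops(100, 0.5e-3)
    print(f"measured rate: {r:.1f} Mflop/s")
    print(f"estimated time for n=1000: {estimate_seconds(1000, r):.3f} s")
```

In practice the sustained rate changes with problem size and cache behavior, so such an extrapolation is only a rough first guess.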

When HPL gained prominence as a performance metric in the early 1990s there was a strong correlation between its predictions of system rankings and the ranking that full-scale applications would realize. Computer system vendors pursued designs that would increase HPL performance, which would in turn improve overall application performance. Presently, HPL remains tremendously valuable as a measure of historical trends, and as a stress test, especially for leadership class systems that are pushing the boundaries of current technology. Furthermore, HPL provides the HPC community with a valuable outreach tool, understandable to the outside world.

At the same time, HPL rankings of computer systems are no longer so strongly correlated with real application performance, especially for the broad set of HPC applications governed by differential equations, which tend to have much stronger needs for high bandwidth and low latency. This is tied to the irregular data access patterns that these codes tend to exhibit.

The Linpack benchmark is said to have succeeded because of the scalability of HPL, the fact that it generates a single number that makes results easily comparable, and the extensive historical database associated with it. However, soon after its release, the benchmark was criticized for providing performance levels "generally unobtainable by all but a very few programmers who tediously optimize their code for that machine and that machine alone." That is because it only tests the solution of dense linear systems, which are not representative of all the operations usually performed in scientific computing. It's clear that HPL emphasizes "peak" CPU speed and the number of CPUs; not enough stress is given to local bandwidth and the network.

Thom Dunning, then director of the National Center for Supercomputing Applications, had this to say about the benchmark: "The Linpack benchmark is one of those interesting phenomena -- almost anyone who knows about it will deride its utility. They understand its limitations but it has mindshare because it's the one number we've all bought into over the years."

In fact, we have reached a point where designing a system for good HPL performance can actually lead to design choices that are wrong for the real application mix, or add unnecessary components or complexity to the system. Worse yet, we expect the gap between HPL predictions and real application performance to increase in the future. The fast track to a computer system capable of running HPL at one exaflop (10 to the power of 18 floating-point operations per second) may well be a design that is very unattractive for real applications.

Without some intervention, future architectures targeted toward good HPL performance will not be a good match for applications.  As a result, we seek a new metric that will have a stronger correlation to the application base and will therefore drive system designers in directions that will enhance performance for a broader set of HPC applications.

What is the purpose of the new benchmark, High Performance Conjugate Gradients (HPCG)? In the long term, do you think it will replace or complement the Linpack benchmark?

Dongarra: The High Performance Conjugate Gradients (HPCG) Benchmark Project started two years ago as an effort to probe important characteristics of a computer system, highlighting and rewarding investment in system features, such as high performance interconnects and memory systems, and fine grain cooperative threading, that are important to a broad set of applications.

The associated HPCG benchmark measures computer systems based on a simple additive Schwarz, symmetric Gauss-Seidel preconditioned conjugate gradient solver. HPCG is similar in purpose to HPL, but it is intended to better represent how today’s applications perform. Specifically, HPCG is designed to measure performance that is representative of many important scientific calculations with low computation-to-data-access ratios, which we call Type 1 data access patterns. To reproduce these patterns, which are common in real applications, HPCG exhibits the same irregular memory accesses and fine-grain recursive computations.
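A minimal sketch may help show what a symmetric Gauss-Seidel preconditioned conjugate gradient solver looks like and where its irregular, recursive work comes from. This is not the HPCG reference code: it is serial pure Python, omits the additive Schwarz domain decomposition, and uses a small 1D Poisson matrix chosen only for illustration. The names `poisson_1d`, `spmv`, `symgs`, and `pcg` are hypothetical.

```python
import math
from typing import List, Tuple

# Sparse matrix stored as a list of rows; each row holds (column, value) pairs.
SparseRows = List[List[Tuple[int, float]]]


def poisson_1d(n: int) -> SparseRows:
    """Illustrative SPD test matrix: 2 on the diagonal, -1 on the off-diagonals."""
    rows: SparseRows = []
    for i in range(n):
        row = [(i, 2.0)]
        if i > 0:
            row.append((i - 1, -1.0))
        if i < n - 1:
            row.append((i + 1, -1.0))
        rows.append(row)
    return rows


def spmv(a: SparseRows, x: List[float]) -> List[float]:
    # x is read through stored column indices: an indirect, irregular access.
    return [sum(v * x[j] for j, v in row) for row in a]


def dot(x: List[float], y: List[float]) -> float:
    return sum(xi * yi for xi, yi in zip(x, y))


def symgs(a: SparseRows, r: List[float]) -> List[float]:
    """One symmetric Gauss-Seidel sweep starting from z = 0, which applies
    M^{-1} r with M = (D + L) D^{-1} (D + U). Each update reads values just
    computed, so the sweep is inherently sequential (a recursive computation)."""
    n = len(r)
    z = [0.0] * n
    for order in (range(n), range(n - 1, -1, -1)):  # forward, then backward
        for i in order:
            diag = 0.0
            s = r[i]
            for j, v in a[i]:
                if j == i:
                    diag = v
                else:
                    s -= v * z[j]
            z[i] = s / diag
    return z


def pcg(a: SparseRows, b: List[float], tol: float = 1e-10, max_iter: int = 500):
    """Preconditioned conjugate gradient; returns (x, iterations used)."""
    x = [0.0] * len(b)
    r = b[:]                      # residual b - A x for the initial guess x = 0
    z = symgs(a, r)
    p = z[:]
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    for k in range(1, max_iter + 1):
        ap = spmv(a, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, k
        z = symgs(a, r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        rz = rz_new
        p = [zi + beta * pi for zi, pi in zip(z, p)]
    return x, max_iter


if __name__ == "__main__":
    n = 50
    a = poisson_1d(n)
    b = spmv(a, [1.0] * n)        # right-hand side whose exact solution is all ones
    x, iters = pcg(a, b)
    print(iters, max(abs(xi - 1.0) for xi in x))
```

The indirect indexing `x[j]` in `spmv` and the sequential dependence inside `symgs` correspond to the two features the interview singles out: the first defeats unit-stride streaming, and the second limits how much parallelism a straightforward implementation can expose.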

In contrast to the new HPCG metric, HPL is a program that factors and solves a large dense system of linear equations using Gaussian elimination with partial pivoting. The dominant calculations in this algorithm are dense matrix-matrix multiplication and related kernels, which we call Type 2 patterns. With proper organization of the computation, data access is predominantly unit-stride, and its cost is mostly hidden by concurrently performing computations on previously retrieved data.
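For comparison, the core of what HPL computes can be sketched as textbook Gaussian elimination with partial pivoting. The Python below is a serial, unblocked illustration, not HPL's actual blocked, distributed algorithm; `lu_solve` is a hypothetical name.

```python
from typing import List


def lu_solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Serial, unblocked textbook version; the input lists are not modified."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented copy [A | b]
    for k in range(n):
        # Partial pivoting: move the largest-magnitude entry of column k
        # (on or below the diagonal) into the pivot position.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        # Eliminate below the pivot. This update of the trailing submatrix
        # accounts for almost all of the roughly 2n^3/3 operations; blocked
        # codes recast it as dense matrix-matrix multiplication.
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            if f != 0.0:
                for j in range(k, n + 1):
                    m[i][j] -= f * m[k][j]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


if __name__ == "__main__":
    # The zero in the top-left corner forces a row interchange.
    a = [[0.0, 2.0, 1.0], [1.0, 1.0, 1.0], [2.0, 1.0, 3.0]]
    print(lu_solve(a, [7.0, 6.0, 13.0]))
```

In a production code the elimination loop is reorganized into blocks so that it runs as matrix-matrix multiplication, which is what lets data access stay mostly unit-stride and overlap with computation.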

This kind of algorithm strongly favors computers with very high floating-point computation rates and adequate streaming memory systems. Because HPL exhibits only Type 2 patterns, the performance issues related to Type 1 patterns never show up in its results, and this may lead hardware designers to leave Type 1 patterns out of their design decisions for next-generation systems.
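The contrast between Type 1 and Type 2 patterns can be quantified with a back-of-the-envelope arithmetic-intensity estimate: floating-point operations per byte moved to or from memory. The Python sketch below is illustrative only. It assumes double-precision values, 32-bit indices, compressed sparse row (CSR) storage, perfect cache reuse in the dense case, and an assumed 27 nonzeros per row in the sparse case; none of these parameters come from the HPL or HPCG specifications.

```python
def gemm_intensity(n: int, word_bytes: int = 8) -> float:
    """Flops per byte for C = A B with n-by-n matrices, assuming ideal reuse:
    A and B are each read once and C is written once."""
    flops = 2.0 * n**3                    # n^3 multiply-add pairs
    bytes_moved = 3 * n * n * word_bytes  # read A, read B, write C
    return flops / bytes_moved


def spmv_intensity(n: int, nnz_per_row: int, word_bytes: int = 8,
                   index_bytes: int = 4) -> float:
    """Flops per byte for y = A x with A stored in CSR format. Counts each
    stored value and column index once, the n + 1 row pointers, one read of x
    and one write of y (the best case for the x accesses)."""
    nnz = n * nnz_per_row
    flops = 2.0 * nnz                     # one multiply and one add per nonzero
    bytes_moved = (nnz * (word_bytes + index_bytes)
                   + (n + 1) * index_bytes
                   + 2 * n * word_bytes)
    return flops / bytes_moved


if __name__ == "__main__":
    n = 10_000
    print(f"dense matrix multiply:   {gemm_intensity(n):10.3f} flops/byte")
    print(f"sparse mat-vec (27/row): {spmv_intensity(n, 27):10.3f} flops/byte")
```

Under these assumptions the dense kernel's intensity grows as n/12, so memory traffic can be hidden behind computation for large matrices, while the sparse matrix-vector product stays below 0.2 flops per byte whatever the size, which typically leaves it bound by memory bandwidth.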

As a result, HPCG has emerged as a complementary benchmark to the traditional HPL benchmark and has been used to rank about 40 of the top systems on the TOP500 list. The pair of numbers provided by HPCG and HPL act as bookends on the performance spectrum of a given system.

However, it is important to remember that no single benchmark can ever reflect the overall complexity of applications that run within a supercomputer architecture.

What does the future hold for the TOP500 project?

Dongarra: I anticipate that the TOP500 list will continue. It provides a historical look at computing and is good for spotting trends in system design and projecting how they might be more widely adopted in the broader HPC market. The HPC industry really needs a complete set of benchmark test results that show a wide variety of components being mixed and matched in different patterns and showing the effects on application performance.

The current approach for compiling the TOP500 clearly cannot address truly novel architectures such as neuromorphic systems or quantum computers. Should a market for such systems develop, very domain-specific approaches to benchmarking and ranking would need to be developed, much as is the case for data-intensive computing.
