Industrial Application Areas of High-Performance Computing

In 1993, a list of the top 500 supercomputer sites worldwide was made available for the first time. Since then, the TOP500 list has been published twice a year. The list allows a detailed and well-founded analysis of the state of high-performance computing (HPC). This article summarizes the recent trends in application areas of HPC systems, focusing on the increase in industrial installations and applications.

Within the TOP500 project we are collecting information about the 500 most powerful computer systems, ranked by LINPACK performance. Since June 1993 we have been publishing the TOP500 lists twice a year [1]. Because these lists record a variety of different data, they furnish an excellent basis for studying the high-performance computing (HPC) market (see, for example, [2], [3] and [4]). Moreover, such lists can provide valuable insights about changes over time; see, for example, a study on the technologies used in HPC systems [5].

In this article, we analyze the types of customers and applications of the HPC systems in the TOP500 since 1995. During this time there has been strong growth in the number of industrial users and a comparable increase in the number of computer installations at industrial sites. One reason for this increase is that companies such as IBM and SGI have offered binary-compatible systems, from single workstations up to full-scale parallel systems. These companies have thus been able to sell a large number of systems to commercial customers; in turn, their systems are often selected for new supercomputer application areas. Another reason for the increase in industrial installations is that industrial customers have gained the experience needed to use medium-sized parallel systems (with up to 128 processors, and in some cases even more) and are now pressuring their companies to purchase high-performance supercomputers.

The variety of application areas represented in the TOP500 has also been increasing during this time. The most important examples of new areas are database applications and image processing.

Performance Measure

The LINPACK report [6] focuses on solving a dense system of linear equations. Algorithmic changes are permitted as long as the numerical stability of the original Gaussian elimination is retained. For the TOP500 we generally use the LINPACK results because this measure is available for almost all systems and still provides a first correction to the peak performance of a system. We make one exception, however: we do not consider results achieved with the Strassen algorithm. The reason is that the operation counts differ if Strassen matrix multiplication is used in the block formulation of the linear equation solver, so the resulting rates are not comparable with the others.
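To make the role of the operation count concrete, the following minimal sketch converts a measured solution time into a LINPACK-style rate. It assumes the customary nominal count of 2/3 n^3 + 2 n^2 floating-point operations for Gaussian elimination on an n-by-n system; this formula is not stated in the article, and the function names are hypothetical. A Strassen-based solver performs fewer operations than this nominal count, so dividing the nominal count by its run time would overstate the achieved rate.

```python
def nominal_linpack_ops(n: int) -> float:
    """Nominal operation count for solving a dense n-by-n linear system.

    Assumption (not given in the article): the customary LINPACK count
    of 2/3 n^3 + 2 n^2 floating-point operations.
    """
    if n <= 0:
        raise ValueError("matrix order n must be positive")
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


def linpack_rate_mflops(n: int, seconds: float) -> float:
    """Rate in Mflop/s implied by solving an order-n system in `seconds`."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return nominal_linpack_ops(n) / seconds / 1.0e6


if __name__ == "__main__":
    # Hypothetical measurement: an order-1000 system solved in 1.0 s.
    print(f"{linpack_rate_mflops(1000, 1.0):.1f} Mflop/s")
```

Because the rate is defined against a fixed nominal count, two systems are comparable only if both actually perform roughly that amount of work, which is the motivation for excluding Strassen-based results.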

For practical reasons we use the LINPACK performance for all systems listed in the TOP500, regardless of the application. LINPACK provides an adequate unit of measurement if one is interested in the floating-point performance of computer systems. It is certainly not adequate for systems used for database applications, however. More useful benchmarks, such as the TPC benchmarks, are available for such applications. By using the LINPACK benchmark, we miss all "pure" database systems, such as those from Teradata or Tandem, since no adequate LINPACK performance values are available for them (most likely, not even a Fortran compiler would be available). Therefore, we cannot produce statistics for the different vendors in the database market. Nevertheless, since we can track a reasonable sample of this market, we can see the fundamental trends, and we can compare the importance of these new applications for parallel systems with the more traditional numerically intensive applications.

Type of Customer

The year 1995 was a remarkable one for the TOP500 in several respects. In addition to new technologies used for HPC systems [5], there were considerable changes in the distribution of the systems in the TOP500 for the different types of customer (academic sites, research labs, industrial/commercial users, vendor installations, and confidential sites) (see Fig. 1).

 

Figure 1: The distribution of systems on the different application areas over time

Until June 1995 the major trend seen in the TOP500 data was a steady decrease of industrial customers, matched by an increase in the number of government-funded research sites. This trend reflects the influence of the different governmental HPC programs that enabled research sites to buy parallel systems, especially systems with distributed memory. Industry was understandably reluctant to follow this step, since systems with distributed memory often have been far from mature or stable. Hence, industrial customers stayed with their older vector systems, which gradually dropped off the TOP500 list because of low performance.

Beginning in 1994, however, companies such as SGI, Digital, and Sun started to sell symmetric multiprocessor (SMP) models of their major workstation families. From the very beginning these systems were popular with industrial customers because of the maturity of these architectures and their superior price/performance ratio. At the same time, IBM SP2 systems started to appear at a reasonable number of industrial sites. While the SP was initially sold for numerically intensive applications, in the second half of 1995 the system began selling successfully to a larger market, including database applications.

Subsequently, the number of industrial customers listed in the TOP500 increased from 85, or 17%, in June 1995 to 148, or 29.6%, in November 1996. We believe that this is a strong new trend for the following reasons.

  • The architectures installed at industrial sites changed from vector systems to a substantial number of MPP systems. This change reflects the fact that parallel systems are ready for commercial use and environments.
  • The most successful companies (IBM and SGI) are selling well to industrial customers (see Fig. 2). Their success is built on the fact that they are using standard workstation technologies for their MPP nodes. This approach provides a smooth migration path for applications from workstations up to parallel machines.
  • The maturity of these advanced systems and the availability of key applications for them make the systems appealing to commercial customers. Especially important are database applications, since these can use highly parallel systems with more than 128 processors.

Figure 2: The distribution of performance on the different application areas over time

Figure 2 shows that the increase in the number of systems installed at industrial sites is matched by a similar increase in the installed accumulated performance. The industrial share of installed performance rose from 8.7% in June 1995 to 14.8% in November 1996. Thus, even though industrial systems are typically smaller than systems at research laboratories, their average performance and size are growing at the same rate as at research installations. The strong increase in the number of processors in systems at industrial sites is another major reason for the rise of industrial sites in the TOP500. Industry is ready to use bigger parallel systems than in the past.

Geographical Distribution of Industrial HPC Systems

The United States clearly leads the world both as producer and as consumer of high-performance computers [5]. Analyzing the geographical distribution of the customers in the TOP500, we see that this leadership is reflected in the number of high-performance computers installed at industrial sites. As Table 1 indicates, 38% of the systems in the United States are installed at industrial sites, compared with 23% in Europe and only 11% in Japan. In the United States, there are more systems at industrial sites than at governmental research labs or at academic sites. While the United States has installed 54% of all systems worldwide, it holds 70% of all systems at industrial sites.

                          Systems Installed at
Region    Academic  Research  Industry  Classified  Vendors  Total
U.S.            44        81       104          28       14    271
Japan           28        39         9           0        4     80
Europe          44        52        31           3        2    132
Others          11         1         4           1        0     17
Total          127       173       148          32       20    500

Table 1: Geographical distribution of installation types as of November 1996

Application Areas

For research sites or academic installations, it is often difficult--if not impossible--to specify a single dominant application. The situation is different for industrial installations, however, where systems are often dedicated to specialized tasks or even to single major application programs. In the TOP500 project we have tried since the very beginning to record the major application area for the systems in the list. We have managed to track about one-third of the systems over time. Most of these systems are installed at industrial sites.

In 1993, the applications typically were numerically intensive applications, for example,

  • automotive applications,
  • aerospace studies,
  • chemical and pharmaceutical studies,
  • electronics,
  • energy research,
  • geophysics and oil applications, and
  • weather prediction.

The share of these areas remained fairly constant from 1993 to 1996, as can be seen in Fig. 3. The possible exception was the electronics industry: the number of recorded systems went down from 14 in June 1993 to 5 in November 1996.


Figure 3: The distribution of performance on the different application areas over time

Recently, however, industrial systems in the TOP500 have been used for new application areas. These include

  • database applications,
  • finance applications,
  • image processing, and
  • WWW servers.

The only clear trend seen in Fig. 3 is the strong rise of database applications since November 1995. These applications include on-line transaction processing as well as data mining. The HPC systems being sold and installed for such applications are large enough to enter the first hundred systems--a clear sign of the growing maturity of the systems and their practicality for industrial usage.

It is also important to notice that industrial customers are buying not only systems with traditional architectures, such as the SGI PowerChallenge or Cray Triton, but also MPP systems with distributed memory, such as the IBM SP2. Distributed memory is no longer a hindrance to success in the commercial marketplace.

Conclusions

The success of massively parallel systems in commercial environments is not bound to any special architecture. Maturity of systems and availability of key application software in a standard Unix system environment are much more important than details of the system architecture. The use of standard workstation technology for single nodes is one key factor. This eases the task of building reliable systems with portable application software.

From the present eight releases of the TOP500 we see the following trends:

  • The number of industrial customers in the TOP500 has risen steadily since June 1995.
  • The most successful companies (IBM and SGI) are selling disproportionately well in the industrial market.
  • The average system size at industrial sites is increasing strongly.
  • Database applications are the most important and most successful new application area for supercomputers.
  • Distributed-memory systems are being installed at industrial sites in reasonable numbers.
  • The United States is the clear world leader in the industrial use of HPC systems.

     

References

[1] J. J. Dongarra, H. W. Meuer, and E. Strohmaier, TOP500 Supercomputer Sites, Technical Reports 33, 34, 38, 40, 41, 42, University of Mannheim, Germany, June 1993, November 1993, June 1994, November 1994, June 1995, November 1995.
[2] J. J. Dongarra, H. W. Meuer, and E. Strohmaier, eds., TOP500 Report 1993, University of Mannheim, 1994.
[3] J. J. Dongarra, H. W. Meuer, and E. Strohmaier, eds., TOP500 Report 1994, SUPERCOMPUTER 60/61, vol. 11, no. 2/3, June 1995.
[4] J. J. Dongarra, H. W. Meuer, and E. Strohmaier, eds., TOP500 Report 1995, SUPERCOMPUTER 63, vol. 12, no. 1, January 1996.
[5] J. J. Dongarra, H. W. Meuer, H. D. Simon, and E. Strohmaier, Changing Technologies of HPC, Future Generation Computer Systems, to appear.
[6] J. J. Dongarra, Performance of Various Computers Using Standard Linear Equations Software, Computer Science Department, University of Tennessee, CS-89-85, 1994.

Jack J. Dongarra,
Hans W. Meuer,
Horst D. Simon,
Erich Strohmaier