Highlights - June 2008

All changes are from November 2007 to June 2008.

The new No. 1 system Roadrunner breaks the petaflop/s barrier and is one of the most energy efficient systems on the TOP500.
Four U.S. DOE systems dominate the TOP5.
Intel dominates the high-end processor market with 75 percent of all systems and 90 percent of quad-core based systems.
Quad-core processors are used in 56 percent of the systems. Their use accelerates performance growth at all levels.
The top industrial customer, at No. 10, is the French oil company Total Exploration Production.
IBM defends its top market share ahead of Hewlett-Packard.

Highlights from the Top 10

The new Roadrunner system at DOE’s Los Alamos National Laboratory (LANL), built by IBM, is the first system ever to break the petaflop/s Linpack barrier, reaching 1.026 petaflop/s. Roadrunner is based on IBM QS22 blades, which are built with advanced versions of the processor in the Sony PlayStation 3. These nodes are connected with a commodity InfiniBand network.
The TOP10 shows six new systems and three other systems that improved their measured speed.
The No. 1, 2, 3, and 5 systems are all installed at U.S. DOE laboratories and all TOP5 systems are in the U.S.
The No. 2 system is DOE’s IBM BlueGene/L system, installed at DOE’s Lawrence Livermore National Laboratory (LLNL) with a Linpack performance of 478.2 Tflop/s.
At No. 3 is a brand-new installation of a newer version of the same type of IBM system. It is a BlueGene/P system installed at DOE’s Argonne National Laboratory and it achieved 450.3 Tflop/s.
The No. 4 system is installed at the Texas Advanced Computing Center (TACC) at the University of Texas. It is built by Sun using SunBlade x6420 servers and achieved 326 Tflop/s. This is the first time Sun placed a system in the TOP10.
The No. 5 system is a Cray XT4 system installed at DOE’s Oak Ridge National Laboratory. It was recently upgraded with quad-core processors and achieved a Linpack performance of 205 Tflop/s.
The No. 6 system is the first system on the list outside the U.S. and is installed in Germany at the Forschungszentrum Juelich (FZJ). It is an IBM BlueGene/P system and was measured at 180 Tflop/s.
The No. 7 system is installed at a new center, the New Mexico Computing Applications Center (NMCAC) in Rio Rancho, N.M. It is built by SGI and based on the Altix ICE 8200 model. It was measured at 133.2 Tflop/s.
For the second time since November, India placed a system in the TOP10. The Computational Research Laboratories, a wholly owned subsidiary of Tata Sons Ltd. in Pune, India, installed a Hewlett-Packard Cluster Platform 3000 BL460c system. They integrated this system with their own innovative routing technology and achieved a performance of 132.8 Tflop/s, which was sufficient for No. 8.
The No. 9 system is a new BlueGene/P system installed at the “Institut du Développement et des Ressources en Informatique Scientifique” (IDRIS) in France, which was measured at 112.5 Tflop/s.
The last new system in the TOP10 – at No. 10 – is also an SGI Altix ICE 8200 system. It is the biggest system installed at an industrial customer, Total Exploration Production. It was ranked based on a Linpack performance of 106.1 Tflop/s.

General highlights from the Top 500 since the last edition

Performance:

Quad-core processor based systems have taken over the TOP500 quite rapidly. Already 283 systems are using them. 203 systems are using dual-core processors, only eleven systems still use single-core processors, and three systems use IBM’s advanced version of the Sony PlayStation 3 processor with 9 cores. The Linpack benchmark can utilize multi-core processors very well, which led to above-average performance growth across the whole list.
The entry level to the list moved up to the 9.0 Tflop/s mark on the Linpack benchmark, compared to 5.9 Tflop/s six months ago.
The last system on the list would have been listed at position 200 in the previous TOP500 just six months ago. This is the largest turnover rate in the 16-year history of the TOP500 project.
Total combined performance has grown to 11.7 Pflop/s, compared to 6.97 Pflop/s six months ago and 4.92 Pflop/s one year ago.
The entry point for the top 100 increased in six months from 12.97 Tflop/s to 18.8 Tflop/s.
The average concurrency level in the TOP500 is 4,850 cores per system, up from 3,290 six months ago.
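
The six-month growth implied by the figures above can be checked with a short calculation. This is a minimal sketch: the `growth_factor` helper and the `FIGURES` layout are illustrative assumptions, while the numbers themselves are taken from the list above.

```python
# Six-month growth factors computed from the figures quoted above.
# Helper names and data layout are illustrative, not part of the TOP500 data.

def growth_factor(new: float, old: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old

# Each entry: (June 2008 value, November 2007 value)
FIGURES = {
    "entry level, Tflop/s": (9.0, 5.9),
    "total performance, Pflop/s": (11.7, 6.97),
    "top-100 entry point, Tflop/s": (18.8, 12.97),
    "average cores per system": (4850, 3290),
}

def six_month_growth() -> dict:
    """Map each metric name to its six-month growth factor."""
    return {name: growth_factor(new, old) for name, (new, old) in FIGURES.items()}

if __name__ == "__main__":
    for name, factor in six_month_growth().items():
        print(f"{name}: x{factor:.2f}")
```

With these inputs, every metric grew by a factor of roughly 1.45 to 1.68 in six months.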

Technology:

A total of 375 systems (75 percent) are now using Intel processors. This is up from six months ago (354 systems, 70.8 percent) and represents the largest share for Intel chips in the TOP500 ever.
The IBM Power processors passed the AMD Opteron family and are now (again) the second most common processor family with 68 systems (13.6 percent), up from 61 systems (12.2 percent) six months ago. Fifty-six systems (11 percent) are using AMD Opteron processors, down from 78 systems (15.6 percent) six months ago.
Multi-core processors are the dominant chip architecture. The most impressive growth was in the number of systems using the Intel Harpertown and Clovertown quad-core chips, which rose in six months from 102 to 253 systems.
The majority of the remaining systems use dual-core processors.
400 systems are labeled as clusters, making this the most common architecture in the TOP500 with a stable share of 80 percent.
Gigabit Ethernet is still the most-used internal system interconnect technology (285 systems), due to its widespread use at industrial customers, followed by InfiniBand technology with 120 systems.
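
The percentages in this section follow directly from the system counts, since the list always has 500 entries. The sketch below recomputes them; the `share_percent` helper and the `COUNTS` dictionary are illustrative names, and only counts quoted above are used.

```python
# Convert system counts from the Technology section into list shares.
# Names are illustrative; the counts are the ones quoted in the text.

LIST_SIZE = 500  # the TOP500 always contains 500 systems

def share_percent(systems: int, total: int = LIST_SIZE) -> float:
    """Return the share of `systems` out of `total`, in percent."""
    return 100.0 * systems / total

COUNTS = {
    "Intel processors": 375,
    "IBM Power processors": 68,
    "AMD Opteron processors": 56,
    "clusters": 400,
    "Gigabit Ethernet": 285,
    "InfiniBand": 120,
}

if __name__ == "__main__":
    for name, count in COUNTS.items():
        print(f"{name}: {count} systems, {share_percent(count):.1f} percent")
```

Note that 56 of 500 systems is 11.2 percent, which the text rounds to 11 percent.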

Manufacturers:

IBM and Hewlett-Packard continue to sell the bulk of systems at all performance levels of the TOP500.
IBM held on to its lead in systems with 210 systems (42 percent) over HP with 183 systems (36.6 percent). IBM had 232 systems (46.4 percent) six months ago, compared to HP with 166 systems (33.2 percent).
IBM remains the clear leader in the TOP500 list in performance with 48 percent of installed total performance (up from 45 percent), compared to HP with 22.4 percent (down from 23.9 percent).
In the system category, Dell, SGI and Cray follow with 5.4 percent, 4.4 percent and 3.2 percent respectively.
In the performance category, the manufacturers with more than 5 percent are: Cray (6.6 percent of performance), SGI (5.9 percent), and Dell (5.5 percent of performance), each of which benefits from large systems in the TOP100.
IBM (118) and HP (163) together sold 281 out of 287 systems at commercial and industrial customers and have had this important market segment clearly cornered for some time now.

Geographical:

The U.S. is clearly the leading consumer of HPC systems with 257 of the 500 systems. The European share (184 systems – up from 149) is still rising and is again larger than the Asian share (48 – down from 58 systems).
Dominant countries in Asia are Japan with 22 systems (up from 20), China with 12 systems (up from 10), India with 6 systems (down from 9), and Taiwan with 3 (down from 11).
In Europe, the UK remains No. 1 with 53 systems (48 six months ago). Germany improved but is still in the No. 2 spot with 46 systems (31 six months ago).

Highlights from the Top 50

The entry level into the TOP50 is at 35.2 Tflop/s.
The U.S. has about the same percentage of systems (52 percent) in the TOP50 as in the TOP500.
The dominant architectures are custom-built massively parallel systems (MPPs) with 56 percent, ahead of commodity clusters with 40 percent.
IBM leads the TOP50 with 36 percent of systems and 56 percent of performance.
No. 2 is Cray with 14 percent of systems and 10.4 percent of performance.
SGI is third with 10 percent of systems and 7.5 percent of performance, closely followed by Dell with 10 percent of systems and 4.4 percent of performance.
HP, absent from the TOP50 twelve months ago, now has 6 percent of systems and 5.1 percent of performance.
60 percent of systems are installed at research labs and 34 percent at universities.
There is no system using Gigabit Ethernet in the TOP50.
IBM’s BlueGene is the most-used system family with 10 systems (20 percent).
Intel processors are used in 38 percent of systems, ahead of IBM’s Power processors in 34 percent and AMD in 26 percent.
The average concurrency level is 24,400 cores per system – up from 15,690 six months ago.

Power consumption of supercomputers

For the first time, the TOP500 list is also providing power consumption values for many of the computing systems and it will continue tracking them in a consistent manner. As “name-plate” power ratings can be several times higher than actual consumed power levels, we decided not to report name-plate or peak-power ratings at all and to report measured values only.

Measurements:

For consistency, we asked system manufacturers and owners to measure power consumption while running the Linpack benchmark. Either the complete system or part of the system could be measured. If only part of a system was measured, it had to include all essential hardware, such as shared fans and power supplies in enclosures or racks. Components which depend heavily on the machine-room environment, such as non-essential disks, water-cooling jackets around air-cooled racks, UPS systems, and similar parts, should be excluded from measurements. Their power consumption is a reflection of the environment a computer system is used in and not of the computer system itself. The reported measurements were taken on nodes, blade enclosures, system racks, or full systems. These data were then scaled linearly to the full system.
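
The linear scaling of a partial measurement to the full system can be sketched as follows. The function name, its parameters, and the example numbers are hypothetical and only illustrate the arithmetic. They are not taken from any submitted measurement.

```python
# Linear extrapolation of a partial power measurement to a full system.
# All names and example values are hypothetical illustrations.

def scale_to_full_system(measured_watts: float,
                         measured_units: int,
                         total_units: int) -> float:
    """Scale power measured on `measured_units` identical units
    (nodes, blade enclosures, or racks) to `total_units` units."""
    if measured_units <= 0:
        raise ValueError("measured_units must be positive")
    if total_units < measured_units:
        raise ValueError("total_units cannot be smaller than measured_units")
    return measured_watts * total_units / measured_units

if __name__ == "__main__":
    # Hypothetical case: 2 of 16 identical racks drew 63,000 W under Linpack.
    full = scale_to_full_system(63_000.0, measured_units=2, total_units=16)
    print(f"Estimated full-system power: {full / 1000:.0f} kW")
```

This sketch assumes that the measured units are representative of all units in the system, which is the same assumption the linear scaling itself makes.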

Power Metrics:

Power efficiency is a popular metric used to compare different technologies. It can be used for this purpose as long as systems of similar size are compared. Power efficiency, however, is not useful for ranking individual systems. By their nature, efficiencies and densities carry no information about the “size” of an object and therefore cannot be used to rank systems by size, as is done in the TOP500. Adding to the potential for misinterpretation, the ratio of Linpack performance to power consumption will always rank smaller systems of a certain type higher than larger systems of the same type, giving the false and misleading impression that smaller systems are more useful for supercomputing than larger systems.
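
The effect described above can be illustrated with two hypothetical systems of the same type. In this sketch the larger system delivers ten times the Linpack performance but, as an assumed consequence of parallel scaling losses, draws slightly more than ten times the power. All names and numbers are invented for illustration.

```python
# Ranking by efficiency versus ranking by performance.
# Both systems and all of their numbers are hypothetical.
from typing import List, NamedTuple

class System(NamedTuple):
    name: str
    linpack_tflops: float  # measured Linpack performance in Tflop/s
    power_kw: float        # measured power consumption in kW

def mflops_per_watt(system: System) -> float:
    """Power efficiency in Mflop/s/Watt (1 Tflop/s = 1e6 Mflop/s, 1 kW = 1e3 W)."""
    return (system.linpack_tflops * 1e6) / (system.power_kw * 1e3)

def rank_by_efficiency(systems: List[System]) -> List[System]:
    return sorted(systems, key=mflops_per_watt, reverse=True)

def rank_by_performance(systems: List[System]) -> List[System]:
    return sorted(systems, key=lambda s: s.linpack_tflops, reverse=True)

SMALL = System("small (hypothetical)", linpack_tflops=10.0, power_kw=40.0)
LARGE = System("large (hypothetical)", linpack_tflops=100.0, power_kw=450.0)

if __name__ == "__main__":
    for s in rank_by_efficiency([LARGE, SMALL]):
        print(f"{s.name}: {mflops_per_watt(s):.1f} Mflop/s/Watt")
```

Under these assumptions the small system leads the efficiency ranking even though the large system delivers ten times the performance, which is why the ratio alone says nothing about system size.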

We therefore decided to list (at this point) only absolute power consumption of systems in the TOP500 itself. We are currently considering alternative approaches for ranking, which will include multiple system features such as performance, power consumption, and memory size.

Results:

General Power Levels:

First results about the general power consumption values reported include:

Average power consumption of a TOP10 system is 1.32 MW and average power efficiency is 248 Mflop/s/Watt.
Average power consumption of a TOP50 system is 908 kW and average power efficiency is 193 Mflop/s/Watt.
Average power consumption of a TOP500 system is 257 kW and average power efficiency is 122 Mflop/s/Watt.

One possible explanation for the decreasing efficiency with rank is that only newer systems and technologies with better efficiencies can be found toward the top of the list. Toward the end of the list a mixture of newer and older technologies lowers the average efficiency level.

Power Efficiencies of Technologies:

Power efficiency values of different systems in the TOP500 are influenced by a variety of factors such as power consumption, Linpack efficiency, parallel scaling behavior, and size of system measured. With these restrictions in mind, we can analyze the collected data and find in general that:

The most energy-efficient supercomputers are based on:
- IBM QS22 Cell processor blades, up to 488 Mflop/s/Watt
- IBM BlueGene/P systems, up to 371 Mflop/s/Watt
Intel Harpertown quad-core blades are catching up fast. These systems are already ahead of BlueGene/L (up to 210 Mflop/s/Watt):
- IBM BladeCenter HS21 with low-power processors (L5420), up to 265 Mflop/s/Watt
- SGI Altix ICE 8200EX Xeon nodes (E5472) with highly efficient Linpack, up to 240 Mflop/s/Watt
- Hewlett-Packard Cluster Platform 3000 BL2x220 with double-density blades, up to 227 Mflop/s/Watt
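
To put the peak values above side by side, the sketch below expresses each one relative to BlueGene/L. The table keys are shortened labels and `relative_to` is an illustrative helper. The Mflop/s/Watt figures are the "up to" values quoted above, so the comparison covers best cases only.

```python
# Peak power efficiencies quoted above, in Mflop/s/Watt, compared to a baseline.
# Labels are shortened and the helper name is illustrative.

PEAK_EFFICIENCY = {
    "IBM QS22 Cell blades": 488,
    "IBM BlueGene/P": 371,
    "IBM BladeCenter HS21 (L5420)": 265,
    "SGI Altix ICE 8200EX (E5472)": 240,
    "HP Cluster Platform 3000 BL2x220": 227,
    "IBM BlueGene/L": 210,
}

def relative_to(baseline: str, table: dict = PEAK_EFFICIENCY) -> dict:
    """Return each technology's peak efficiency as a multiple of the baseline's."""
    base = table[baseline]
    return {name: value / base for name, value in table.items()}

if __name__ == "__main__":
    ratios = relative_to("IBM BlueGene/L")
    for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {ratio:.2f}x BlueGene/L")
```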