Energy Efficiency Advances Sluggish on Latest Green500 List

Aug. 8, 2016

By: Michael Feldman

The 19th edition of the Green500 was released today with the usual array of accelerated systems at the top of the list and a plethora of energy-sipping x86 clusters comprising the remainder. For the most part, though, energy efficiency gains slowed over the past 12 months after chalking up some pretty impressive gains in previous years.

The upper part of the current list, in particular, is virtually unchanged with regard to energy efficiency. Twelve months ago, the average for the top 10 Green500 machines was 4993.5 MFLOPS/watt. On the latest list, it’s 4934.3 MFLOPS/watt, a slight drop from a year ago. The drop could be attributed to the new measurement protocols used this time around, which were unified in the TOP500-Green500 merge. Or it could simply be that a slightly different mix of systems was submitted for the first 2016 list.

But the fact remains that over the last two bi-annual lists, there has been little change in energy efficiency on the greenest systems. And this is not the result of the actual machines remaining the same. Eight of the top ten systems are different from those that appeared on last June’s Green500 list. RIKEN’s PEZY-SC-accelerated Shoubu supercomputer is still perched atop the list with a world-beating 6673.8 MFLOPS/watt, but save for the number four GSI Helmholtz Center supercomputer, all the other systems have been swapped out for different machines.

That would suggest the underlying technology has stagnated over this time frame. Of course, this could be a temporary blip that would naturally occur between the product cycles for Intel’s Xeon Phi and NVIDIA’s Tesla GPU, which are the two mainstream processor platforms driving the biggest performance-per-watt gains in the supercomputing space. For example, in this latest list, the accelerators powering most of the top systems are based on 2014-era technology, namely NVIDIA’s K40 and K80 GPUs, and AMD’s S9150. The newer and greener Knights Landing Xeon Phi and NVIDIA P100 processors are just now making their way into HPC gear.

Focusing on the top of the Green500 list is not just snobbery. These are the systems and technologies that will lead to the first exascale systems in the 2020s, as well as to more mainstream systems a few years later. The worrying thing about these periods of stagnating energy efficiency is that they push practical exascale machines, those that can operate on 20 MW or less, out to the middle of the next decade. The 20 MW goal for one Linpack exaflop is equivalent to 50 GFLOPS/Watt. The greenest system today, Shoubu, is at about 6.67 GFLOPS/Watt, so we need roughly another 7.5x increase to hit that goal.
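The arithmetic behind the 50 GFLOPS/Watt target and the remaining gap can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the article's figures (one Linpack exaflop, a 20 MW budget, and Shoubu's 6673.8 MFLOPS/watt); the variable names are illustrative.

```python
# Back-of-the-envelope check of the exascale efficiency target.
EXAFLOP = 1e18          # one Linpack exaflop, in FLOP/s
POWER_BUDGET_W = 20e6   # 20 MW power budget, in watts

# Required efficiency: FLOP/s per watt, converted to GFLOPS/watt.
target_gflops_per_watt = EXAFLOP / POWER_BUDGET_W / 1e9

# Shoubu's Green500 mark, converted from MFLOPS/watt to GFLOPS/watt.
shoubu_gflops_per_watt = 6673.8 / 1e3

# How many times more efficient systems must become to hit the target.
required_gain = target_gflops_per_watt / shoubu_gflops_per_watt

print(f"Target: {target_gflops_per_watt:.1f} GFLOPS/watt")
print(f"Shoubu: {shoubu_gflops_per_watt:.2f} GFLOPS/watt")
print(f"Required gain: {required_gain:.1f}x")
```

The gap comes out just under 7.5x, which is the improvement still needed from today's greenest machine.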

The real worry is that the biggest energy efficiency gains over the last decade or so were due to the introduction of manycore/accelerator processors, specifically GPUs and their ilk, which juiced performance-per-watt significantly during the petascale era. Nine years ago, when the greenest computers were based on Blue Gene CPUs, energy efficiencies were less than a tenth of what they are today: around 350 MFLOPS/Watt. But the big jump realized by exploiting manycore throughput processors is not likely to be repeated in this decade unless someone can bring another order-of-magnitude innovation to market.
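To put the petascale-era jump in perspective, the sketch below compares the Blue Gene-era mark of roughly 350 MFLOPS/Watt with Shoubu's 6673.8 MFLOPS/watt and converts the ratio into a compound annual growth rate. It assumes a clean nine-year span between the two marks, which is a simplification; the names are illustrative.

```python
# Compare the greenest system of nine years ago with the greenest today.
blue_gene_mflops_per_watt = 350.0   # approximate Blue Gene-era mark
shoubu_mflops_per_watt = 6673.8     # Shoubu, top of the current list
years = 9                           # assumed span between the two marks

# Overall improvement factor and the fraction "then" represents of "now".
growth = shoubu_mflops_per_watt / blue_gene_mflops_per_watt
fraction_of_today = blue_gene_mflops_per_watt / shoubu_mflops_per_watt

# Compound annual growth rate implied by that improvement.
annual_rate = growth ** (1.0 / years) - 1.0

print(f"Improvement: {growth:.1f}x")
print(f"Then vs. now: {fraction_of_today:.1%} of today's efficiency")
print(f"Implied annual growth: {annual_rate:.0%}")
```

A roughly 19x gain over nine years works out to close to 40 percent a year, which helps explain why a flat year on the list stands out.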

By the way, most systems on the Green500 list are still powered by x86 CPUs, without the benefit of accelerators. So one would surmise that energy efficiencies across the entire list would map pretty closely to those of generic clusters. Even here, though, the performance-per-watt gains for the average Green500 system have been modest. A year ago, average energy efficiency was 915.8 MFLOPS/Watt. On the current list, that average has risen to 1116.8 MFLOPS/Watt, about a 22 percent increase. That’s not bad, but it doesn’t exactly instill confidence that Moore’s Law is working very well for mainstream processors. The transistors might be getting smaller and allowing for more cores per chip, but that’s not translating into analogous gains in greener computing.
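The article's two year-over-year comparisons, the top-10 average and the list-wide average, can be computed side by side. This is a simple sketch using the quoted MFLOPS/watt figures; the dictionary layout and the percent_change helper are illustrative, not part of any Green500 tooling.

```python
# Year-over-year change in average Green500 energy efficiency (MFLOPS/watt),
# using the figures quoted in the article: (a year ago, current list).
averages = {
    "top 10": (4993.5, 4934.3),
    "whole list": (915.8, 1116.8),
}

def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0

changes = {name: percent_change(old, new) for name, (old, new) in averages.items()}

for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```

The top of the list slipped by about 1 percent, while the list as a whole gained a little under 22 percent.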

There are plenty of reasons to be optimistic, though. The entire computer industry, from the mobile space to HPC, is focused on reducing power usage. And there are plenty of innovations on the horizon, everything from 3D transistors, silicon photonics, and memristors to integrated NICs and in-socket memory. Developments like these are certainly far enough along to make it into exascale systems, and some will arrive well before that.

An alternative approach is to simply throw out expendable features and componentry. One rather prominent example of this is the current reigning TOP500 champ, China’s Sunway TaihuLight supercomputer. That system is based on the 260-core SW26010, which uses a simplified cache setup. It also skimps quite a bit on memory capacity, using only 1.3 petabytes of DRAM to feed its gargantuan 93-petaflop appetite. That works out to only 0.014 bytes per FLOP/second, a far cry from the nominal (and outdated) goal of one byte per FLOP/second. But reducing the cache hierarchy and memory capacity saved a lot of power and earned TaihuLight the number three spot on the Green500 list with a mark of 6051.3 MFLOPS/Watt.
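The bytes-per-FLOP figure for TaihuLight follows directly from its memory capacity and Linpack performance. The sketch assumes decimal units (1 PB = 10^15 bytes, 1 petaflop = 10^15 FLOP/s); with binary pebibytes the ratio would come out slightly higher.

```python
# Memory-to-compute ratio for Sunway TaihuLight, from the article's figures.
PETA = 1e15                      # decimal prefix (assumed)

memory_bytes = 1.3 * PETA        # 1.3 PB of DRAM
linpack_flops = 93 * PETA        # 93 petaflops on Linpack, in FLOP/s

bytes_per_flop = memory_bytes / linpack_flops

# How far short of the nominal one-byte-per-FLOP/s goal the machine falls.
shortfall = 1.0 / bytes_per_flop

print(f"{bytes_per_flop:.3f} bytes per FLOP/s")
print(f"About {shortfall:.0f}x below one byte per FLOP/s")
```

In other words, TaihuLight carries roughly one seventieth of the memory that the old one-byte-per-FLOP/s rule of thumb would call for.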

One could dismiss this as a stunt machine with too little cache and memory to be practical, but HPC developers are getting used to the idea of programming their applications with a lot less memory, just as they’re getting used to the challenges of writing code for manycore processors. In any case, the future of green computing is going to rely on both hardware and software innovations. And as was evident over the past 12 months, those don’t always occur on regular schedules.