HPC in 2016: Hits and Misses

Dec. 20, 2016

By: Michael Feldman

As the year draws to a close, TOP500 News looks back at some of the most prominent trends of the past 12 months in the world of high performance computing. From machine learning to new processors to exascale, there were plenty of topics to hold our attention in 2016. Here are this year’s top five hits and misses:

Hit: Machine learning is everywhere

If we had to pick out the most significant trend affecting HPC over the past year, it would certainly have to be machine learning, an application set that has proven to be as compute-demanding as any traditional supercomputing code. Under the broader category of artificial intelligence (AI), machine learning is now powering internet search, image recognition, language translation, recommendation engines, and self-driving vehicle technology, to name just a few applications. Machine learning is also being tacked onto traditional HPC workflows in science and engineering.

As a consequence, it has captured the attention of computer vendors and users alike, rewriting the rules on how chips, systems, and software will be developed from now on. As companies scramble to add solutions for this new application set, NVIDIA has emerged as an industry leader. On the strength of its early success in developing GPU-powered machine learning solutions for hyperscale companies and automobile manufacturers, the company saw its stock triple in 2016, earning it the honor of Yahoo Finance’s Company of the Year.

IBM and Intel are also working diligently at becoming frontrunners in the race for intelligent machines – IBM with its Watson platform (under the guise of cognitive computing) and Intel with its new strategy of providing AI-focused silicon. Meanwhile, hyperscalers like Google, Amazon, and Microsoft are simultaneously building up the software ecosystem – both tools and applications – and creating a nearly insatiable demand for machine learning infrastructure. It’s a virtuous circle that promises to keep spinning for the foreseeable future.

Miss: Intel’s competition comes up short

Even considering NVIDIA’s success in the machine learning space, Intel still dominates the high performance computing landscape like no other chipmaker. More than 90 percent of HPC systems are equipped with Intel processors, and that includes even NVIDIA GPU-accelerated machinery. The competition – AMD, OpenPOWER vendors, and ARM processor providers – has thus far failed to mount any serious challenge to Intel’s hegemony despite multi-year efforts to shave off market share.

That’s not to say these efforts are doomed to failure. AMD is pinning a lot of its hopes for getting back in the server game on its upcoming 32-core Zen “Naples” CPU, which was originally to be released this year but is now expected to start shipping in the second quarter of 2017. That design represents the first concerted effort by the company to compete at the high end of the server market since it introduced the Bulldozer architecture in 2011. With many datacenter customers longing for an alternative to Intel, Zen could be the technology that finally turns around AMD’s fortunes.

IBM’s 2013 OpenPOWER initiative to open up its Power architecture has paid few dividends over the past three years. OpenPOWER did get a boost in 2014 when the US DOE awarded $325 million to deliver future IBM Power9-based supercomputers as part of CORAL (Collaboration of Oak Ridge, Argonne, and Lawrence Livermore), the energy agency’s program to field pre-exascale supercomputers. Since then, though, IBM has announced only a handful of Power-based wins in the HPC realm, while other OpenPOWER server makers have steered clear of this market.

The story of the ARM competition is a little more complex, deserving its own entry...

Hit and Miss: ARM in HPC, a promise unfulfilled

ARM servers were originally supposed to be entering the datacenter (HPC, enterprise, and cloud) en masse by 2016, but that hasn’t worked out as planned. After half a decade of hardware development and ecosystem building, ARM-based servers have so far been relegated to research projects at a few HPC facilities, while in mainstream datacenters, they’re just a blip. ARM processor solutions from Cavium and AppliedMicro have yet to catch fire. The latter company is now under new ownership, which has stated it will divest AppliedMicro’s compute business.

The most compelling ARM server vendor is now Qualcomm, an IT powerhouse with a current market cap of $100 billion. The company is sampling its 48-core Centriq 2400 server SoC now, and is scheduled to start shipping it in the second half of 2017. If anyone can make a run at Intel in the datacenter, it will be Qualcomm.

Meanwhile, an HPC-focused ARM design is now in the pipeline. Known as ARMv8-A Scalable Vector Extension (SVE), the new standard will be implemented by Fujitsu for its Post-K exascale system.  Other vendors may end up licensing the technology as well, although the short-term prospects of that are rather slim given the size of the market opportunity. Nonetheless, if ARM servers manage to gain a foothold in the datacenter, an HPC variant would make a lot more commercial sense.

Hit: Accelerators and manycore processors post big gains

This year saw the introduction of a new crop of multi-teraflop processors from Intel (Knights Landing Xeon Phi) and NVIDIA (P100 Tesla GPU), both of which are more than twice as fast as their immediate predecessors. One important twist on the story this year is that the acceleration models have diverged. Although both are manycore designs built to speed up floating point-intensive applications, the Xeon Phi can operate as a standalone CPU, while the NVIDIA offering still requires a CPU host to attach to. Each approach involves some performance/ease-of-use tradeoffs, but it remains to be seen which path will be most acceptable to customers.

One trend, though, is fairly clear: these manycore designs offer much better power-performance than the multicore CPUs they were built to augment. Many of the speediest supercomputers, as well as the most energy-efficient ones, rely on these accelerators to deliver the majority of flops. The main reason they are not more widely employed in HPC is the difficulty encountered in writing software to squeeze out those flops. If Intel has any advantage over NVIDIA, it’s that the Xeon Phi is not fundamentally different from a Xeon and can leverage many of the same development tools and software componentry.

FPGAs also had a big year, spurred by two watershed events: the acquisition of Altera by Intel and the widespread deployment of FPGAs in Microsoft’s Azure cloud. For its part, Intel has predicted that 30 percent of datacenter servers will be equipped with FPGAs by 2020, doing such tasks as machine learning, encryption/decryption, data compression, network acceleration, and scientific computing.  Microsoft is doing its part to make that dream come true, having put an exaflop (single precision) worth of FPGAs into its cloud to support the company’s AI services. If all of this comes to fruition at the turn of the decade, 2016 will be seen as the year when the technology was reborn.

Hit and Miss: Exascale – one step forward, one step back

The US government’s supercomputing program got a boost recently, with the DOE’s Exascale Computing Project rejiggering its timeline to complete the groundwork for the first system by 2021. That accelerates its original plan by at least a year and puts the US back in contention in the race to exascale. Keep in mind, though, that China still expects to get its first machine up and running by 2020 and the CEA, France’s Alternative Energies and Atomic Energy Commission, is slated to get its first exascaler that same year.

Meanwhile, Japan’s exascale program, Flagship 2020, is going to be delayed by one or two years. Fujitsu was originally supposed to deliver its Post-K exascale system to RIKEN in 2019, with the machine going into production in 2020. The purported reason for the delay was semiconductor design issues, presumably related to implementation of the new ARMv8-A SVE processor design.

The irony of all this jockeying for position is that by the turn of the decade, it’s quite possible nobody will care much who reached the exascale milestone first or if it was reached at all. The broadening scope of data analytics, which is now being turbo-charged by machine learning, has pushed some of the enthusiasm around traditional flops-focused supercomputing into the background. That’s not a good or bad thing; it’s just the inevitable progression of technology and applications.