Getting Exascale Right, Not First

Sept. 20, 2016

By: Michael Feldman

The path to exascale computing hasn’t been an easy one. It has had to face a daunting set of challenges in energy efficiency, application parallelism, and system reliability, just to name a few. The difficulties in bringing the hardware and software up to this level are considerable, but there is a more fundamental challenge at the heart of exascale: doing the necessary work of building an ecosystem that will last for a decade or more, not just for a handful of stunt machines.

An article in Computerworld UK this week delves into this issue in some detail. The author spoke with ARM Research engineer Eric Van Hensbergen, who says that making exascale a national priority and building public consensus on the importance of this technology is key to realizing the goals of these efforts. In his estimation, the HPC community in the US and Europe has not made the case to the public, and by extension to policymakers, for the importance of pushing supercomputing forward for the betterment of society.

He is particularly concerned about China, which has been pouring a lot of money into building up its HPC infrastructure. His assertion that they are “number one” is debatable, and appears to be based on the country’s most recent achievements on the TOP500 list. But as he notes, a concerted effort in China has turned it into an HPC superpower in a relatively short period of time.

Van Hensbergen is most concerned that the citizenry in the West views supercomputing as a government handout that caters to academia, rather than as a societal good. That assumes that the general public is even aware of the technology – a questionable presumption.  But like many inside the HPC community, Van Hensbergen thinks the benefits of this technology to science and industry, and ultimately to society, are being largely ignored. “I wish there was more understanding of that within the public,” he says. “I feel like the US and Europe have been lagging a little because it’s been deprioritized.”

Van Hensbergen thinks the situation in Japan is much better in that regard.  He relates the story of how the K Computer project there was on the verge of cancellation in 2009 after the recession hit and NEC and Hitachi pulled out of the contract. To keep the project alive, 10 Nobel laureates went on primetime TV to convince their fellow citizens of the importance of the work. It apparently succeeded. Fujitsu went on to finish the project as the remaining contractor and delivered the 10-petaflop K Computer more or less on schedule.

The implication is that government projects like this in the US and Europe don’t enjoy the same sort of public support. But that actually ignores other data. As of 2012, confidence in the national government in Japan was only 17 percent, which is lower than that of any country in North America or Europe, with the exception of Greece at 13 percent. Considering that Japan has been in the economic doldrums for more than two decades, such a statistic should not be too surprising.

The salvage of the K Computer project was most likely a one-off event that had more to do with national prestige than a particularly well-informed citizenry or confidence in government R&D programs. The truth is that the HPC community has not made a particularly good case for exascale anywhere. That has partly to do with the fact that there are other more pressing (and, frankly, more relatable) issues on the public’s agenda – global terrorism, economic dysfunction, war, and immigration/refugee problems. By contrast, the application areas that exascale enthusiasts most often talk about – climate change, scientific research, advanced manufacturing, alternative energy production, precision medicine, and so on – are certainly on the minds of some voters, but they are not at the top of the list of their day-to-day worries.

Regardless, none of that has kept policymakers from initiating exascale programs. The US has its National Strategic Computing Initiative (NSCI) and the Exascale Computing Project (ECP), the EU has its European Exascale Projects, and Japan has its Flagship 2020 project. China has not encapsulated its exascale work into a single project, at least not any that is publicly visible. Its efforts in this area are wrapped up in the country’s 13th Five-Year Plan for 2016-2020. Notably, since Japan has admitted its Flagship 2020 supercomputer project will be delayed by a year or two, China now has the inside track on fielding the first exascale machine.

Not that being first guarantees technological superiority. Standing up a stunt machine as the first exascale supercomputer won’t confer any lasting advantage on the country or organizations involved. It is in this regard that the US program seems the most prudent. It appears to be committed to building exascale systems derived from commercial architectures, with a software stack that meets the needs of a broad set of applications. It’s significant that the US is not promising such a system before 2023, well behind other countries.

Although Van Hensbergen doesn’t explicitly endorse the more deliberate US approach, he does seem to recognize the importance of building up an exascale capability with staying power, rather than just engaging in an HPC arms race for the sake of competition. “These national infrastructures are not stunts to say ‘the US is better than China, or Europe, or Japan’ or anything like that,” he says. “It really is about the future of science, innovation and economies in these countries, as well as supporting a high-tech workforce, which is incredibly important.”

Of course, that’s a longer-term project than standing up an exascale system that meets certain benchmark numbers in performance and power usage. Given that, it would seem to make more sense for these efforts to advance HPC capabilities to be continuous, rather than based on superficial milestones that can be achieved every 10 years or so. Rushing to develop new architectures and software stacks to meet this arbitrary timetable doesn’t really serve the industry or its users. Maybe slow and steady is the real secret to HPC superiority.