With Sierra, NNSA Embraces Heterogeneous Supercomputing

Oct. 28, 2018

By: Michael Feldman

The launch of the Sierra supercomputer at Lawrence Livermore National Laboratory (LLNL) marks a new era for the National Nuclear Security Administration (NNSA) and its Advanced Simulation and Computing (ASC) Program.

Sierra first made its presence felt on this year’s TOP500 list, where it captured the number three spot in the June rankings. Since then, LLNL engineers and developers have been preparing the system for its official role in support of the ASC’s core mission, namely running digital simulations to ensure the readiness of America’s aging nuclear stockpile.

These simulations initially became necessary when the US ended underground nuclear weapons testing in 1992. Above-ground testing had ceased three decades earlier. The problem is that over time, various components in nuclear weapons degrade, requiring that parts be replaced or refurbished in order to extend their service life.

Accurately modeling the stockpile is computationally intensive, requiring simulations involving high-energy plasma physics, hydrodynamics, and material physics. The models are constantly being improved to provide more accurate information about the effectiveness and safety of the weapons, which requires more and more computing horsepower.

Sierra certainly has that. Besides being the third fastest supercomputer on the planet, its peak performance of 125.6 petaflops makes it the most powerful machine ever deployed by the NNSA. Each of Sierra’s 4,320 nodes is equipped with two IBM Power9 CPUs and four NVIDIA Tesla V100 GPUs as coprocessors. Over 96 percent of the system’s flops are provided by the GPUs, each of which delivers 7 peak teraflops of double precision floating point performance and 125 teraflops of deep learning performance. That’s expected to make the new system 6 to 10 times more capable than Sequoia, LLNL’s previous top machine and still the eighth fastest supercomputer in the world.
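For readers who want to check the GPU share themselves, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope calculation using the figures quoted above (4,320 nodes, four GPUs per node, 7 double precision teraflops per GPU, 125.6 petaflops system peak); the function and variable names are made up for illustration, and the result is only as good as those rounded numbers.

```python
# Figures as quoted in the article (rounded vendor/lab numbers).
NODES = 4320
GPUS_PER_NODE = 4
GPU_PEAK_TFLOPS = 7.0        # double precision peak per V100
SYSTEM_PEAK_PFLOPS = 125.6   # whole-system peak


def gpu_peak_pflops(nodes, gpus_per_node, tflops_per_gpu):
    """Aggregate GPU peak in petaflops (1 petaflop = 1,000 teraflops)."""
    return nodes * gpus_per_node * tflops_per_gpu / 1000.0


def gpu_share(gpu_pflops, system_pflops):
    """Fraction of the system's peak flops contributed by the GPUs."""
    return gpu_pflops / system_pflops


gpu_pflops = gpu_peak_pflops(NODES, GPUS_PER_NODE, GPU_PEAK_TFLOPS)
share = gpu_share(gpu_pflops, SYSTEM_PEAK_PFLOPS)

print(f"GPU count:  {NODES * GPUS_PER_NODE}")
print(f"GPU peak:   {gpu_pflops:.2f} petaflops")
print(f"GPU share:  {share:.1%}")
```

With these inputs the 17,280 GPUs account for roughly 121 of the 125.6 petaflops, which lines up with the "over 96 percent" figure.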

It also makes Sierra the NNSA’s first really big heterogeneous supercomputer and the first one that will rely so heavily on GPU acceleration for its mission. The NNSA’s previous capability systems were either IBM BlueGene/Q machines (Sequoia and Vulcan) or Cray XC40 systems powered by Intel Xeon Phi processors (Trinity). IBM abandoned the BlueGene line years ago and last year Intel jettisoned the Xeon Phi product. Mark Anderson, director for the Office of Advanced Simulation and Computing and Institutional Research & Development at NNSA, characterized Sierra as “a harbinger of future computing technology and a critical step along the path to exascale.”

NNSA is planning to make that step to exascale with a system called “El Capitan,” which LLNL is expected to deploy in 2023. At least from a flops perspective, it will be about ten times as powerful as Sierra. And if Sierra really is a harbinger of the future, as Anderson said, that suggests that the NNSA is planning to stick with heterogeneous computing for its first exascale supercomputer. And not just any heterogeneous design, but one that provides acceleration for deep learning and machine learning, which are being exploited across the HPC application landscape.

That doesn’t necessarily mean the NNSA will be buying IBM Power/NVIDIA GPU machinery from here on out. Intel may come up with its own deep learning-infused heterogeneous design for the upcoming Aurora exascale supercomputer slated for installation at Argonne National Lab in 2021. Likewise, AMD, with its Zen CPUs and Radeon GPUs, could offer a viable alternative. As always, time will tell.

Image: Sierra supercomputer. Credit: Randy Wong/Lawrence Livermore National Laboratory