By: Michael Feldman
The San Diego Supercomputer Center (SDSC) will double the number of GPUs on Comet, a two-petaflop supercomputer that came online in 2015. The new coprocessors are being added in response to growing demand for GPU computing from researchers using the system.
Comet supercomputer. Source: SDSC
The upgrade is being accomplished courtesy of a $900,000 grant from the US National Science Foundation (NSF). The NSF has already invested $24 million in Comet, which covers both the initial outlay for the machine and its operation.
Comet is a large-scale Dell cluster powered by Haswell-generation Intel Xeon E5-2600 processors and hooked together with FDR InfiniBand. Only 36 of the system’s 1,984 nodes are currently equipped with GPUs, in this case NVIDIA’s Tesla K80 modules. Each of those nodes has four K80s, for a total GPU count of 144. The upgrade will add another 144 GPUs, again spread across 36 nodes at four per node, but this time using NVIDIA’s latest Tesla P100 parts.
Benchmarking performed by SDSC shows that the P100 GPU can deliver speed-ups of up to 2x on select applications compared to the K80. That shouldn’t be too surprising, given that the P100 offers about 1.6 times the peak double-precision performance of the K80, along with half-precision floating-point support for codes that dabble in 16-bit arithmetic. The P100 also comes with HBM2 stacked memory, which is nearly twice as fast as the K80’s GDDR5 memory.
Assuming SDSC is installing the PCIe-based version of the P100, Comet will get an additional 677 teraflops of peak performance. Adding that to the existing 419 teraflops of K80 performance will result in a machine with well over a petaflop of GPU capability. That will make Comet the largest GPU resource on the NSF’s XSEDE network of supercomputers.
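The peak figures above follow from simple multiplication. The sketch below assumes NVIDIA’s published double-precision peaks of 2.91 teraflops per Tesla K80 board (with GPU Boost) and 4.7 teraflops per PCIe Tesla P100. These per-card values are assumptions drawn from NVIDIA spec sheets, not numbers supplied by SDSC, and the helper name `partition_peak` is purely illustrative.

```python
# Back-of-the-envelope peak FP64 estimate for Comet's GPU partitions.
# ASSUMED per-card FP64 peaks (NVIDIA spec-sheet values, not SDSC figures):
#   Tesla K80 board (two GPU dies, with GPU Boost): 2.91 teraflops
#   Tesla P100 (PCIe version):                      4.7 teraflops

NODES_PER_PARTITION = 36
CARDS_PER_NODE = 4

K80_PEAK_TFLOPS = 2.91
P100_PCIE_PEAK_TFLOPS = 4.7


def partition_peak(nodes: int, cards_per_node: int, tflops_per_card: float) -> float:
    """Return the aggregate peak (teraflops) of a homogeneous GPU partition."""
    return nodes * cards_per_node * tflops_per_card


cards_per_partition = NODES_PER_PARTITION * CARDS_PER_NODE  # 36 * 4 = 144
k80_total = partition_peak(NODES_PER_PARTITION, CARDS_PER_NODE, K80_PEAK_TFLOPS)
p100_total = partition_peak(NODES_PER_PARTITION, CARDS_PER_NODE, P100_PCIE_PEAK_TFLOPS)
combined = k80_total + p100_total

print(f"Cards per partition: {cards_per_partition}")
print(f"K80 partition:  {k80_total:.0f} TF")   # 144 * 2.91 = 419.04
print(f"P100 partition: {p100_total:.0f} TF")  # 144 * 4.7  = 676.8
print(f"Combined:       {combined:.0f} TF")    # about 1,096 TF, i.e. over a petaflop
```

Swapping in a different per-card value (for example, the higher peak of the NVLink-based P100) shows how sensitive the total is to which P100 variant is installed.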
“This expansion is reflective of a wider adoption of GPUs throughout the scientific community, which is being driven in large part by the availability of community-developed applications that have been ported to and optimized for GPUs,” said SDSC Director Michael Norman, who is also the principal investigator for the Comet program.
A variety of GPU-capable software is already running on the system, including the molecular dynamics packages AMBER and LAMMPS and the Bayesian phylogenetics package BEAST, all of which have been optimized for graphics processors. The additional GPUs will help meet increased demand for these and other applications. A prime example of the latter is machine learning, which researchers are applying in domains such as image processing, bioinformatics, and linguistics.
The new P100 GPU nodes are scheduled to go into production in early July.