News

Aquila Takes Wraps Off Liquid-Cooled HPC Server

Sept. 15, 2016

By: Michael Feldman

Aquila, a system provider based in Albuquerque, New Mexico, has unveiled a new liquid-cooled server platform that offers one of the densest and most energy-efficient architectures on the market. The platform, known as Aquarius, uses a patented warm water cooling technology along with rack-level power distribution to minimize energy consumption and allow for very high levels of computational density.

The warm water cooling design was developed by Clustered Systems, whose IP is now licensed by Aquila for their Aquarius offering. Whereas most commercial direct liquid-cooled solutions draw heat only from the processors and memory DIMMs, the Clustered Systems cold plate is designed to siphon off heat from any motherboard component dissipating more than a couple of watts. And since all the critical semiconductor components are protected from large temperature fluctuations, the company claims Aquarius will support a “near zero failure rate.” In addition, since fans are no longer needed, another common source of server downtime has been eliminated.

Apparently, Aquarius manages this without the plastic hoses, quick disconnects, and pumps common to most liquid-cooled designs. Instead, the water is plumbed much as it would be in a building or house, greatly reducing the potential for leaks and other failures.

Rack-level power supplies are used to convert external AC power to DC power, eliminating the need for individual power supplies in each server and making power distribution highly efficient. According to Bob Bolz, who heads up the HPC and Data Center Business group at Aquila, Aquarius can maintain a Power Usage Effectiveness (PUE) below 1.05, and probably closer to 1.03. In fact, with fans and individual power supplies eliminated, power delivery to the servers reaches 95 percent efficiency. “We are looking at being able to build a liquid-cooled data center that is about as efficient as possible with current technology,” Bolz told TOP500 News.
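
For reference, PUE is the ratio of total facility power to the power delivered to the IT equipment, so a figure of 1.05 implies only five percent overhead for cooling and power conversion. The short sketch below shows what that means for a hypothetical 1 MW IT load; the load figure is an illustrative assumption, not one of Aquila's numbers.

```python
# What a PUE of 1.05 (or 1.03) implies for a hypothetical 1 MW IT load.
# PUE is total facility power divided by IT equipment power, so the
# cooling and power-distribution overhead is IT load * (PUE - 1).
# The 1 MW load is an assumption for illustration, not an Aquila figure.

it_load_kw = 1000.0  # hypothetical 1 MW of server load

for pue in (1.05, 1.03):
    overhead_kw = it_load_kw * (pue - 1.0)
    print(f"PUE {pue:.2f}: {it_load_kw * pue:.0f} kW total, {overhead_kw:.0f} kW overhead")
```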

The Aquarius team estimates that the design will allow customers to realize a 50 percent energy savings compared to a similarly equipped air-cooled system with conventional power supplies. As a result, the company says the additional cost of an Aquarius rack can be recovered within its first year of operation.
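
To get a rough sense of how that payback math might work, the sketch below applies the 50 percent figure to a hypothetical rack. The rack's IT load, the air-cooled baseline PUE, and the electricity rate are all illustrative assumptions rather than figures supplied by Aquila.

```python
# Hypothetical payback arithmetic for the 50 percent savings claim.
# Every input below is an illustrative assumption, not an Aquila figure.

rack_it_load_kw = 30.0     # assumed IT load for a densely packed rack
baseline_pue = 1.7         # assumed for a conventional air-cooled facility
rate_usd_per_kwh = 0.10    # assumed utility rate
hours_per_year = 8760

baseline_cost = rack_it_load_kw * baseline_pue * hours_per_year * rate_usd_per_kwh
annual_savings = 0.50 * baseline_cost   # the 50 percent claim above

print(f"Assumed air-cooled energy cost: ${baseline_cost:,.0f} per rack per year")
print(f"Implied savings at 50 percent:  ${annual_savings:,.0f} per rack per year")
# Payback within the first year would then require the Aquarius rack
# premium to come in below the annual savings figure.
```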

Bolz says the design is even 35 percent more efficient than other warm water cooled solutions, such as those from Asetek and CoolIT. In fact, he believes the efficiency of their solution is on par with that of immersion cooling technology like what Green Revolution Cooling offers, but without the attendant problems of dealing with big messy vats of dielectric fluids, not to mention the annoying warranty-voiding issues.

An HPC system with a version of this cooling technology was delivered to the DOE’s SLAC National Accelerator Laboratory by Clustered Systems in early 2013. The SLAC cluster ran for 18 months straight, 24/7, with zero failures. More recently, Aquila demonstrated the technology at Sandia National Laboratories. Despite some skepticism from the lab personnel, Bolz says they were able to run Linpack on Intel Xeons at full tilt with Turbo Boost enabled, while keeping the core temperatures at least 15 degrees below the upper limit. Being able to do that on a consistent basis means a $2,000 CPU can be made to perform like a $2,800 one.

Such efficiencies in power and cooling enable Aquarius’s highly dense configuration. The Open Compute Project (OCP) rack design allows for three dual-socket Intel Xeon servers in a 1U space and up to 108 servers per rack. Despite that density, there is still room for a couple of 2.5-inch hard drives or SSDs in each node.

The hardware itself is pretty standard. The Aquarius server supports two Intel Xeon E5-2600 (v3 or v4) processors and up to 256 GB of DDR4 memory per server. That adds up to 3,888 cores and 27.7 TB of memory per rack in its maximum configuration. A half-height PCIe Gen 3 slot is available for devices like network adapters, so both Intel Omni-Path and Mellanox InfiniBand are supported.
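
Those rack-level totals are easy to check with some quick arithmetic, sketched below. The 18-core-per-socket figure is inferred from the quoted 3,888-core total rather than stated by Aquila.

```python
# Back-of-the-envelope check of the quoted rack totals. The per-socket
# core count is inferred from the 3,888-core figure, not stated by
# Aquila; E5-2600 v3 and v4 parts top out at 18 and 22 cores respectively.

servers_per_rack = 108        # 3 servers per 1U across 36 rack units
sockets_per_server = 2
cores_per_socket = 18         # inferred assumption (see note above)
memory_per_server_gb = 256

total_cores = servers_per_rack * sockets_per_server * cores_per_socket
total_memory_tb = servers_per_rack * memory_per_server_gb / 1000.0

print(f"Cores per rack:  {total_cores}")             # 3888
print(f"Memory per rack: {total_memory_tb:.1f} TB")  # ~27.6 TB, in line with the quoted 27.7 TB
```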

Accelerators are a problem though, since the PCIe slot is too small for a full-width device. That’s a shame, considering they can be the hottest components in an HPC server and would really benefit from liquid cooling. Bolz says they are looking to support Intel’s Knights Landing Xeon Phi in a future product, since these processors can be socketed onto a standard Intel motherboard and are thus amenable to their cold plate technology. There’s also hope for NVIDIA’s P100 with NVLink, since it can use a mezzanine card instead of PCIe, but there are no standard board configurations as there are for Xeon and Xeon Phi processors.

Aquila’s first customer target looks to be the DOE, especially those labs interested in advanced power and cooling technology for HPC gear, like Sandia and the National Renewable Energy Laboratory (NREL). NREL, in particular, would seem to be a natural customer, given its focus on energy-efficient computing infrastructure.

Successful proof points at DOE labs would go a long way toward establishing the platform for wider use in HPC datacenters. The bigger end game, though, is cloud providers and other hyperscale customers, where power and cooling concerns are at the forefront of infrastructure purchasing decisions. In these environments, a 50 percent energy reduction can translate into tens of millions of dollars per year. Aquila may be the new kid on the block, but if it can demonstrate savings of that magnitude, it will surely get the attention of the Googles and eBays of the world.

Aquarius systems will begin shipping in the third quarter of 2016. Anyone interested in their offering will get a chance for a closer look in November at the Supercomputing Conference (SC16) in Salt Lake City, Utah, where Aquila will be exhibiting the new platform.