The prospects for OpenPower got a big boost this week, with the announcement that Google has deployed Power9 servers in its datacenters. That revelation was joined by the news that Tencent, PayPal, Uber, Alibaba, and LimeLight Networks are all adopting Power-based technology to support their various businesses. The announcements were made at the OpenPower Summit taking place in Las Vegas, Nevada.
The Google deployment has been in the works for some time. In 2016, the search giant teamed up with Rackspace and IBM to develop “Zaius,” a dual-socket Power9 server designed as an Open Compute Project platform. The server offers considerable performance thanks to the new Power9 CPUs, as well as plenty of memory and I/O capability. Each socket supports eight DDR4 memory channels, 60 PCIe Gen4 lanes, and 16 OpenCAPI lanes.
Zaius servers in Google datacenter. Source: Google
The OpenPower Summit presentation by Google systems hardware engineer Máire Mahony suggested that the Zaius deployment was still in its early stages. “We’re ready to scale up the number of applications and ready to scale up the machine count,” said Mahony.
The company has not revealed the number of Power9 servers installed, and probably never will. But Mahony did point to the workloads that Google appears to be targeting, namely the ones in greatest demand by its customers: Gmail, YouTube, Google Maps, Android, Chrome, Google Play, and Search. The last of these is Google’s bread-and-butter application and has become much more computationally demanding of late, since image and video search, as well as speech-mediated search, require a lot more computing heft than simple text searches.
Regardless of type, search workloads scale rather nicely with increased thread count, according to Mahony. “For web search, which consumes a significant amount of compute resources at Google, more cores and more threads is a good thing,” she said.
In this area, the advantage of the Power9 is striking. Unlike the two-way simultaneous multithreading Intel has implemented in its Xeon CPUs, the Power9 supports four or eight threads per core, depending on the variant. That means you can get as many as 96 threads in flight at once on a 24-core (four-thread) or 12-core (eight-thread) Power9 processor. A 28-core Skylake Xeon 8180 would max out at 56 threads. Better yet, in Google’s performance testing, each Power9 thread delivered more performance than its Xeon counterpart.
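The thread-count comparison is simple arithmetic, sketched below. The core and thread figures are the ones quoted above, plus the assumption that the Power9 ships in a four-thread-per-core variant with up to 24 cores and an eight-thread-per-core variant with up to 12 cores. The function name is illustrative only.

```python
def hardware_threads(cores: int, threads_per_core: int) -> int:
    """Total hardware threads a single socket can keep in flight."""
    return cores * threads_per_core

# Power9 variants (assumed: SMT4 with 24 cores, SMT8 with 12 cores)
power9_smt4 = hardware_threads(24, 4)
power9_smt8 = hardware_threads(12, 8)

# Skylake Xeon 8180: 28 cores, two-way simultaneous multithreading
xeon_8180 = hardware_threads(28, 2)

# Per-socket thread advantage of the Power9 over the Xeon
thread_ratio = power9_smt4 / xeon_8180

print(power9_smt4, power9_smt8, xeon_8180, round(thread_ratio, 2))
```

Both Power9 variants land on the same 96-thread ceiling, roughly 1.7 times the Xeon's 56, before any per-thread performance difference is counted.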
Mahony also alluded to another possible use case for the Power9: that of a CPU host for Google’s Tensor Processing Unit (TPU) coprocessor. Google uses its home-grown TPUs to accelerate its machine learning workloads, both training and inferencing. Although she didn’t explicitly say the company was outfitting any of its Zaius servers with TPUs at this point, Mahony did lay out the case for doing so. She explained that for certain recurrent neural network work, application latency degraded significantly when the host CPU was given other tasks to perform. The culprit, they found, was DRAM bandwidth, which just so happens to be something the Power9 provides in abundance.
Google is also apparently considering outfitting its servers with storage-class memory like the Intel Optane SSD or the Samsung Z-SSD. Unfortunately, these devices rely on the relatively slow PCIe Gen3 interface. Mahony says Google would be more interested in such technologies if they could be hooked into the server via an interface with much better latency, like OpenCAPI. As of now, no such products exist, but Google’s interest could spur some enterprising company to build them.
Google is not the only hyperscale company deploying OpenPower gear. Tencent and Alibaba Cloud (Ali Cloud) have also installed some of these servers in their datacenters (more than likely based on Power8, rather than Power9, processors). Tencent says its adoption of the technology has saved 30 percent in server and rack resources, improving overall efficiency by the same amount. In the case of Alibaba, the company has deployed the technology in its Ali X-Dragon Cloud platform and has invited customers to give the hardware a whirl as part of a pilot program.
One step down from the big hyperscalers are companies like PayPal and LimeLight Networks, both of which have dipped their toes into the OpenPower ecosystem. PayPal is using these systems to speed up its fraud prevention work, while LimeLight is using the PCIe Gen4 capabilities of the Power9 hardware to accelerate its music and video streaming services.
The odd one out is Uber, which is teaming up with Oak Ridge National Lab to borrow some cycles on the upcoming Summit supercomputer. The 200-petaflop machine will comprise over 4,600 IBM servers, each of which will house two Power9 CPUs and six NVIDIA V100 GPUs. It's expected to come online this summer.
Uber wants to use Summit to run Horovod, the company’s distributed training framework based on TensorFlow. The deep learning work covers a number of Uber applications, including self-driving navigation, trip forecasting, and fraud prevention. Uber is particularly interested in pushing the envelope in scalability with Horovod, especially with regard to GPU usage.
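The core idea behind this kind of distributed training can be illustrated without the framework itself: each worker computes gradients on its own slice of the data, the gradients are averaged across all workers in an allreduce step, and every worker applies the same update. The sketch below simulates that in plain Python. It does not use Horovod's API, and the function names and numbers are hypothetical.

```python
from typing import List

def allreduce_average(worker_grads: List[List[float]]) -> List[float]:
    """Average gradient vectors element-wise across all workers."""
    n_workers = len(worker_grads)
    n_params = len(worker_grads[0])
    return [
        sum(grad[i] for grad in worker_grads) / n_workers
        for i in range(n_params)
    ]

def sgd_step(weights: List[float], grad: List[float], lr: float) -> List[float]:
    """Apply one plain gradient-descent update."""
    return [w - lr * g for w, g in zip(weights, grad)]

# Three simulated workers (think: three GPUs), each holding the
# gradient it computed on its own shard of the training batch.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

avg = allreduce_average(grads)
weights = sgd_step([0.0, 0.0], avg, lr=0.5)

print(avg, weights)
```

Because the averaging step involves every worker, its communication cost is what limits how far such training scales, which is why the number of GPUs that can be used effectively is the interesting question.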
According to Alex Sergeev, a software engineer on Uber’s machine learning team, they are eager to see how many of Summit’s 27,000 GPUs can be effectively employed with their software. Given that the supercomputer will deliver three exaflops of peak deep learning performance, Sergeev thinks they may be able to demonstrate the first exascale application on the machine.
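As a rough sanity check on those figures, the GPU count and the implied per-GPU share of the deep learning peak follow from the numbers quoted above. This is back-of-the-envelope arithmetic only: it treats "over 4,600" servers as exactly 4,600, and the variable names are illustrative.

```python
servers = 4_600          # "over 4,600" in the article, so a lower bound
gpus_per_server = 6      # NVIDIA V100 GPUs per server
peak_dl_flops = 3e18     # three exaflops of peak deep learning performance

total_gpus = servers * gpus_per_server

# Implied peak deep learning throughput per GPU, in teraflops
per_gpu_tflops = peak_dl_flops / total_gpus / 1e12

print(total_gpus, round(per_gpu_tflops, 1))
```

The result, 27,600 GPUs at a little over 100 teraflops each, is consistent with the rounded 27,000 figure in the text.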