AMD announced this week it will begin shipping its new “Naples” server CPU in the second quarter of 2017, hoping to disrupt Intel’s hegemony in the server market. The upcoming chip looks to be the first CPU from AMD to offer a credible challenge to Xeon in the datacenter in nearly a decade.
Naples represents a re-imagining of the x86 Opteron, AMD’s flagship server CPU line. It is based on the company’s Zen architecture, and AMD is promising that Naples will deliver “a step function increase, across the board.” Comparing it to its current rival in this space, Intel’s “Broadwell” Xeon (E5-2600 v4), AMD says Naples will provide 45 percent more cores (32 versus 22), 60 percent more I/O capacity (64 PCIe lanes versus 40), and 122 percent more memory bandwidth (171 GB/sec versus 77).
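The percentages follow directly from those per-processor figures. A minimal Python check of the arithmetic, using a hypothetical `percent_more` helper:

```python
# Per-processor figures as quoted by AMD: (Naples, Xeon E5-2600 v4 "Broadwell")
specs = {
    "cores": (32, 22),
    "PCIe lanes": (64, 40),
    "memory bandwidth (GB/sec)": (171, 77),
}


def percent_more(new, old):
    """How much larger `new` is than `old`, as a whole-number percentage."""
    return round((new - old) / old * 100)


for name, (naples, broadwell) in specs.items():
    print(f"{name}: {percent_more(naples, broadwell)} percent more")
# cores: 45 percent more
# PCIe lanes: 60 percent more
# memory bandwidth (GB/sec): 122 percent more
```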
The high memory bandwidth is especially noteworthy for performance-demanding applications that are up against the memory wall (i.e., a large proportion of HPC codes). In fact, the 171 GB/sec figure is fairly close to the 230 GB/sec offered by an IBM Power8 processor. AMD accomplished this feat primarily by doubling the number of memory channels, going from the traditional four per socket to eight.
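Peak DRAM bandwidth is roughly channels × transfer rate × 8 bytes (each DDR4 channel is 64 bits wide), which is why doubling the channel count matters so much. The sketch below assumes DDR4-2666 memory for Naples and DDR4-2400 for Broadwell. AMD did not spell out the memory speed behind its 171 GB/sec figure, though that assumption reproduces it.

```python
BYTES_PER_TRANSFER = 8  # a DDR4 channel has a 64-bit (8-byte) data bus


def peak_bandwidth_gbs(channels, mega_transfers_per_sec):
    """Theoretical peak memory bandwidth in GB/sec (10**9 bytes per second)."""
    return channels * mega_transfers_per_sec * BYTES_PER_TRANSFER / 1000


naples = peak_bandwidth_gbs(8, 2666)     # assumption: eight channels of DDR4-2666
broadwell = peak_bandwidth_gbs(4, 2400)  # four channels of DDR4-2400
print(f"Naples ~{naples:.0f} GB/sec, Broadwell ~{broadwell:.0f} GB/sec")
# Naples ~171 GB/sec, Broadwell ~77 GB/sec
```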
Memory capacity is impressive as well, with up to four terabytes possible on a dual-socket server. That’s based on fully populating the 32 DIMM slots across both processors. By comparison, a dual-socket server based on Intel Xeon E5-2600 v4 processors offers 24 memory slots, which tops out at 3 TB per server.
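The capacity figures work out if each slot holds a 128 GB DIMM, which is an assumption but is what 4 TB across 32 slots implies. The DIMMs-per-channel values below are likewise inferred from the slot counts: two DIMMs on each of Naples’ eight channels, and three on each of the Xeon’s four.

```python
DIMM_GB = 128  # assumption: 128 GB DIMMs, as implied by 4 TB / 32 slots


def max_capacity(sockets, channels, dimms_per_channel, dimm_gb=DIMM_GB):
    """Return (total DIMM slots, capacity in TB) for a fully populated server."""
    slots = sockets * channels * dimms_per_channel
    return slots, slots * dimm_gb / 1024  # 1 TB = 1024 GB here


print(max_capacity(2, 8, 2))  # dual-socket Naples: (32, 4.0)
print(max_capacity(2, 4, 3))  # dual-socket Xeon E5-2600 v4: (24, 3.0)
```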
Since Zen supports two-way simultaneous multithreading (SMT), the 32 Naples cores will support 64 threads. In a typical dual-socket server setup, that means up to 128 threads per node. At that point, you essentially have a small cluster in just a single server, assuming you can outfit that server with enough memory and I/O to keep all those threads well-fed. Given all the extra support AMD has provided for those interfaces, that certainly seems doable.
What also might work to AMD’s advantage is some useful synergy with the company’s datacenter GPU product lines. Thanks to all those PCIe lanes, there is plenty of support to attach FirePro or Radeon cards, especially the new “Vega” Radeon Instinct GPUs aimed at deep learning. In fact, up to four cards can be hooked to each processor. Of course, these could be NVIDIA cards as well. The 64 PCIe lanes per processor will also come in handy when you need to hook up Mellanox’s new HDR InfiniBand adapter, which needs a 32-lane interface (at PCIe 3.0 speeds) for its 200 Gbps operation.
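Both lane counts in that paragraph come down to simple arithmetic. A PCIe 3.0 lane signals at 8 GT/sec with 128b/130b encoding, or about 7.9 Gbps per direction before protocol overhead, so a 16-lane link falls short of 200 Gbps while 32 lanes clear it. The sketch assumes each GPU takes a standard x16 slot.

```python
GT_PER_LANE = 8.0     # PCIe 3.0 signaling rate, GT/sec per lane
ENCODING = 128 / 130  # PCIe 3.0 128b/130b line encoding efficiency
LANES_PER_CPU = 64    # Naples PCIe lanes per processor, per AMD
GPU_LANES = 16        # assumption: each GPU sits in an x16 slot


def pcie3_gbps(lanes):
    """Raw one-direction PCIe 3.0 bandwidth in Gbps, ignoring packet overhead."""
    return lanes * GT_PER_LANE * ENCODING


print(LANES_PER_CPU // GPU_LANES)  # 4 GPUs per processor
for lanes in (16, 32):
    gbps = pcie3_gbps(lanes)
    print(f"x{lanes}: {gbps:.1f} Gbps, enough for HDR: {gbps >= 200}")
# x16: 126.0 Gbps, enough for HDR: False
# x32: 252.1 Gbps, enough for HDR: True
```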
By the way, at some point AMD will also be able to use its Zen-based inter-processor interconnect, known as the Infinity Fabric, to attach its GPUs. Initially, the fabric will be used for the normal CPU cross-talk on a dual-socket, cache-coherent system. But since the company’s new Vega GPUs will support the interface, it should also allow these new graphics processors to be installed on a motherboard without consuming PCIe real estate. That means, in theory, one could build something akin to an IBM Power/NVIDIA GPU server, where CPUs and GPUs communicate with each other over the high-speed NVLink. In this case, though, AMD would be providing the host, the accelerator, and the interconnect substrate.
Since no product is being launched at this point, there’s no word on clock speeds, power draw, or pricing. Those aspects count for a lot in most server environments, especially in cloud datacenters and supercomputers, where CPUs are purchased by the truckload and the smallest of advantages can quickly add up to significant cost savings.
Also, since we have only superficial characteristics like core count and memory bandwidth to assess, it’s difficult to tell how Naples will actually perform in the wild. AMD did offer up a performance comparison for a real-world application: a seismic analysis code that encapsulates 3D wave equations. The company said it used this example because it taxes the cores, memory, and I/O, and because it is representative of “many technical workloads.”
The comparison pitted a dual-socket Naples server against an Intel Xeon server equipped with two 22-core E5-2699A v4 processors. In the first test case, 44 Naples cores and 44 Xeon cores (the full complement of the Intel system) ran the seismic code. The Naples-based system completed the run in roughly half the time of the Xeon-based system: 18 seconds versus 35 seconds. When all 64 Naples cores in the server were used (and the memory speed was raised to 2400 MHz), the run time dropped to 14 seconds. AMD did not provide the clock speed of the Naples processors, nor other configuration details for either system that might have affected the results, so your mileage may vary. Nevertheless, AMD seems confident that Naples can outrun the Xeon competition on these compute-intensive and memory-intensive codes, and the superior memory bandwidth looks to be the crucial feature here.
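The reported timings translate into the speedups below. Keep in mind that the 64-core run changed two variables at once (core count and memory speed), so its gain over the 44-core Naples run can’t be attributed to the extra cores alone.

```python
# Run times in seconds for the seismic code, as reported by AMD
xeon_44 = 35    # 44 Xeon E5-2699A v4 cores
naples_44 = 18  # 44 Naples cores
naples_64 = 14  # 64 Naples cores, memory raised to 2400 MHz


def speedup(baseline_s, new_s):
    """How many times faster the new run is than the baseline run."""
    return baseline_s / new_s


print(f"Naples 44 cores vs. Xeon: {speedup(xeon_44, naples_44):.2f}x")   # 1.94x
print(f"Naples 64 cores vs. Xeon: {speedup(xeon_44, naples_64):.2f}x")   # 2.50x
print(f"Naples 64 vs. 44 cores:   {speedup(naples_44, naples_64):.2f}x")  # 1.29x
```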
If those kinds of numbers hold up across a range of HPC codes, not to mention more mainstream enterprise and cloud applications, AMD could see its fortunes rise in the very near future. Today, Intel owns more than 95 percent of the mainstream server processor market, which according to IDC is closing in on $14 billion per year. AMD currently has less than one percent of that market. (Ten years ago, the company’s market share was a couple of points north of 20 percent.) If AMD could capture just 10 percent of those sales, it could add well over a billion dollars to its annual revenue.
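The revenue estimate is a back-of-the-envelope calculation on the IDC figure. The sketch rounds the market to an even $14 billion, which slightly overstates a market that is only closing in on that number.

```python
MARKET_USD = 14e9  # IDC: annual mainstream server processor market, rounded to $14B


def revenue_at_share(share):
    """Annual revenue, in billions of dollars, for a given fraction of the market."""
    return MARKET_USD * share / 1e9


print(f"1 percent:  ${revenue_at_share(0.01):.2f} billion")  # $0.14 billion
print(f"10 percent: ${revenue_at_share(0.10):.2f} billion")  # $1.40 billion
```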
The new Naples products will almost certainly be priced competitively against Intel’s Xeon parts. And if real-world performance reflects the advantages AMD is touting, Intel will have its hands full.
At this point, it’s worth mentioning that the “Skylake” Xeon launch is imminent and will likely coincide with the Naples product ramp. The top Skylake parts may match Naples with 32 cores, or at least come rather close. In addition, Intel is expected to bump up the memory channels to six per processor, narrowing the gap with the eight in Naples. Further, the new Xeon will offer AVX-512, which will double its peak floating-point performance per core, per clock, compared to the current Broadwell parts. All of which should make for the most competitive CPU rivalry the server market has seen in a long, long time. We'll keep you apprised.
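The AVX-512 claim is about peak throughput per core per clock: vector registers double from 256 to 512 bits, so each fused multiply-add (FMA) instruction processes twice as many doubles. The sketch assumes two FMA units per core for both Broadwell and the top Skylake parts (some lower-end Skylake SKUs have only one 512-bit FMA unit), and it ignores clock speed, which can drop when running AVX-512 code.

```python
DOUBLE_BITS = 64  # width of one double-precision value


def dp_flops_per_cycle(vector_bits, fma_units):
    """Peak double-precision FLOPs per core per clock cycle."""
    lanes = vector_bits // DOUBLE_BITS  # doubles per vector register
    return lanes * 2 * fma_units        # each FMA counts as a multiply plus an add


broadwell = dp_flops_per_cycle(256, 2)  # AVX2, two FMA units
skylake = dp_flops_per_cycle(512, 2)    # AVX-512, assuming two FMA units
print(broadwell, skylake, skylake / broadwell)  # 16 32 2.0
```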