Intel has released its latest four-socket and eight-socket Broadwell-EX processors into the wild this week, following on the heels of the dual-socket Broadwell-EP chips the company launched at the end of March. The new chip family, known as the Xeon E7-8800/4800 v4 series, is destined for scale-up servers running applications with prodigious appetites for memory and processor cores.
In today’s datacenters, those applications are doing things like real-time analytics, large-scale data mining, and in-memory computing. In most cases, that equates to enterprise work, but increasingly big datasets are driving HPC users to adopt the same sort of technology. And this is occurring across a number of high performance domains, including finance, earth sciences, genomics and other data-demanding science workloads.
To do this sort of work, most HPC users simply settle for the less capable, and considerably less expensive, dual-socket Xeon E5 processors, loading up some of the E5 nodes with as much memory as they can accommodate. These “fat nodes” are often accelerated with SSD flash devices to further speed the data pipeline. Such fat nodes are typically embedded in a more conventionally configured HPC cluster, where they can be used preferentially for data-intensive workloads.
But the new Xeon E7 processors have some distinct advantages over the Xeon E5s for these types of applications. Foremost among these is memory capacity. The new Broadwell E7 will support up to 3TB of memory per socket (using the highest capacity 128GB DIMMs), which is twice the maximum capacity of the E5 technology and twice that of the previous generation E7. An eight-socket server using E7-8800 processors can reach 24TB, which will accommodate a lot of jumbo-sized datasets for in-memory computing. Even a four-socket server can manage 12TB when maxed out with RAM. E7 memory bandwidth is also significantly better than that of its E5 brethren: 102 GB/sec versus 76.8 GB/sec.
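The capacity figures above are simple multiplication, and a short Python sketch makes the arithmetic explicit. The DIMM count per socket is not quoted here; it is inferred from dividing the 3TB maximum by the 128GB DIMM size, and the function names are purely illustrative.

```python
# Back-of-envelope check of the memory figures quoted above.
# Assumptions: 1 TB = 1024 GB, and the DIMM count per socket is inferred
# from the quoted 3TB-per-socket maximum using 128GB DIMMs.

DIMM_GB = 128       # highest-capacity DIMM mentioned in the article
SOCKET_TB = 3       # quoted maximum memory per E7 socket


def dimms_per_socket(socket_tb: int = SOCKET_TB, dimm_gb: int = DIMM_GB) -> int:
    """Number of DIMMs needed to reach the per-socket maximum."""
    return socket_tb * 1024 // dimm_gb


def server_capacity_tb(sockets: int, socket_tb: int = SOCKET_TB) -> int:
    """Maximum memory of a server with the given socket count, in TB."""
    return sockets * socket_tb


def bandwidth_ratio(e7_gbs: float = 102.0, e5_gbs: float = 76.8) -> float:
    """How much more memory bandwidth the E7 offers relative to the E5."""
    return e7_gbs / e5_gbs


if __name__ == "__main__":
    print("DIMMs per socket:", dimms_per_socket())
    for sockets in (4, 8):
        print(f"{sockets}-socket server: {server_capacity_tb(sockets)} TB")
    print(f"E7/E5 bandwidth ratio: {bandwidth_ratio():.2f}x")
```

On these numbers the E7 holds roughly a one-third bandwidth advantage over the E5, while the capacity advantage is a full factor of two.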
Theoretically, all this can greatly speed up workloads like genomic analysis, which can require a terabyte or more of working space per genome at run-time. For financial codes doing portfolio risk analysis, a larger and faster memory can provide more accurate and more timely intelligence for trading decisions. Oil and gas seismic analysis is another fertile area for big memory computing.
Performance-wise, the new Xeon E7 processors have slightly more compute capacity than the corresponding E5 chips. The top-of-the-line E7s have 24 cores; the E5s max out at 22 cores. Application behavior will tend to overshadow such small differences though. The larger memory size and aggregate memory bandwidth of the E7 are what will drive performance in most of the applications targeted by this product set.
But Intel is not looking to compete against itself here. The E7 is being positioned against IBM Power8 processors, where the greater memory reach – 3TB for the E7 versus 2TB for the Power8 – is a real differentiator for applications weighed down by large datasets. It should be noted, however, that the Power8 provides up to 230 GB/sec of memory bandwidth, which is more than twice that of the E7. So actual application performance is going to depend heavily on which element is in shortest supply during execution.
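To make the capacity-versus-bandwidth trade-off concrete, here is a rough Python sketch that asks two questions of each processor: does a dataset fit in memory, and how long would one pass over it take at peak bandwidth? The figures are the ones quoted above, treated here as per-socket values; the single-pass streaming model and the `Socket` class are simplifying assumptions for illustration, not a benchmark.

```python
# Rough illustration of which resource runs out first: capacity or bandwidth.
# Assumptions: figures quoted in the article are treated as per-socket values,
# 1 TB = 1024 GB, and the workload is one full pass over the dataset at peak
# memory bandwidth (real applications will behave differently).

from dataclasses import dataclass


@dataclass(frozen=True)
class Socket:
    name: str
    capacity_gb: float      # maximum memory reach
    bandwidth_gbs: float    # peak memory bandwidth in GB/sec

    def fits(self, dataset_gb: float) -> bool:
        """True if the dataset can be held entirely in memory."""
        return dataset_gb <= self.capacity_gb

    def scan_seconds(self, dataset_gb: float) -> float:
        """Time for one pass over the dataset at peak bandwidth."""
        return dataset_gb / self.bandwidth_gbs


E7 = Socket("Xeon E7", capacity_gb=3 * 1024, bandwidth_gbs=102.0)
POWER8 = Socket("Power8", capacity_gb=2 * 1024, bandwidth_gbs=230.0)


def compare(dataset_gb: float) -> None:
    """Print fit and scan time for each processor at a given dataset size."""
    for chip in (E7, POWER8):
        if chip.fits(dataset_gb):
            print(f"{chip.name}: fits, {chip.scan_seconds(dataset_gb):.1f} s per pass")
        else:
            print(f"{chip.name}: does not fit in memory")


if __name__ == "__main__":
    compare(2 * 1024)     # fits on both; bandwidth decides
    compare(2.5 * 1024)   # fits only within the E7's larger reach
```

Under this simple model, a dataset that fits on both chips is scanned faster on the Power8, while anything between 2TB and 3TB only fits on the E7, which is the trade-off described above.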
Intel also claims a 1.4x performance advantage for the E7 compared to the Power8, but that’s based on the artificial SPECint_rate_base2006 metric, so it may have little bearing in the real world. Intel’s bolder claim of a 10x performance per dollar advantage when comparing list prices of eight-socket servers based on the two processors is more compelling, but doesn’t take into account actual pricing from IBM and its distributors, nor the fact that Power8 servers are now available from non-IBM manufacturers like Tyan, under the OpenPower effort.
Speaking of which, it’s worth noting that the aforementioned Tyan currently sells only a single-socket server configuration in its Power8 portfolio. For multi-socket systems, the company uses the Intel Xeon E7. Tyan recently announced its quad-socket FT76-B7922 platform, an HPC server that pairs up to four of the new Broadwell Xeon E7 processors with up to four Xeon Phi or GPU accelerators. Each server system can be configured with as much as 6TB of RAM and is targeted, as one might suspect, at high performance in-memory computing.
Dell also recently announced a refresh of its quad-socket PowerEdge R930 server using the new Xeon E7 processors. In this case though, the company is aiming them strictly at enterprise work: OLTP, ERP, CRM, business intelligence, and the like. As with Tyan’s quad-socket gear, the R930 maxes out at 6TB per server.
True to form, SGI is taking the new E7 processors one step further, employing them in the company’s UV server line to build much larger symmetric multiprocessing (SMP) machines than is possible with the built-in four-socket and eight-socket support. For the UV 300, up to 64 E7 processors can be hooked together, and share up to 64TB of memory in NUMA fashion, courtesy of SGI’s custom-built NUMAlink controller.
The UV 300 is designed for the largest data analytics jobs, where extreme levels of data capacity or processor horsepower are needed in a shared memory setup. The server is also equipped with an MPI Offload Engine for HPC cluster duty, where a high-capacity “super node” is desired. For less demanding work, SGI offers the quad-socket UV 30EX, a 5U server that maxes out at 6TB of memory. It comes with the MPI Offload Engine as well, and can be upgraded to a UV 300 if the need arises.
Hewlett Packard Enterprise (HPE), Fujitsu, Cisco, Huawei, and Lenovo also announced updated servers based on the new Xeon E7 products.
As it turns out, Broadwell could be the end of the line for the E7 series as a distinct product family. There’s been a good deal of informed speculation (not to mention some rather specific slides of Intel’s server roadmap) that suggests the next-generation Skylake Xeons, which are due to show up in the latter half of 2017, will merge the E5 and the E7 lines under the Purley platform. If that comes to pass, it will result in a single Xeon family that supports two-, four-, and eight-socket servers across what is likely to be a rather large SKU set. Reducing the Xeons down to a single platform should simplify things considerably for Intel and its OEM and ODM partners.
At least some of these Skylake Xeons look to integrate Omni-Path controllers, while others will be offered with FPGA integration. Oh, and these same chips are also in line to incorporate the 512-bit AVX vector instructions, which at this point are only available in the Knights Landing Xeon Phi processors. All of which should make for some rather exciting new silicon for the high performance computing crowd toward the end of next year.