Intel Steers Back to HPC with Cascade Lake AP Processor

Nov. 5, 2018

By: Michael Feldman

Intel has officially unveiled Cascade Lake Advance Performance (AP), a 48-core Xeon variant aimed at the high performance computing crowd.



The new AP product line will target the same application set as the now-defunct Xeon Phi products, specifically those where performance is the paramount criteria. Of course, Intel didn’t explicitly reference AP as the sequel to Xeon Phi, whose abandonment caused a good deal of consternation among HPC customers that were counting on the upcoming Knights Hill processors. Instead, Intel characterized the new AP processor line as “designed for the most demanding high-performance computing (HPC), artificial intelligence (AI) and infrastructure-as-a-service (IaaS) workloads.”

The first of the new Advanced Performance product line, Cascade Lake AP will be released in the first half of 2019, in conjunction with the more mainstream Cascade Lake SP (Scalable Performance) processor. The simultaneous release of these two products points to Intel’s approach to its server portfolio going forward. By essentially merging the Xeon Phi with Xeon products, Intel will be able share chip development much more effectively across the two product lines, while offering customers a clear choice at each processor upgrade. In a nutshell, Intel recognized a manycore implementation does not require a wholly distinct Xeon architecture.

Cascade Lake AP will be outfitted with 48 cores, 20 more than the top-of-the-line Cascade Lake SP processor. It will also offer a lot more memory bandwidth, thanks the Cascade Lake AP’s 12 DDR4 memory channels – twice as many as Cascade Lake SP will have. The 48-core setup will be implemented as a multichip module (MCM) presumably consisting of two 24-core Cascade Lake dies glued together via a high-speed interconnect. Intel is generally not big on MCM designs but manufacturing a defect-free 48-core die at high yields is not really in the cards using current semiconductor technology. (By the way, AMD’s 32-core EPYC CPU also employs a multichip approach, using four eight-core dies.)

Beyond that Intel didn’t offer much else in the way of technical details, including any specifics with regard to the HPC chip’s numerical processing capabilities. Presumably it will offer the same AVX-512 vector units that we expect to see in Cascade Lake SP, although the AP variant will undoubtably have more of these units in proportion to the additional cores. It’s also likely to support the new Vector Neural Network Instruction (VNNI) instructions that take advantage of INT8 and INT16 formats, which are set to make their first appearance in Cascade Lake.

Those capabilities are reflected in Intel’s claims on Cascade Lake AP’s performance:

  • 1.21x higher Linpack performance compared to the 28-core Xeon 8180 (“Skylake”) processor
  • 1.83x higher Stream Triad performance compared to the same 8180 processor
  • 17x more images-per-second versus the Xeon Platinum processor at launch

The latter two claims are not a big surprise. Doubling the number of memory channels would certainly be expected to nearly double the performance of Stream Triad, an industry-standard benchmark that measures memory bandwidth. And the addition of VNNI technology could easily account for the order of magnitude increase in processing images using neural networks.

However, the Linpack performance increase is a little underwhelming, given the 70 percent bump in core count (and forgetting for a moment that the comparison is against a previous-generation Skylake SP chip and not a Cascade Lake SP processor). This suggest the clock speed on the AP is quite a bit slower than its SP brethren – certainly understandable when you consider that this is the traditional tradeoff when you go from multicore to manycore. And since Cascade Lake will still use 14nm process technology, one can’t count on a huge performance increase from smaller transistors.

The real question will be how the Cascade Lake AP performance numbers match up against those of the standard Cascade Lake SP line.  If the AP silicon is only marginally faster, then presumably Intel can only charge marginally more money for those processors. But even a 20 percent performance bump could be quite useful, since that can translate into buying and deploying 20 percent fewer servers. That assumes, of course, that the customer’s target applications are not dependent on the better single-threaded performance inherent in the faster clock frequencies of the SP processor.

Assuming Intel is able to make the Cascade Lake AP’s price-performance differentiation attractive, the new chip also has to outrun the competition, especially AMD’s Zen 2 “Rome” CPUs, which are scheduled to be released 2019. At this point, Intel is claiming Cascade Lake AP will deliver 3.4x better Linpack performance and 1.3x better Stream Triad performance compared to the first-generation EPYC processors. But we’ll have to wait until next year to see how this matchup shakes out against AMD’s the second-generation Rome chips, which are slated be built on TSMC’s 7nm process.

According to the leaked Intel roadmap that we reported on in July, beyond Cascade Lake AP is an unnamed second-generation AP processor that allegedly will be released in the latter half of 2020. If true, that chip will almost certainly end up in the Aurora exascale supercomputer that is headed to Argonne National Lab in 2021. Intel hasn’t publicly confirmed the existence of such a processor, but we’re guessing it will do so next year.