By: Michael Feldman
Cavium has launched its latest ARM server processor, the ThunderX2, a second-generation SoC aimed at the same datacenter workloads that are currently dominated by Intel’s Xeon CPUs. The new chip is designed to go head-to-head with those Xeons while also getting out in front of the 64-bit ARM competition from Applied Micro, Broadcom, and others.
Significantly, some of those targeted workloads are in high performance computing. Cavium points to computational fluid dynamics (CFD) and reservoir modeling as two application areas where it thinks the ThunderX2 chip can outrun its rivals. Although specific performance numbers have yet to be released, the company claims the ThunderX2 will “deliver comparable performance at a better TCO compared to the next generation of traditional server processors.”
Clock speeds are in the range of 2.4 to 2.8 GHz, rising to 3.0 GHz in Turbo mode. For a server-class chip with up to 54 cores, those are very respectable numbers. But since we don’t know how much power the various ThunderX2 SoCs draw at those speeds, it’s hard to make useful comparisons. For what it’s worth, Cavium’s first-generation 48-core ThunderX, which is manufactured with 28nm technology, can get by on 80 to 100 watts. Considering the ThunderX2 is built on a 14nm FinFET process, the new chips should be at least as power-efficient. Although Cavium didn’t say who is etching its new silicon, it’s a good bet GLOBALFOUNDRIES got the business, given that it manufactures the 28nm ThunderX SoCs.
Cavium’s leap to 14nm is significant. Just two months ago, Intel launched its mainstream Xeon E5-2600 v4 (Broadwell) on 14nm technology, which suggests semiconductor manufacturing may be entering a period of greater parity. Don’t get too excited, though. Volume production for ThunderX2 isn’t expected until late next year, according to an EE Times report, which still leaves Intel comfortably out in front of its ARM rivals.
Setting that aside, the ThunderX2 feature set is certainly up to speed and looks to be geared toward power users, especially those with high-end memory requirements. The SoC is equipped with 6 DDR4 memory controllers per socket and supports speeds of up to 3200 MHz. If you want to double up on the DIMMs (to 12 per socket), the top speed drops to just 2966 MHz. Up to 3 TB of memory can be included in a dual-socket server configuration.
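The memory figures lend themselves to some quick arithmetic. The sketch below is our own back-of-the-envelope calculation, not a Cavium number. It assumes each memory controller drives one 64-bit DDR4 channel, treats the quoted “MHz” figures as megatransfers per second (the usual DDR4 convention), and assumes the 3 TB maximum comes from 24 identical DIMMs (12 per socket). The function names are purely illustrative.

```python
# Back-of-the-envelope memory math for a ThunderX2-class socket.
# Assumptions (ours, not Cavium's): one 64-bit DDR4 channel per memory
# controller, quoted "MHz" figures read as MT/s, and a 3 TB dual-socket
# system built from 24 identical DIMMs (12 per socket).

BYTES_PER_TRANSFER = 8  # 64-bit DDR4 data bus per channel (ECC bits excluded)


def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: float) -> float:
    """Theoretical peak memory bandwidth per socket, in GB/s (10^9 bytes/s)."""
    return channels * mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER / 1e9


def dimm_size_gb(total_tb: float, sockets: int, dimms_per_socket: int) -> float:
    """Capacity each DIMM must have to reach the quoted total (1 TB = 1024 GB)."""
    return total_tb * 1024 / (sockets * dimms_per_socket)


if __name__ == "__main__":
    # Six controllers with one DIMM per channel, at the top quoted speed.
    print(f"{peak_bandwidth_gbs(6, 3200):.1f} GB/s per socket (6 DIMMs)")
    # Twelve DIMMs per socket means two per channel, at the lower quoted speed.
    print(f"{peak_bandwidth_gbs(6, 2966):.1f} GB/s per socket (12 DIMMs)")
    print(f"{dimm_size_gb(3, 2, 12):.0f} GB per DIMM for 3 TB in two sockets")
```

Under those assumptions, a fully clocked socket tops out at roughly 153.6 GB/s of theoretical peak bandwidth, and the 3 TB ceiling implies 128 GB DIMMs. Sustained bandwidth in real applications will be lower than these peak figures.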
The new processor uses the second generation of the Cavium Coherent Processor Interconnect (CCPI) to maintain memory coherency in dual-socket NUMA servers. According to the company, this newer version has twice the bandwidth of what was available in the original design. Again, it’s hard to ascertain the significance of this since we’re not being treated to specific numbers. Other high-end features include support for 10/25/40/50/100G Ethernet, PCIe Gen3 x16, and SATAv3 interfaces. In aggregate, hundreds of gigabits of I/O bandwidth are available per socket.
At this point, one of Cavium’s biggest challenges is attracting a critical mass of OEMs and ODMs to build servers around its chip. No Tier 1 vendors have announced plans to design systems with the ThunderX2, but niche server-makers like Gigabyte Technology and E4 Computer Engineering voiced support for the new silicon in Cavium’s press release. Those two companies will be making an appearance at the ISC High Performance conference later in June, as will Cavium itself, so if you’re interested in the latest rumblings on these ThunderX2 chips, those three booths would be worth a visit.
It’s also worth noting that in 2014, Cray was exploring ARM-based hardware for supercomputing using Cavium’s original ThunderX parts, but that effort apparently hasn’t yielded anything for public consumption. In 2015, Penguin Computing began offering an Open Compute Project (OCP) compliant server with these same first-generation chips, so an upgrade path may be in the pipeline there.
The problem of attracting big server vendors to buy into new processor architectures is not confined to Cavium, or for that matter, to the ARM architecture. All chipmakers run into the chicken-and-egg problem of getting system builders interested in a new processor before customer demand makes it attractive to do so. This is exacerbated by the uphill slog to port the needed datacenter software: operating systems, networking libraries, compilers and runtimes, application libraries, and so on. ARM partners have actually made some headway here, but the corresponding x86 software is still vastly better and more mature, not to mention broader in scope.
Nevertheless, Cavium is hoping its latest offering will attract enough customers and vendors to gain a foothold in the datacenter. Taking a bite out of Intel’s business will be tough going for reasons already stated, but Cavium at least appears to be back in the running against its ARM competition. Applied Micro’s latest offering, the 16nm X-Gene 3, is due out before the end of 2016 and looks to be generally on par with the ThunderX2. It offers similar speeds (3.0 GHz), more memory controllers (8), but fewer cores (32) compared to Cavium’s chip. AMD finally released its “Seattle” ARM Opteron at the beginning of 2016, but that processor has just eight cores, with two DDR4 channels, and appears to be designed for the lower end of the market. Broadcom’s 16nm “Vulcan” ARM server chip is promising an advanced processor design based on cores that can support up to four threads, but that effort appears to be well behind schedule.
Meanwhile, Qualcomm has generated some buzz with the 24-core 64-bit ARM prototype it announced last year. Linaro recently made that chip, or perhaps a more updated version of it, available to developers as part of a cloud service. That cloud also incorporates other ARM-based server gear from Huawei, AMD and Cavium. Qualcomm is aiming its ARM silicon at the hyperscale cloud space, looking to capitalize on the architecture’s reputation for better energy efficiency and price-performance.
Cavium is positioning the ThunderX2 more broadly, as evidenced by the number of different integrated hardware accelerators it offers. The particular ones that end up on the silicon are a function of which application set is being targeted.
Which brings us to the four varieties of the ThunderX2, which Cavium characterizes as “workload optimized processors.” The ThunderX2_CP is the one aimed at HPC, but does double duty for cloud workloads, namely web serving, caching and search. It includes accelerators for virtualization and virtual switch offloading. The ThunderX2_ST is optimized for big data, cloud storage, massively parallel processing databases and data warehousing. The accelerators here are for data protection, integrity, and security. The ThunderX2_SC is targeted at secure web front-end platforms, security appliances and Cloud RAN (radio access networks). Here the acceleration is provided by Cavium’s NITROX silicon, which provides some grease for various security protocols. Finally, the ThunderX2_NT chip is aimed at media servers, large-scale embedded applications, and network function virtualization (NFV) types of workloads. It includes Cavium’s OCTEON circuitry for accelerating networking functions like packet parsing, shaping, lookup, QoS and forwarding.
As mentioned before, none of these seem to be generally available yet. In the meantime, Cavium is offering a reference platform for developers using its “preferred ODM.” It is meant for customers who want to start building their software stack today in anticipation of wider deployment in 2017 and beyond.