About Big Arms and Long Legs


Andreas Stiller

(Translation of the German original in c't by Marcel Sieslack)

A few days before AMD launched its bearer of hope for the server market, the Bulldozer-based Interlagos, a strong new competitor threw its hat into the ring: ARM.

At the beginning of the year, ARM had stated that it planned to extend the ARM architecture to 64 bits by 2014/15. Now, at ARM's TechCon 2011 in Santa Clara, right where competitor Intel has its headquarters, the British company announced that it's advancing much faster and that the first server processors with 64-bit ARMv8 architecture might become available in 2013. Microsoft has probably exerted quite a bit of pressure.

As for 2015, ARM is already boldly aiming for a server-market share of 10 per cent, which is more than AMD holds right now: its share of this profitable market is stagnating at around 6 to 7 per cent. AMD hopes that the soon-to-be-released Interlagos processor will bring new upward movement into the market. After all, it's the first true server processor with Intel's AVX extension and with the fused multiply-add instructions so treasured in high-performance computing. But AMD shouldn't tarry; the new competition is already champing at the bit. And it's coming in force: ARM predicts a huge total of over 100 billion ARM processors by 2020.

Similar to AMD's 64-bit extension in the world of x86, the ARMv8 architecture features a 64-bit operating mode (AArch64 or just A64) as well as a 32-bit one. The latter is in turn divided into the normal ARM mode (A32) and the Thumb mode (T32). In A64, 31 64-bit general-purpose registers and 32 128-bit media registers are available. The media unit is also responsible for cryptography and supports instructions for AES and SHA-1/2. As well as a hypervisor mode for virtualisation, ARMv8 features a TrustZone Monitor that can watch over a Secure World OS, which executes appropriately "trusted apps".

In typical RISC style, the instruction length is still fixed at 32 bits, which greatly simplifies the whole frontend, including the decoding. On the other hand, several instructions will probably often be necessary to handle 64-bit immediates or offsets. ARM hasn't publicly released any details concerning the new ISA, though; they are expected near the end of 2012. By then, ARM partner Applied Micro (APM), whose former name extension "Circuits Corporation" has meanwhile been dropped, intends to be offering ARMv8-compatible hardware. Much to everyone's surprise, the CEO of Applied Micro, Dr. Paramesh Gopi, showed TechCon visitors a prototype board running Linux, with six Virtex-6 FPGAs from Xilinx, that emulates the planned server on a chip (SoC) called X-Gene.
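Why a fixed 32-bit instruction length makes 64-bit immediates costly can be shown with a small sketch. Since ARM has not published the encoding, the 16-bit immediate field used here is purely an assumption for illustration, and the names `split_immediate` and `instruction_count` are hypothetical. The idea is simply that a constant has to be assembled piece by piece, one instruction per non-zero chunk.

```python
from typing import List, Tuple

# ASSUMPTION: each 32-bit instruction can carry one 16-bit immediate chunk
# plus a shift amount. The real ARMv8 encoding is not given in the article.
CHUNK_BITS = 16
REGISTER_BITS = 64


def split_immediate(value: int) -> List[Tuple[int, int]]:
    """Split a 64-bit constant into (shift, chunk) pairs, skipping zero chunks."""
    if not 0 <= value < (1 << REGISTER_BITS):
        raise ValueError("value does not fit into a 64-bit register")
    mask = (1 << CHUNK_BITS) - 1
    pieces = []
    for shift in range(0, REGISTER_BITS, CHUNK_BITS):
        chunk = (value >> shift) & mask
        if chunk:
            pieces.append((shift, chunk))
    # Even the constant zero needs one instruction to clear the register.
    return pieces or [(0, 0)]


def instruction_count(value: int) -> int:
    """Number of instructions needed under the assumption above."""
    return len(split_immediate(value))


if __name__ == "__main__":
    for constant in (0x10000, 0xFFFF000000000001, 0x123456789ABCDEF0):
        print(f"{constant:#018x}: {instruction_count(constant)} instruction(s)")
```

Under this assumption a "dense" 64-bit constant costs four instructions, while constants with many zero chunks get away with one or two; a compiler could alternatively load such values from memory.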

With 4 to 128 ARMv8-compatible cores (4-way superscalar, out of order), X-Gene is supposed to reach a 3 GHz clock rate, with the cores connected via a terabit fabric with 100 Gb/s per socket and an integrated 10 Gb LAN interface. TSMC has been chosen to manufacture the SoC in the second half of 2012, at first in 40 nm and later on in 28 nm. What's special about it is its particularly low power consumption. In idle mode it's supposed to consume no more than 500 mW per core, and in sleep mode as little as 300 mW is sufficient for the whole SoC.

Power Saving Server

Under load, so a fuzzy image from Applied Micro on Bright Side of the News shows, the power consumption of the 8-core version is supposed to be about 25 watts, just like that of Intel's ultra power saver Xeon E3-1220L if you count its I/O hub. According to Applied Micro, the X-Gene with 8 cores manages around 110 SPECint_rate2006. That is supposed to make it three times as fast as the Xeon E3-1220L, but the head of engineering, Jim Johnston, apparently confused the speed and rate values of this benchmark. Tests by Acer, Fujitsu and Supermicro rate the E3-1220L with its two Sandy Bridge cores at over 66 SPECint_rate2006, and by the book that's more than half of the X-Gene's score. Nevertheless, roughly 1.7 times as fast at better efficiency seems like a good starting point.
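The arithmetic behind this comparison is quickly checked. The following sketch uses only the figures quoted above (110 and 66 SPECint_rate2006, about 25 watts for both chips); treating the power draw of the two as exactly equal is a simplifying assumption, and the names are made up for illustration.

```python
# Figures as quoted in the text; the Xeon value is a lower bound ("over 66").
XGENE_RATE = 110.0   # SPECint_rate2006 claimed for the 8-core X-Gene
XEON_RATE = 66.0     # SPECint_rate2006 measured for the Xeon E3-1220L
POWER_WATTS = 25.0   # ASSUMPTION: both draw the same ~25 W under load


def perf_per_watt(rate: float, watts: float) -> float:
    """Throughput score per watt of power consumption."""
    return rate / watts


ratio = XGENE_RATE / XEON_RATE

if __name__ == "__main__":
    print(f"X-Gene vs. Xeon throughput: {ratio:.2f}x")
    print(f"Xeon share of X-Gene score: {XEON_RATE / XGENE_RATE:.0%}")
    print(f"X-Gene: {perf_per_watt(XGENE_RATE, POWER_WATTS):.2f} points/W")
    print(f"Xeon:   {perf_per_watt(XEON_RATE, POWER_WATTS):.2f} points/W")
```

With equal power draw, the efficiency advantage equals the throughput ratio of about 1.67, a long way from the factor of three that was claimed.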

Those who are eager to start programming for ARMv8 will find that the FPGA board is scheduled to become available from Applied Micro at the beginning of 2012. This will be particularly interesting for ARM's new friend Microsoft, which is possibly already counting on ARMv8 for the server version of Windows 8; let's call it Server 2012 here.

Besides, in order to be better prepared for many cores, Windows 8 and Server 2012 will get a new Task Manager. At 128 and more cores, you can't see a thing with the old one. The overhaul is inspired by the HPC Cluster Manager and now indicates the current load of the logical cores with colors and percentages. Windows 8 is supposed to support up to 640 cores. The current Windows Server 2008 R2 is limited to 256 cores, and it manages those rather imperfectly. As reported by c't magazine on various occasions, it suffers from a performance-relevant bug in the NUMA allocation, which renders it almost unusable for applications in systems with more than one socket. The traditional administration via processor groups has also, up to now, been implemented in a rather unfortunate way. Both matters, so Microsoft has previously announced, should be much improved in Windows 8/Server 2012, and it's about time.
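What processor groups mean for the core counts mentioned here can be sketched in a few lines. The limit of 64 logical processors per group comes from Microsoft's documentation, not from this article. Filling the groups evenly in index order is a simplifying assumption, since Windows actually forms groups along the NUMA topology. The function names are hypothetical.

```python
import math
from typing import Tuple

# Windows processor groups hold at most 64 logical processors each.
GROUP_SIZE = 64


def groups_needed(logical_processors: int) -> int:
    """Minimum number of processor groups for the given processor count."""
    if logical_processors < 1:
        raise ValueError("need at least one logical processor")
    return math.ceil(logical_processors / GROUP_SIZE)


def locate(index: int) -> Tuple[int, int]:
    """Map a flat processor index to (group, number within group).

    ASSUMPTION: groups are filled in index order; the real assignment
    depends on the NUMA layout of the machine.
    """
    if index < 0:
        raise ValueError("index must not be negative")
    return divmod(index, GROUP_SIZE)


if __name__ == "__main__":
    for count in (128, 256, 640):
        print(f"{count} logical processors -> {groups_needed(count)} groups")
    print("processor 130 sits in (group, slot):", locate(130))
```

An application that isn't group-aware only ever sees the processors of a single group, which is one reason why the handling of the groups matters so much on large machines.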

And Something Else

After numerous interviews with promising candidates, the board of Globalfoundries has finally acknowledged that the current interim boss Ajit Manocha is actually the best choice. And so, the 61-year-old veteran of the semiconductor scene has now been appointed official CEO. Straight away, Manocha announced a very ambitious goal: in the long run, Globalfoundries intends to become the market leader among contract manufacturers. For now, there's a large obstacle to be overcome, namely TSMC, which is about four times as large.