About Chaps and Engravers


Andreas Stiller

(Translation of the German original in c't by Marcel Sieslack)

While AMD shows off numerous innovations in the interaction of CPUs and GPUs at its own developer forum, Intel presents some interesting features of the processor generation after next, Haswell: AVX2.

No, AMD didn't introduce a new CEO at the start of the Fusion Developer Summit (AFDS) in Bellevue near Seattle, as some had expected. Instead, it officially launched the A-Series, previously known under the codename Llano. In what could be read as a slight against the new chips (which still use the old Phenom core), vice president Rick Bergman was already showing off a laptop with a prototype of their successor, Trinity. It is scheduled to arrive next year with the Bulldozer architecture, a DirectX 11 graphics core and 50 percent more performance.

In his keynote, software architect Phil Rogers didn't dwell on pleasantries, as is otherwise customary, but went straight to the heart of the Fusion System Architecture (FSA) and the AMD Accelerated Parallel Processing SDK (APP), formerly known as the ATI Stream SDK. Rogers even used code examples to explain hybrid computing; it is a real developer conference, after all. Unified memory (CPU and GPU in one address space), parallel kernels, user-mode scheduling and many similar topics came up as current and future features: exciting stuff, but we had heard about it from the competition before. Thanks to Fusion, however, AMD can bring the hybrid concept to its hardware directly and quickly. Nvidia, on the other hand, has to contend with PCIe buses or other links that slow things down, because it has no x86 CPU of its own, and the graphics module of its Tegra still lacks unified shaders. Not even the new quad-core Tegra 3 (Kal-El), which Microsoft was allowed to show as a platform for Windows 8, will have that feature. Unified shaders are not essential for Windows 8, though; what matters are certain minimum requirements concerning the ARM instruction set, the display size and so on. Due to the strong fragmentation of the ARM market, Microsoft will have to maintain several binary-incompatible versions of Windows on ARM ("WARM") for Nvidia, Qualcomm, TI and others. At least, that's what Intel's software chief, Renée James, pointed out as a disadvantage at an investor conference in mid-May, alongside the lack of backward compatibility with old applications. Microsoft's quick, strongly worded, yet rather vague response rejected her claims, at least in part: since everything was still at the technology-demonstration stage, no final details could be given. Things seem to be a bit tense between the two companies at the moment.

Sandy Bridge's successor, Ivy Bridge, expected at the end of this year or the beginning of next, will bring only minimal changes to the microarchitecture, as is usual for an Intel "tick" step. It will, however, be manufactured in 22 nm with the new three-dimensional Tri-Gate transistors. According to Mark Bohr, Director of Process Architecture and Integration, the whole processor, and not just the caches as we had previously assumed, will use these Tri-Gate transistors. The instruction set will remain largely unchanged: there are minor performance improvements to the transcendental and crypto instructions, plus the SMEP security extension described in the previous issue of Processor Whispers. Ivy Bridge still won't offer fused multiply-add (FMA) instructions, though, which have meanwhile been specified for AVX in different versions and which AMD's competitor, Bulldozer, supports in different flavours.
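What "fused" means can be illustrated with a small scalar model: an FMA computes a*b + c exactly and rounds only once, whereas a separate multiply and add rounds the intermediate product first. The sketch below is not Intel's or AMD's definition, just an emulation under one assumption: the exact intermediate is computed with Python's `fractions.Fraction`, whose conversion back to `float` rounds to the nearest double. The helper names `fma_model` and `mul_then_add` are hypothetical.

```python
from fractions import Fraction


def fma_model(a: float, b: float, c: float) -> float:
    """Fused multiply-add model: a*b + c is computed exactly, then rounded once.

    Fraction arithmetic is exact; converting the final Fraction to float
    performs the single rounding to the nearest double.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def mul_then_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded before the addition."""
    product = a * b      # first rounding
    return product + c   # second rounding


x = 1.0 + 2.0**-30          # x*x = 1 + 2**-29 + 2**-60 exactly
c = -(1.0 + 2.0**-29)

print(mul_then_add(x, x, c))  # 0.0: the 2**-60 term was lost in the product
print(fma_model(x, x, c))     # 2**-60, about 8.67e-19
```

The tiny term that the two-step version discards is exactly what makes FMA attractive for accumulation-heavy code such as matrix multiplication.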

Rick Bergman proudly presents a working prototype of Trinity, the Bulldozer-based successor to the just-released Llano.

AVX2 Tock

Finally, at the beginning of 2013, the next "tock" step is supposed to come to life in the form of Haswell. It will again be designed by Ronak Singhal's team in Oregon, which might revive further technologies from the abandoned Netburst architecture. There are also rumors of a completely new cache design, a comparatively short 14-stage pipeline, new power-saving mechanisms and a probably optional 512-bit-wide vector unit that speaks LNI, the Larrabee New Instructions.

For now, the only certainty is that Haswell will significantly extend the current AVX to AVX2, on a larger scale than anticipated. This is where the missing FMA instructions finally appear, albeit only in the pared-down three-operand version. To let developers get started right away, Intel has extended its AVX programming reference accordingly and will surely follow up with an AVX2 emulator soon. Up to now, 256-bit AVX was restricted to floating point, but with AVX2 almost all of the 128-bit SIMD operations, including the integer operations, will be extended to 256 bits. In addition, there will be fast "cross-lane" permutation operations: an index operand determines which elements of a second 256-bit register end up in which positions of the target register, even across the boundary between the two 128-bit halves. Even more significant, however, should be the new "gather" operations, which speed up memory access to non-contiguous data. Together with FMA, they should enable new performance records, for instance in matrix calculations such as the Linpack benchmark.
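The semantics of the two new operation types can be sketched with scalar Python models working on lists of eight 32-bit elements. This shows what the instructions compute, not how to call them: the helper names `vpermd_model` and `gather_model` are hypothetical, the permute follows the element-index form of AVX2's VPERMD, and the gather model is simplified (no byte addressing with base and scale, no fault handling).

```python
def vpermd_model(src: list[int], idx: list[int]) -> list[int]:
    """Scalar model of a cross-lane permute of eight 32-bit elements.

    Result element i is src[idx[i] & 7], so any source element can land in
    any destination slot, including across the two 128-bit halves.
    """
    assert len(src) == 8 and len(idx) == 8
    return [src[j & 7] for j in idx]


def gather_model(memory: list[int], indices: list[int],
                 mask: list[bool], old: list[int]) -> list[int]:
    """Scalar model of a masked gather.

    Where mask[i] is set, element i is loaded from memory[indices[i]];
    otherwise it keeps its previous value old[i]. Simplification: real
    gathers form addresses as base + index * scale and test the top bit
    of each mask element.
    """
    return [memory[j] if m else o for j, m, o in zip(indices, mask, old)]


src = [10, 11, 12, 13, 14, 15, 16, 17]
print(vpermd_model(src, [7, 6, 5, 4, 3, 2, 1, 0]))
# [17, 16, 15, 14, 13, 12, 11, 10]

table = [k * k for k in range(100)]  # stands in for scattered data in memory
print(gather_model(table, [3, 50, 7, 99, 0, 1, 2, 4],
                   [True] * 4 + [False] * 4, [-1] * 8))
# [9, 2500, 49, 9801, -1, -1, -1, -1]
```

The reversal example moves element 7 into position 0, which a permute restricted to one 128-bit half could not do in a single step; the gather fills a vector from four unrelated addresses at once instead of four separate scalar loads.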

Furthermore, there are some interesting extensions to the general-purpose instruction set (VEX-encoded GPR instructions). It is still unclear to what extent Haswell will offer all of these instructions from the outset. As usual, numerous CPUID bits indicate whether specific instructions are present. In any case, Haswell's successors have already been named: Broadwell, the shrink to 14 nm (P1272), and then Skylake with a new microarchitecture, probably again from Haifa, whose 11 nm shrink (P1274) is scheduled to roll out as Skymont in 2015.
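Checking those CPUID bits boils down to testing individual bits of the register values CPUID returns. The sketch below is a pure decoding function under stated assumptions: it takes the register values as input (how CPUID is actually executed, e.g. through a compiler intrinsic, is left out), the names `FEATURE_BITS` and `decode_features` are hypothetical, and the sample values are made up. The bit positions follow Intel's documentation: leaf 1 reports FMA, OSXSAVE and AVX in ECX bits 12, 27 and 28; leaf 7 (subleaf 0) reports BMI1, AVX2, SMEP and BMI2 in EBX bits 3, 5, 7 and 8. Real code must additionally confirm via XGETBV that the operating system saves the YMM registers before using AVX or AVX2.

```python
# Bit positions as (CPUID leaf, register, bit). Leaf 7 is queried with subleaf ECX = 0.
FEATURE_BITS = {
    "fma":     (1, "ecx", 12),
    "osxsave": (1, "ecx", 27),
    "avx":     (1, "ecx", 28),
    "bmi1":    (7, "ebx", 3),
    "avx2":    (7, "ebx", 5),
    "smep":    (7, "ebx", 7),
    "bmi2":    (7, "ebx", 8),
}


def decode_features(regs: dict[tuple[int, str], int]) -> dict[str, bool]:
    """Turn raw CPUID register values into feature flags.

    regs maps (leaf, register name) to the 32-bit value CPUID returned;
    missing entries are treated as 0, i.e. feature absent.
    """
    return {
        name: bool((regs.get((leaf, reg), 0) >> bit) & 1)
        for name, (leaf, reg, bit) in FEATURE_BITS.items()
    }


# Hypothetical register values for a CPU with AVX but without FMA or AVX2.
sample = {(1, "ecx"): (1 << 27) | (1 << 28), (7, "ebx"): 0}
flags = decode_features(sample)
print(flags["avx"], flags["fma"], flags["avx2"])  # True False False
```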

According to old roadmaps, Intel had actually planned to switch to EUV lithography as early as the 32 nm process. Now, however, the company apparently intends to use some kind of sleight of hand to go all the way down to 11 nm with boring 193 nm laser light; that's like engraving with a sledgehammer and a stone chisel.
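How coarse a tool 193 nm light is can be estimated with the Rayleigh criterion for the smallest printable half-pitch. The values plugged in are assumptions for a back-of-the-envelope estimate: a numerical aperture of about 1.35 for water-immersion scanners and the theoretical single-exposure limit k1 = 0.25.

```latex
\mathrm{HP}_{\min} = k_1 \,\frac{\lambda}{\mathrm{NA}}
  \approx 0.25 \times \frac{193\ \mathrm{nm}}{1.35}
  \approx 35.7\ \mathrm{nm}
```

Anything much finer than roughly 36 nm therefore needs tricks such as splitting one layer across several exposures (multiple patterning). Node names like "11 nm" no longer correspond directly to a half-pitch, so this is only an order-of-magnitude comparison; EUV's 13.5 nm wavelength is about 14 times shorter.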