About Haskell and Haswell

Andreas Stiller

(Translation of the German original in c't by Marcel Sieslack)

It's the “worst-kept secret of the industry” – so it was said at Supercomputing 2011 (SC11) – that Intel's Haswell processor will feature transactional memory. Other leaked bits of news concern Intel's Ivy Bridge and AMD's Trinity.

About two years ago, in August 2009, Intel, IBM and Sun founded a “Drafting Group” in order to devise a common specification for transactional memory (TM). All three of them were planning to incorporate this feature into their next processor generations. Sun intended to do so with its Rock processor, but that chip was dumped after Sun's acquisition by Oracle one year later. IBM, by contrast, was more successful with its Blue Gene/Q. At SC11 in Seattle, the first processor with hardware transactional memory (HTM) was presented under the official name PowerPC A2.

At the beginning of 2013, Intel will offer HTM, too, with its Haswell processor. Everyone from Intel openly admitted as much when asked directly at SC11. Supposedly, Intel will soon announce the new TM instructions that will be added to the already released AVX2 extension. It's about time, as the continuously increasing number of processor cores makes technologies for faster thread synchronization more and more urgent. Without such technologies, the processor will eventually be so busy with itself that it won't be able to get any real work done.

With transactional memory, the idea is not to lose time by locking each successive access by threads to shared memory areas. Instead, the accesses are first bundled into an atomic transaction, for instance in the L1 cache, and then executed all at once during the commit. This happens under the optimistic assumption that no other thread will stick its oar in and access the shared memory in the meantime. If that does happen, that's bad luck, and a rollback mechanism is required to abort the intended, but by then invalid, transaction. In that case, the transaction is re-executed, with fresh source data where applicable.

On the software side, Intel has been strongly committed to software transactional memory (STM) for years and has been maintaining the Intel C++ STM Compiler, Prototype Edition. In this compiler, __TM_atomic{} can be used to mark passages that are to be treated as atomic transactions.

Some other compilers and interpreters also committed themselves to STM at an early stage and rather aggressively, for instance the functional programming language Haskell. The Glasgow Haskell Compiler implemented it in version 6.4, and some applications based on it (like some BitTorrent clients) actually make ample use of it. Corresponding implementations for Java and Python are eagerly being worked on as well.
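
What such an atomic transaction looks like in Haskell can be sketched with GHC's Control.Concurrent.STM module. This is only a minimal illustration, not code from any of the applications mentioned: the names transfer and runDemo and the two-account scenario are invented for the example, and it assumes the stm package that ships with standard GHC installations.

```haskell
-- Minimal STM sketch; `transfer` and `runDemo` are hypothetical names.
-- Assumes GHC with the `stm` package (bundled with standard installations).
import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM (STM, TVar, atomically, newTVarIO, readTVar, writeTVar)
import Control.Monad (replicateM_)

-- Move `amount` from one shared counter to another inside one transaction.
-- Nothing here takes a lock; the reads and writes are only bundled.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
  fromBalance <- readTVar from
  writeTVar from (fromBalance - amount)
  toBalance <- readTVar to
  writeTVar to (toBalance + amount)

-- Two threads move money in opposite directions and so contend for both TVars.
-- `atomically` commits each transaction as a whole; if another thread changed
-- a TVar that was read in the meantime, the runtime discards the attempt and
-- re-executes the transaction.
runDemo :: IO Int
runDemo = do
  a <- newTVarIO 100
  b <- newTVarIO 100
  done <- newEmptyMVar
  _ <- forkIO $ replicateM_ 1000 (atomically (transfer a b 1)) >> putMVar done ()
  _ <- forkIO $ replicateM_ 1000 (atomically (transfer b a 1)) >> putMVar done ()
  replicateM_ 2 (takeMVar done)          -- wait for both worker threads
  atomically $ (+) <$> readTVar a <*> readTVar b

main :: IO ()
main = runDemo >>= print                 -- prints 200
```

Because both updates sit inside a single atomically block, the sum of the two balances is 200 at every commit, however the threads interleave. A conflicting attempt is rolled back and re-run by the runtime rather than being prevented by a lock.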

However, the whole thing stands and falls with the respective conflict rate and with the time necessary for conflict detection and rollback. In software alone, TM is usually not efficient enough, but it can be supported, complemented or replaced by various hardware mechanisms that significantly increase its efficiency.

Before Haswell's planned release in 2013, Intel will first roll out the Xeon E5 (Sandy Bridge EP). For desktop PCs and notebooks, Ivy Bridge in 22-nm technology is expected. According to Intel's leaked NDA Desktop Platform Roadmap WW46, however, the wait won't be over until the second quarter of 2012. Also according to this roadmap, the new 22-nm desktop processors of the normal energy class will, at 77 watts TDP, consume about 20 percent less power, but no version has a higher nominal clock rate than the currently fastest one.

Leaks

The Core i5 ranges from 3.0/3.2 GHz (i5-3300) to 3.4/3.9 GHz (i5-3570), with four cores without HT, 4 MB cache and two memory channels for DDR3-1333/1600, and comes with integrated DirectX 11 capable HD 2500 or HD 4000 graphics. The top processor Core i7-3770 features HD 4000, 8 MB cache, Hyper-Threading and a clock rate of 3.4/3.9 GHz, or a bit more in the overclockable K version. In addition, there are stripped-down low-power versions from 65 down to 35 watts.

Intel also added some benchmark results for the i7-3770 in comparison to the Sandy Bridge i7-2600, which has a nominally equal clock speed. The increase in graphics performance, a factor of 2.7 to 3 in 3DMark, looks impressive, but Intel pitted an HD 4000 against a hardly comparable HD 2000. In comparison to an HD 3000, the performance should be no more than twice as high.

Thanks to small architectural improvements, an optimized Turbo Boost and probably faster memory as well, the CPU benchmarks improved, but only a little: by between 7 percent (Sysmark 2012) and, in the best case, 25 percent (Excel 2010).

Does AMD keep some secret plans for transactional memory in this safe drawer? Margaret Lewis (Product Marketing Director) and Pat Patla (General Manager of Server Products) at SC11.

AMD hasn't announced anything about transactional memory so far, but, years ago, it had already presented a possible architecture extension called Advanced Synchronization Facility (ASF), which is able to lock complete cache lines and thus offers a strongly improved basis for STM. However, until now, there's only a simulator for PTLsim.

That is not the only thing that is unclear at AMD, where the new boss Rory Read appears to be sweeping with an iron broom. The mass layoffs also hit Germany particularly hard: 20 of the 80 employees working at the office in Munich have been fired, among them almost the complete PR crew. And AMD has been quarreling with its manufacturer Globalfoundries for quite some time now. Manufacturing problems with the Llano at the beginning of the year are supposed to have caused the planned MacBook Air deal with Apple to fall apart. Things don't look good for the Bobcat successors Wichita and Krishna, either. Word is that AMD has canceled them or is planning to take the 28-nm APU production away from Globalfoundries completely and put it in the hands of TSMC. In any case, Globalfoundries has put the plans for another fab in Abu Dhabi on ice for now.

The first leaked benchmark results for the Llano successor Trinity with the new Piledriver core aren't really impressive, with a performance increase of 23 to 35 percent in graphics (3DMark Vantage) and 7 to 17 percent in general (PCMark Vantage). At least the graphics performance, however, should be more than sufficient to keep Ivy Bridge in check in this respect.