About Waves and Cores
(Translation of the German original in c't 13/10 by Marcel Sieslack)
While a surprising wave of resignations washes CDU (Christian Democratic Union, Germany) politicians off their office chairs, oily waves reach the beaches of the Gulf of Mexico and German schlager fans revel in the “la ola” atmosphere, Taipei is hit by a wave of tablets. Meanwhile, Intel announces processors with more than 50 cores for 2012.
The strong ground motion detected by the GEO600 detector of the Albert Einstein Institute for Gravitational Physics in Hanover on the last weekend of May did not originate from far-away supernovae. The tremors were caused by umpteen thousand foot-stomping fans of Lena, the German singer who had just won the Eurovision Song Contest, cheering in front of Hanover's New Town Hall. Germans had been waiting for those Grand Prix shockwaves for decades, and they have been waiting for gravitational waves for almost a century. The supercomputer Atlas is supposed to find them in the detector's data, supported by more than 250 000 participants of the internet project Einstein@home. On the Top500 list, Atlas only ranks 255th with 32.5 teraflops, but only because the computer, which has since been heavily upgraded, has not yet been benchmarked with Linpack again. The roughly 300 teraflops of average performance delivered by Einstein@home don't count either.
Some internet projects are even more powerful, above all Folding@home. Its participants deliver more than 6 petaflops of average performance, most of which comes from AMD and Nvidia graphics cards as well as the Cell processors in PlayStation 3 consoles. However, these are single-precision floating-point figures. Moreover, the scientists responsible for Folding@home had to take some questionable steps to convert GPU flops into x86 flops. With matrix operations or the Linpack benchmark this wouldn't be necessary, but Folding@home uses more complex operations such as exponentiation and logarithms. Current GPUs handle such calculations very quickly in single precision, but in double precision they deliver only a fraction of their theoretical potential.
The new Top500 list includes three GPU-accelerated supercomputers. Two of them use Nvidia's new Tesla C2050 card with the Fermi chip, while the third is equipped with AMD's dual-GPU card Radeon HD 4870 X2. With a little effort, the probable CPU share can be subtracted from the results, which allows a rough comparison of the Tesla C2050 and the Radeon HD 4870 X2: around 160 versus 140 gigaflops per card in the highly optimized Linpack benchmark in double precision. That is only about 30 percent of the theoretical peak performance.
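The 30 percent figure is simply the measured Linpack rate per card divided by the card's theoretical double-precision peak. The following Python sketch spells out that division. The peak values are assumptions derived from vendor specifications (448 CUDA cores at 1.15 GHz for the Tesla C2050; two RV770 GPUs with 800 ALUs each at 750 MHz for the Radeon HD 4870 X2), not numbers from this article; only the 160 and 140 gigaflops are quoted.

```python
"""Rough Linpack efficiency per GPU card (sketch).

ASSUMPTION: the double-precision peak values are derived from vendor
specifications, not from the article; only the Linpack figures are quoted.
"""

# Linpack performance per card after subtracting the CPU share (from the article).
LINPACK_GFLOPS = {"Tesla C2050": 160.0, "Radeon HD 4870 X2": 140.0}

# ASSUMED theoretical double-precision peaks in GFlops.
PEAK_GFLOPS = {
    # 448 cores x 1.15 GHz x 2 flops (FMA), halved for double precision
    "Tesla C2050": 448 * 1.15 * 2 / 2,
    # 2 GPUs x 800 ALUs x 0.75 GHz x 2 flops, one fifth for double precision
    "Radeon HD 4870 X2": 2 * 800 * 0.75 * 2 / 5,
}


def efficiency(measured_gflops: float, peak_gflops: float) -> float:
    """Return the fraction of the theoretical peak reached in Linpack."""
    return measured_gflops / peak_gflops


if __name__ == "__main__":
    for card, measured in LINPACK_GFLOPS.items():
        print(f"{card}: {efficiency(measured, PEAK_GFLOPS[card]):.0%} of peak")
```

Under these assumptions both cards land at roughly 29 to 31 percent of their peak, in line with the "about 30 percent" above.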
Intel plans to beat AMD's and Nvidia's graphics chips in High Performance Computing (HPC) with Larrabee, albeit under a new name. At the supercomputing conference ISC'10, Intel's server chief Kirk Skaugen explained that the ex-Larrabee now sails under the flag of the “Many Integrated Core” (MIC) architecture. A 32-nm version named Aubrey Isle will be released as a developer sample with 32 cores, 8 MB of shared cache and a clock speed of 1.2 GHz. Thanks to four-way Hyper-Threading, each chip handles 128 threads quasi-simultaneously. Besides wafers with Aubrey Isle chips, Skaugen also presented the coprocessor card Knights Ferry, which Intel is already shipping to selected developers such as CERN. He also had first benchmark results, but only for single precision. For LU factorization, the key part of the Linpack benchmark, Skaugen made a show of pushing the value up to 517 gigaflops; the competition supposedly reaches 360 gigaflops. Mass production of a 22-nm MIC chip named Knights Corner with more than 50 cores is planned for 2011/2012.
Son of Larrabee: the Knights Ferry application accelerator is built around Aubrey Isle chips like those on the wafers shown.
At Forschungszentrum Jülich (a research center in Jülich, Germany), Intel has founded the ExaCluster Laboratory together with the middleware company ParTec to develop cluster technologies all the way up to exascale systems. Bull is building the supercomputer TERA 100 with 18 000 Nehalem-EX processors, which is supposed to deliver more than 1 petaflop. HP and the Tokyo Institute of Technology plan to equip Tsubame 2.0 with really big nodes by October: two Westmere Xeons each, plus three Nvidia Tesla C2050 cards. The 1400 nodes should add up to a theoretical peak of about 2.4 petaflops; the operators hope for 1.5 petaflops of Linpack performance.
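Where the 2.4 petaflops for Tsubame 2.0 come from can be checked with a back-of-the-envelope sum: the peak of one node (two CPUs plus three GPUs) times 1400 nodes. This Python sketch assumes Westmere Xeons with six cores at 2.93 GHz and four double-precision flops per cycle and core, and 515.2 gigaflops per Tesla C2050; the article itself names neither the CPU model nor any clock speeds, so these values are illustrative assumptions.

```python
"""Back-of-the-envelope peak estimate for Tsubame 2.0 (sketch).

ASSUMPTIONS (not stated in the article): Westmere Xeon with 6 cores at
2.93 GHz and 4 double-precision flops per cycle per core; Tesla C2050 at
515.2 GFlops double-precision peak.
"""

NODES = 1400          # from the article
CPUS_PER_NODE = 2     # two Westmere Xeons per node (article)
GPUS_PER_NODE = 3     # three Tesla C2050 cards per node (article)

GPU_PEAK_GFLOPS = 515.2  # ASSUMED Tesla C2050 double-precision peak


def cpu_peak_gflops(cores: int = 6, ghz: float = 2.93, flops_per_cycle: int = 4) -> float:
    """ASSUMED double-precision peak of one Westmere Xeon in GFlops."""
    return cores * ghz * flops_per_cycle


def system_peak_pflops() -> float:
    """Theoretical peak of all nodes, converted from GFlops to PFlops."""
    node_gflops = CPUS_PER_NODE * cpu_peak_gflops() + GPUS_PER_NODE * GPU_PEAK_GFLOPS
    return NODES * node_gflops / 1e6


if __name__ == "__main__":
    peak = system_peak_pflops()
    print(f"theoretical peak: {peak:.2f} PFlops")
    print(f"1.5 PFlops in Linpack would be {1.5 / peak:.0%} of peak")
```

With these assumptions the sum comes to about 2.36 petaflops, which rounds to the quoted 2.4 petaflops; the hoped-for 1.5 petaflops would then correspond to a Linpack efficiency of just under two thirds. The GPUs contribute over 90 percent of each node's peak.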
A Wave of Tablets
With the Computex trade fair at the beginning of June, the monster wave of tablets set off by the iPad launch reached Taipei. The day before Computex opened, Asus (Eee Pad, Eee Tablet) and MSI (WindPad 100 with Intel Atom, 110 with Nvidia Tegra 2) had already started the tablet tide, and VIA showed off Chinese low-budget tablets running Android 1.6 on the rather slow ARM SoC from its subsidiary WonderMedia. Amtek has designed tablets with the following processors: Freescale ARM CPU, Nvidia Tegra 2, Atom and CULV Celeron. The One Laptop Per Child Association (OLPC) intends to launch its first tablet before the new XO-3 is released in 2012. Its tablet, called Moby, is slated for early 2011 and will be based on the Marvell reference design with a 1 GHz ARM CPU. Meanwhile, Qualcomm has announced the first dual-core Snapdragon, and Intel counters with Oak Trail, a Moorestown version designed especially for tablets, from 2011 onward. Before that, Intel will roll out Atoms with DDR3 support and dual-core Atoms.
Many ARM tablets run Android, but developers will probably have to wait for Android 3 before they can write suitable software for devices with displays larger than a smartphone's. Nvidia CEO Jen-Hsun Huang, at any rate, expects his Tegra 2 to spread widely in tablets only with Android 3. Unfortunately, he didn't say whether that means the “Gingerbread” Android expected toward the end of 2010 or whether we'll have to wait longer.
Intel's P67 and H67 chipsets (Cougar Point) for LGA1155 motherboards were also on display at Computex. The boards exhibited by ASRock, Biostar and other companies confirmed what we speculated in our last issue: no integrated USB 3.0 controller, but PCI Express 2.0, and slowly but surely it's “farewell, PCI”.