The SGI Altix 4000 series

Machine type RISC-based ccNUMA system.
Models Altix 4700.
Operating system Linux (SuSE SLES9/10, RedHat EL4) + extensions
Connection structure Fat Tree
Compilers Fortran 95, C, C++.
Vendor's information Web page http://www.sgi.com/products/servers/altix/4000/
Year of introduction 2006

System parameters:

Model Altix 4700
Clock cycle 1.66 GHz
Theor. peak performance
Per proc. core(64-bits) 6.64 Gflop/s
Maximum (64-bits) 6.8 Tflop/s
Main memory
Memory/maximal ≤ 512 GB
No. of processors 4–512
Communication bandwidth
Point-to-point 3.2 GB/s
Aggregate peak/64 proc. frame 44.8 GB/s
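The peak figures in the table can be reproduced with a little arithmetic. The sketch below assumes two things the table does not state: that each Itanium 2 (Montecito) core completes 4 flops per cycle (two fused multiply-add units), and that "processors" counts dual-core sockets, so 512 processors correspond to 1024 cores. The constant names are illustrative only.

```python
# Reproduce the theoretical peak figures of the Altix 4700 table.
# Assumptions (not in the table): 4 flops/cycle per core (2 FMA units),
# and each "processor" is a dual-core Montecito socket.

FLOPS_PER_CYCLE = 4       # assumed: 2 FMA units x 2 flops each
CORES_PER_PROCESSOR = 2   # assumed: dual-core Montecito


def peak_per_core_gflops(clock_ghz: float) -> float:
    """Theoretical 64-bit peak of one core in Gflop/s."""
    return clock_ghz * FLOPS_PER_CYCLE


def peak_system_tflops(clock_ghz: float, processors: int) -> float:
    """Theoretical 64-bit peak of the whole system in Tflop/s."""
    cores = processors * CORES_PER_PROCESSOR
    return cores * peak_per_core_gflops(clock_ghz) / 1000.0


if __name__ == "__main__":
    print(f"per core: {peak_per_core_gflops(1.66):.2f} Gflop/s")
    print(f"system:   {peak_system_tflops(1.66, 512):.1f} Tflop/s")
```

Under these assumptions, 1024 cores at 6.64 Gflop/s give about 6.8 Tflop/s, which matches the maximum quoted above.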

Remarks:

The newest Altix version is the 4000 series, which succeeds the Altix 3700. Although the latter is still marketed, we will not discuss it here as its functionality is largely the same. The differences lie mainly in the type of Intel Itanium processor supported and in the communication network. The Altix 4700 supports the dual-core Montecito processor with the new, faster 533 and 667 MHz frontside buses. Furthermore, where the 3700 used NUMAlink3 to connect the processor boards, the 4700 uses NUMAlink4, which at 3.2 GB/s (unidirectional) has twice the bandwidth. The structure of the processor boards has changed as well: instead of the so-called C-bricks with four Itanium 2 processors, two memory modules, two I/O ports, and two SHUBs (ASICs that connect processors, memory, I/O, and neighbouring processor boards), the Altix 4700 uses processor blades that house 1 or 2 processors. SGI offers these two variants to accommodate different types of usage. The 1-processor blades support the fastest frontside bus of 667 MHz, giving a bandwidth of 10.7 GB/s to the processor on the blade. This blade is offered for bandwidth-hungry applications with irregular but massive memory access. The 2-processor blade, called the density option, uses the slower 533 MHz frontside bus and the slightly slower 1.6 GHz Montecito. This variant is expected to serve a large part of the HPC users more cost-effectively.
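The 10.7 GB/s figure follows from the bus rate if one assumes that the Itanium 2 frontside bus is 128 bits (16 bytes) wide and that the quoted MHz value is the effective transfer rate; neither assumption is stated in the text. A minimal sketch comparing the two blade variants under those assumptions:

```python
# Peak frontside-bus bandwidth for the two Altix 4700 blade variants.
# Assumptions (not in the text): a 128-bit (16-byte) data path, and the
# MHz figure is the effective number of transfers per second.

BUS_WIDTH_BYTES = 16  # assumed 128-bit frontside bus


def fsb_bandwidth_gbs(transfer_rate_mhz: float) -> float:
    """Peak frontside-bus bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return transfer_rate_mhz * 1e6 * BUS_WIDTH_BYTES / 1e9


blades = [("1-processor blade", 667), ("2-processor (density) blade", 533)]
for label, mhz in blades:
    print(f"{label}: {fsb_bandwidth_gbs(mhz):.1f} GB/s")
```

The 667 MHz bus yields about 10.7 GB/s, in line with the text; the 533 MHz bus of the density option gives roughly 8.5 GB/s under the same assumptions.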

The Altix is a ccNUMA system, which means that the address space is shared between all processors (although it is physically distributed and therefore not uniformly accessible). In contrast to the Altix 3700, the bandwidth on the blades is as high as that of the off-board connections, because NUMAlink4 technology is employed both on the blade and off-board.

SGI does not provide its own suite of compilers; rather, it distributes the Intel compilers for the Itanium processors. Likewise, the operating system is Linux rather than IRIX, SGI's former Unix flavour, although some additions are made to the standard Linux distributions, primarily to support SGI's MPI implementation and the CXFS file system.

Frames with 32 processor blades can be coupled with NUMAlink4 to form systems with a single-system image of at most 512 processors, so OpenMP programs using up to 512 processors can be run. On larger configurations, because NUMAlink allows remote addressing, one can use the Cray-style SHMEM library for one-sided communication in addition to MPI.
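With 2-processor density blades, a 32-blade frame holds 64 processors, which matches the "64 proc. frame" in the bandwidth table, so a 512-processor single-system image spans 8 frames. The sketch below makes that count explicit. It assumes, hypothetically, that every blade slot in a frame holds a processor blade of one type and ignores I/O and other non-processor blades; `frames_needed` is an illustrative helper, not an SGI tool.

```python
# Hypothetical frame count for an Altix 4700 single-system image.
# Assumes all 32 blade slots hold processor blades of a single type
# and ignores I/O blades; limits are taken from the text.

BLADES_PER_FRAME = 32       # from the text
MAX_SSI_PROCESSORS = 512    # single-system-image limit from the text


def frames_needed(processors: int, processors_per_blade: int = 2) -> int:
    """Minimum number of frames needed for `processors` processors.

    processors_per_blade is 2 for the density option and 1 for the
    high-bandwidth blade.
    """
    if processors_per_blade not in (1, 2):
        raise ValueError("blades hold 1 or 2 processors")
    if not 1 <= processors <= MAX_SSI_PROCESSORS:
        raise ValueError("single-system image is limited to 512 processors")
    per_frame = BLADES_PER_FRAME * processors_per_blade
    return -(-processors // per_frame)  # ceiling division


if __name__ == "__main__":
    print("density blades, 512 processors:", frames_needed(512))
    print("1-processor blades, 512 processors:", frames_needed(512, 1))
```

A full system therefore needs 8 frames of density blades, or 16 frames if only 1-processor blades are used.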

Measured Performances:
At this moment no large Altix 4700 installations are operational yet (the first two are being installed at the time of writing). For the Altix 3700 there are results: in the TOP500 list [45], a complex of 10,160 Altix processors connected by InfiniBand attained a speed of 51.78 Tflop/s solving a linear system of order 1,290,240. The efficiency for this complex (measured Linpack speed relative to theoretical peak) is 85%.
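The quoted TOP500 figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers from the text, plus two standard Linpack conventions that are assumptions here: the HPL operation count 2/3 n³ + 2 n², and 8-byte double-precision matrix entries. Because the 85% efficiency is rounded, the implied peak is approximate, and the run time is an estimate, not a reported value.

```python
# Back-of-envelope checks on the Altix 3700 TOP500 entry.
# Inputs are from the text; the HPL operation count and 8-byte
# matrix entries are standard Linpack conventions assumed here.

RMAX_TFLOPS = 51.78   # measured Linpack speed
EFFICIENCY = 0.85     # Rmax / Rpeak (rounded in the text)
PROCESSORS = 10_160
N = 1_290_240         # order of the linear system


def implied_rpeak_tflops() -> float:
    """Theoretical peak implied by Rmax and the quoted efficiency."""
    return RMAX_TFLOPS / EFFICIENCY


def peak_per_processor_gflops() -> float:
    """Implied theoretical peak of a single processor in Gflop/s."""
    return implied_rpeak_tflops() * 1000 / PROCESSORS


def matrix_size_tb() -> float:
    """Storage for the dense n x n matrix in double precision (1 TB = 1e12 B)."""
    return N * N * 8 / 1e12


def solve_time_hours() -> float:
    """Estimated time to perform the HPL operation count at Rmax."""
    flops = 2 / 3 * N**3 + 2 * N**2
    return flops / (RMAX_TFLOPS * 1e12) / 3600


if __name__ == "__main__":
    print(f"implied Rpeak: {implied_rpeak_tflops():.1f} Tflop/s")
    print(f"peak/proc:     {peak_per_processor_gflops():.2f} Gflop/s")
    print(f"matrix:        {matrix_size_tb():.1f} TB")
    print(f"run time:      {solve_time_hours():.1f} h")
```

The implied peak of roughly 61 Tflop/s works out to about 6 Gflop/s per processor, which would be consistent with Itanium 2 processors at 1.5 GHz and 4 flops per cycle (an inference, not stated in the text). The matrix alone occupies about 13.3 TB, and the solve takes on the order of 7.7 hours at the measured speed.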