By: Michael Feldman
Australia’s National Computational Infrastructure (NCI) has joined the OpenPOWER Foundation, becoming the first organization in the country to do so. In conjunction with its new membership, NCI has purchased an IBM Power8 cluster, which will be used to run a range of compute-demanding analytics and simulation workloads.
NCI is a government-run organization that provides supercomputing capacity and expertise to Australia’s scientific research community. It collaborates with 31 universities, five science agencies, and several medical research institutes. Among these are the Commonwealth Scientific and Industrial Research Organisation (CSIRO), the Australian Bureau of Meteorology, Geoscience Australia, and the Australian National University (ANU), to name a few. NCI’s principal mission is to provide HPC resources and services for scientists, analysts, and other domain specialists at these organizations. It currently serves more than 4,000 such researchers across the country.
The new IBM cluster is a four-node Power System S822LC for HPC machine, equipped with NVIDIA GPUs, presumably the new P100s. According to the press release, the system will be used “to underpin its research efforts through artificial intelligence, deep learning, high performance data analytics and other compute-heavy workloads.” Initially, the cluster will be used to run NCI’s top five GPU-based workloads to assess its performance. The S822LC is based on the Power8 processor, the only CPU that natively supports NVLink, enabling much faster CPU-to-GPU communications than would be possible in an x86-based server.
Technical features aside, the decision to purchase the Power cluster apparently hinged on having access to local IBM talent, in this case the IBM Australia Development Laboratory (ADL) in Canberra. The lab develops the Power system firmware and thus has a rather intimate knowledge of the inner workings of the architecture.
The new system represents the only IBM gear at NCI and the smallest cluster operated by the organization. The largest system there is a supercomputer known as Raijin, a Fujitsu Primergy x86 cluster that came online in 2013. It consists of 3,602 nodes powered by Intel Xeon “Sandy Bridge” CPUs, plus 56 NVIDIA Tesla K80 GPUs.
Another NCI system, known as Tenjin, is designed especially for data-intensive work and is set up as a cloud resource. It’s a 100-node system and, like Raijin, is based on Xeon “Sandy Bridge” CPUs. It uses 160 TB of SSD storage to speed up the data-intensive applications for which it was designed. Tenjin also offers 20 petabytes of parallel file system storage.
NCI also operates an older 96-node Fujitsu PrimeHPC FX10 cluster named Fujin. It’s powered by Fujitsu’s SPARC64 IXfx processors and uses the company’s Tofu interconnect to glue the nodes together. The system was installed in 2013 in conjunction with a Fujitsu-NCI Collaboration Agreement. It’s used for various workloads, particularly projects involving collaborations with Fujitsu and RIKEN.
In September, NCI purchased an SGI cluster equipped with 32 Intel Xeon Phi “Knights Landing” processors. That system will be tasked with applications in computational physics, computational chemistry and climate research. NCI reports that early benchmarks are “very promising,” with a two-fold performance increase seen on some codes.
Then in November, NCI announced it was purchasing a Lenovo NeXtScale cluster, which will be installed as an extension to Raijin. The system will consist of nearly 23 thousand Intel Xeon “Broadwell” processors and 144 terabytes of memory. It is scheduled to be installed in January 2017 and will represent a 40 percent increase in NCI’s computational capacity.