
Azure Boosts HPC, AI Appeal with CycleCloud and NVIDIA GPU Containers

Sept. 3, 2018

By: Michael Feldman

Leveraging technology from Cycle Computing and NVIDIA, Microsoft is continuing to add Azure features designed to draw in HPC and AI customers.

Just a year after its acquisition of Cycle Computing, Microsoft has folded that company's cloud orchestration technology into CycleCloud, an Azure tool for building and managing HPC clusters. On August 29, Microsoft announced the tool's general availability.

Azure customers can now run HPC applications under a wide array of popular schedulers, namely SLURM, PBS Pro, Grid Engine, LSF, HPC Pack, or HTCondor. Like any mature orchestration tool, CycleCloud can manage data transfers between the cloud and on-premises storage, monitor performance, track cluster usage, and generate reports that encapsulate any of these activities. The technology can also dynamically autoscale clusters based on factors like time constraints, job load, and hardware availability.
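To give a flavor of how such a cluster is driven, here is a minimal sketch that stands up and tears down a CycleCloud cluster by calling the CycleCloud command-line client from Python. The cluster name and template file are hypothetical placeholders, and the CLI verbs are assumptions based on the CycleCloud documentation rather than anything in Microsoft's announcement.

    # Minimal sketch of managing a CycleCloud cluster from Python.
    # Assumes the CycleCloud CLI ("cyclecloud") is installed and already
    # initialized against an Azure CycleCloud server; the cluster name and
    # template file below are hypothetical placeholders.
    import subprocess

    CLUSTER_NAME = "slurm-demo"            # hypothetical cluster name
    TEMPLATE_FILE = "slurm-template.txt"   # hypothetical CycleCloud cluster template

    def cyclecloud(*args):
        """Run a CycleCloud CLI command and fail loudly if it returns an error."""
        subprocess.run(["cyclecloud", *args], check=True)

    # Import the cluster definition, start its nodes, and eventually tear it down.
    cyclecloud("import_cluster", CLUSTER_NAME, "-f", TEMPLATE_FILE)
    cyclecloud("start_cluster", CLUSTER_NAME)
    # ... submit work through the cluster's scheduler (SLURM, PBS Pro, etc.) ...
    cyclecloud("terminate_cluster", CLUSTER_NAME)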

Along with the CycleCloud launch, Microsoft revealed some of its early customers. These include household names like GE and Johnson & Johnson, as well as lesser-known companies like Ramboll, a Denmark-based engineering consulting group, and Silicon Therapeutics, a drug discovery software specialist.

The use case for Silicon Therapeutics is a quantum physics simulation used to identify drug targets associated with certain diseases. In this case, the application was able to run molecular dynamics simulations on thousands of protein targets entailing tens of thousands of atoms.

Using SLURM as the scheduler, the computations ran for 20 hours on over two thousand NVIDIA K80 GPU instances, which represented five years of compute time. According to the Microsoft announcement, the drug target search produced over 50 terabytes of data and used 25 GB/second of bandwidth between the compute nodes and the BeeGFS storage system. By spinning up a SLURM-powered cluster on Azure, Silicon Therapeutics could use the same software platform as on its in-house cluster, but could scale up the hardware as needed.
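As a rough illustration of what fanning out a workload like this looks like, the sketch below renders a simple SLURM batch script per protein target and submits it with sbatch from Python. The partition name, GPU count, target names, and the md_run.sh driver script are all hypothetical; Silicon Therapeutics' actual workflow has not been published.

    # Minimal sketch of submitting GPU jobs to a SLURM cluster like the one
    # described above. Partition, GPU count, target names, and md_run.sh are
    # placeholders, not details from the real workload.
    import subprocess

    BATCH_TEMPLATE = """#!/bin/bash
    #SBATCH --job-name=md-{target}
    #SBATCH --partition=gpu          # hypothetical GPU partition name
    #SBATCH --gres=gpu:4             # four GPUs per node, e.g. two dual-GPU K80 boards
    #SBATCH --time=20:00:00
    srun ./md_run.sh {target}        # hypothetical molecular dynamics driver script
    """

    def submit(target):
        """Render an sbatch script for one protein target and hand it to SLURM."""
        script = BATCH_TEMPLATE.format(target=target)
        subprocess.run(["sbatch"], input=script, text=True, check=True)

    # Fan out one job per protein target; SLURM queues them and CycleCloud's
    # autoscaler provisions nodes to match the load.
    for target in ["protein_0001", "protein_0002"]:   # placeholder target names
        submit(target)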

In conjunction with CycleCloud’s general availability, Microsoft also unveiled Azure support for the NVIDIA GPU Cloud (NGC). That means customers can use NGC’s shrink-wrapped containers for V100 or P100 GPU applications on Azure NCv3, NCv2 and ND instances. These include containers for deep learning software like TensorFlow, PyTorch, Microsoft Cognitive Toolkit, and NVIDIA TensorRT, as well as those for HPC software packages like NAMD, Gromacs, LAMMPS, and ParaView.
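For readers who want to try this out, here is a minimal sketch of pulling and running an NGC container on a GPU instance using the Docker SDK for Python. It assumes the NVIDIA container runtime (nvidia-docker2) is installed and that the instance has already authenticated to nvcr.io with an NGC API key; the image tag shown is a placeholder rather than a guaranteed current release.

    # Minimal sketch of pulling and running an NGC container on a GPU instance
    # with the Docker SDK for Python. Assumes the NVIDIA container runtime is
    # installed and "docker login nvcr.io" has been done with an NGC API key.
    import docker

    REPO = "nvcr.io/nvidia/tensorflow"   # NGC deep learning image on NVIDIA's registry
    TAG = "18.08-py3"                    # placeholder tag, check NGC for current releases

    client = docker.from_env()

    # Pull the container image from nvcr.io.
    client.images.pull(REPO, tag=TAG)

    # Run nvidia-smi inside the container to confirm the GPUs are visible.
    output = client.containers.run(
        f"{REPO}:{TAG}",
        command="nvidia-smi",
        runtime="nvidia",    # route the container through the NVIDIA container runtime
        remove=True,
    )
    print(output.decode())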

The nice thing here is that NVIDIA takes care of all the software updates and performance optimizations for the NGC container software -- drivers, support libraries, GPU dependencies, and such. And NVIDIA does this on a regular basis, so customers get the most up-to-date environment for their applications.

Microsoft’s strategy here is pretty clear: they want to be the go-to cloud provider for HPC and AI, two of the most computationally intense types of applications. That makes a lot of sense for a cloud provider, considering that these kinds of applications ensure customers will be renting CPU and GPU resources by the boatload. Plus, both application sets – but especially AI – are experiencing fast growth rates.

For the time being, Microsoft is letting users try CycleCloud for free. Customers can also sign up for an NGC account, at which point they will be able to download the HPC and deep learning containers at no charge.

Image source: NVIDIA