At its Build conference on Monday, Microsoft kicked off a new cloud offering that provides machine learning resources to cloud customers using Intel FPGA-accelerated servers.
“I think this is a first step in making the FPGAs more of a general-purpose platform for customers,” said Mark Russinovich, chief technical officer for Microsoft’s Azure cloud computing platform. The technology is being offered as “preview,” which apparently means only a limited set of capabilities and allocations are available. Also, at this point, only customers with accounts in the East US 2 region will be able to access the platform.
This represents the commercialization of Microsoft’s Project Brainwave, an FPGA-based machine learning platform the company developed over the past year. The software maker first announced the platform last August at the Hot Chips conference, saying it would be able to provide real-time AI for inferencing deep neural networks.
Project Brainwave board. Source: Microsoft
The use of FPGAs, in this case Intel’s Stratix 10 devices, makes it possible to achieve extremely low latency for inferencing requests – even better than custom-built ASICs designed for machine learning, according to Microsoft. Just as important is the ability of the FPGAs to be reconfigured for different types of machine learning models – LSTMs, CNNs, GRUs, and so on. This flexibility makes it easier to accelerate each application using the numerical precision and memory model best suited to it. The reconfigurability is also a form of future-proofing, given that new machine learning techniques are being developed on a pretty regular basis.
The Project Brainwave technology employs a deep neural network processing engine that is loaded onto the FPGA, which provides the basis for the machine learning service. Application software is created via a Python compiler and runtime within the Azure Machine Learning SDK. Since inferencing is latency-sensitive rather than compute-demanding, the platform does away with batching (grouping multiple requests together to maximize throughput) in order to reduce response time as much as possible.
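The latency cost of batching is easy to see in a toy model: a request that arrives first must wait for its batch to fill before any inference starts. The sketch below is a hypothetical illustration of that tradeoff, not Microsoft's code; all timings are made up, and it assumes unbatched requests never queue for hardware.

```python
# Hypothetical illustration (not Project Brainwave code): why batching
# hurts latency for an inference service. Requests arrive 1 ms apart;
# one inference pass takes 5 ms whether it serves 1 request or 8.

ARRIVAL_GAP_MS = 1.0   # time between successive request arrivals
INFER_MS = 5.0         # duration of one (possibly batched) inference pass

def latencies(num_requests, batch_size):
    """Per-request latency: wait for the batch to fill, then run one pass.
    Assumes enough parallel capacity that batches never queue behind
    each other."""
    arrivals = [i * ARRIVAL_GAP_MS for i in range(num_requests)]
    out = []
    for start in range(0, num_requests, batch_size):
        batch = arrivals[start:start + batch_size]
        done = batch[-1] + INFER_MS   # pass starts once the last request arrives
        out.extend(done - t for t in batch)
    return out

print(max(latencies(8, batch_size=1)))  # 5.0 ms: every request starts immediately
print(max(latencies(8, batch_size=8)))  # 12.0 ms: the first request waits for seven more
```

Batching amortizes the fixed cost of a pass across many requests, which is why it dominates throughput-oriented serving; dropping it, as Brainwave does, optimizes for the single-request response time instead.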
The Project Brainwave platform is not set up for training the neural net models, however. That is something done offline, most likely with the help of GPUs that are already available in Azure. In fact, Microsoft offers NVIDIA P40, K80, P100, and V100 GPUs at various price points for such work. Curiously though, for the cloud instances powered by the P100 and V100, which are NVIDIA’s most capable machine learning chips, Microsoft positions them as accelerators for more traditional HPC work rather than machine learning.
That might not be accidental. At last year’s Microsoft Build conference, Russinovich talked briefly about the work the company is doing to move neural network training onto its FPGAs. As we reported here, that suggests the company is actively looking for ways to reduce or eliminate its reliance on GPUs in this area. Microsoft has already deployed more than an exaflop’s worth of FPGAs across Azure, although it’s not known how much of that is based on the newer Stratix 10 gear.
The preview of the Project Brainwave offering, including instructions on how to apply for an allocation and set up the environment, can be accessed here.