UK Startup Takes On GPUs with Neural Network Accelerator

Oct. 31, 2016

By: Michael Feldman

AI startup Graphcore has emerged from stealth mode with the announcement of $30 million in initial Series A funding. The Bristol, UK-based company will use the cash infusion to complete development of its Intelligent Processing Unit (IPU), a custom-built chip aimed at machine learning workloads. The funding was led by Robert Bosch Venture Capital GmbH and Samsung Catalyst Fund; also joining were Amadeus Capital Partners, C4 Ventures, Draper Esprit plc, Foundation Capital and Pitango Venture Capital.

The IPU has been under development at Graphcore for two years, with the first product slated to be released in the second half of 2017. It’s designed to work across a range of machine learning applications and is applicable to both training and inferencing neural networks. Its broad application aperture is reflected in the company’s ambitions. Rather than trying to establish itself in a niche within AI and working its way inward, Graphcore is setting its sights on the heart of the market, and its dominant silicon supplier, NVIDIA.

According to Graphcore CEO and cofounder Nigel Toon, the initial product, a dual-processor PCIe card, will take on the most compute-intense training applications for machine learning. “We definitely see ourselves going head-to-head with people like NVIDIA and their GPU offerings and think the industry needs a specific product developed for this market rather than reusing something developed for graphics,” Toon says. He believes the company’s upcoming IPU silicon will offer a “very substantial advantage” when compared to the competition.

How substantial? Toon predicts their IPU will be 5 to 10 times faster than NVIDIA’s recently announced Pascal Tesla GPUs for deep learning training and inferencing tasks. For machine learning codes where big vectors and large-scale data parallelism work less well, he thinks the IPU will outrun GPUs by a factor of 50 to 100. 

From his perspective, the uptake of GPUs in machine learning has led the industry down a certain path, which takes advantage of the dense vector architecture of the graphics processor. “But there are other techniques that are equally interesting and potentially very valuable that have been left a little bit behind,” Toon told TOP500 News. “That’s one of the things we’re looking to solve.” He says applications that use recurrent neural networks (RNNs) fall into this category. RNNs can be used for things like reinforcement learning, which develops intelligent behaviors based on a reward model.

Although the company is not talking much about the inner workings of the chip, Toon did say that the IPU maps better to recurrent structures than GPUs because it offers a richer model of parallelism and can process arbitrary graph structures more efficiently. AI research firm Tractica has noted the IPU can store the models in the processor itself rather than using external memory. This, they say, makes it especially adept at recurrent networks and their use in areas such as autonomous vehicles, natural language processing, and personalized medicine.

The initial PCIe card will house two IPUs, which are connected via a high-speed interconnect. Connecting multiple cards within a server is made possible with another fabric. According to Toon, each card will draw between 250 and 300 watts, which puts it in the same thermal territory as an NVIDIA Pascal P100 or an Intel Knights Landing Xeon Phi. That effectively limits its usage to relatively modest-sized clusters used for training neural networks.

The company is also developing an appliance box based on these cards, which sounds analogous to NVIDIA’s DGX-1 system for the P100 GPU. In addition, Graphcore is in talks with hyperscalers and OEMs to get its accelerator card into servers manufactured and sold through the usual supply routes.

Although the initial deployments for the first products will largely be limited to training the neural networks, Toon says more specialized products for inferencing and applications at the edge of the network are on the company’s radar. These are likely to be lower power versions tweaked for the specific solution space. In that sense, it’s similar to NVIDIA’s AI strategy, which provides different GPU products for training, inferencing and embedded applications.

In advance of the initial product release, Graphcore intends to provide benchmark data to back up its performance claims. Assuming they can demonstrate those advantages, the company still has to overcome the considerable head start NVIDIA has in this market and the ecosystem advantages it has built. At least with regard to neural network training, NVIDIA has something north of 90 percent of the processor market, and is expanding its footprint in other areas like machine learning inferencing and autonomous vehicles.

That’s the crux of the challenge for Graphcore, or really for any company looking to insert a customized ASIC solution into a market dominated by commodity-driven technology. That task becomes even more challenging for a small company that has to bear the expensive burden of maintaining a processor roadmap that can keep pace with those of larger chipmakers. If there is a model on how to accomplish this, Graphcore can look to NVIDIA itself, which was able to outmaneuver Intel (at least for the time being) to become the dominant provider of machine learning silicon in the datacenter.

An even more encouraging example is what happened in the Bitcoin market, which, in a period of five years or so, migrated from CPUs to GPUs to FPGAs to custom ASICs. While AI obviously encompasses a much broader application set than Bitcoin, like the latter it represents a nascent market where application performance is the most critical driver. Whereas traditional high performance computing has largely abandoned proprietary processors due to its niche customer base, in the AI space, customized solutions remain an open question.

As Toon notes, since they started from a clean slate, they didn’t have to maintain compatibility with an architecture built for a totally different purpose. Nor were they constricted by an application model that relied on legacy platforms. “What we’re trying to do is initially build a system that moves forward the state-of-the-art in machine learning and push the limits of what is possible,” he says.