Fujitsu Unveils Circuit Design that Optimizes Deep Learning Applications

April 24, 2017

By: Michael Feldman

Fujitsu Laboratories has developed a circuit technology that is said to improve the energy efficiency of deep learning workloads. According to the company, it plans to commercialize the technology in 2018 as part of its Human Centric AI Zinrai initiative.

The approach is based on the observation that the size of the data used for deep learning computation can often be significantly reduced depending on the nature of the input data used to train the neural networks. That saves compute time, the amount of memory needed to hold the learning results, and the additional cost of transmitting larger data items, all of which save energy. The technological approach and business rationale are summed up as follows:

“In a simulation of deep learning hardware incorporating this technology, Fujitsu Laboratories confirmed that it significantly improved energy efficiency, by about four times that of a 32-bit compute unit, in an example of deep learning using LeNet. With this technology, it has now become possible to expand the range of applicability for advanced AI using deep learning processing to a variety of locations, including servers in the cloud and edge servers.”

The 32-bit compute unit mentioned here refers to the standard width of floating point data used for deep learning training. By reducing the data used in these computations to 16 bits, or even 8 bits, the number of calculations per watt can be increased dramatically. In addition, Fujitsu Laboratories has implemented the technology with integer, rather than floating point, operations, further optimizing the computations. Fujitsu Laboratories estimates that power consumption of the processor and memory subsystem can be reduced by 50 percent by paring down the data from 32 to 16 bits, and by 75 percent when taking it down to 8 bits.
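The arithmetic behind those estimates can be illustrated with a short sketch. It assumes, purely for illustration, that power consumption scales linearly with data width; that assumption happens to reproduce the 50 and 75 percent figures and the roughly fourfold efficiency gain quoted above, but it is not Fujitsu's published model, and the function names are hypothetical.

```python
def relative_power(bits: int, baseline_bits: int = 32) -> float:
    """Power relative to the baseline, ASSUMING power scales linearly with bit width."""
    return bits / baseline_bits


def power_reduction_percent(bits: int, baseline_bits: int = 32) -> float:
    """Percentage of power saved compared with the baseline width."""
    return 100.0 * (1.0 - relative_power(bits, baseline_bits))


def efficiency_gain(bits: int, baseline_bits: int = 32) -> float:
    """Calculations per watt relative to the baseline, under the same linear assumption."""
    return baseline_bits / bits


if __name__ == "__main__":
    for width in (16, 8):
        print(width, power_reduction_percent(width), efficiency_gain(width))
```

Under this simplified model, 8-bit data yields a fourfold gain over a 32-bit unit, in line with the figure cited for the LeNet simulation.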

The tricky part is that not all neural networks are amenable to this sort of reduced precision, so trying to force 16-bit computation on a deep learning model that requires 32 bits will result in reduced accuracy. The key to this technology is being able to determine the sweet spot for the data size in real time during training.

To do this, the circuit has a block devoted to analyzing the data being calculated, a database that stores the distribution of the analyzed data, and a block that preserves the calculation settings. During neural network training, the distribution of the analyzed data is used to compute the minimal bit width that can provide a reasonably accurate model.
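The idea of picking the smallest workable bit width from observed data can be sketched in software. The example below is not Fujitsu's circuit or algorithm: it swaps in a simple stand-in that quantizes a sample of values to signed integers at each candidate width and keeps the narrowest width whose error stays under a tolerance. The function names, the candidate widths, and the tolerance parameter are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def quantize(values: Sequence[float], bits: int) -> List[float]:
    """Symmetric signed-integer quantization; returns the dequantized values."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    qmax = 2 ** (bits - 1) - 1          # largest representable integer magnitude
    scale = max_abs / qmax              # real-valued size of one integer step
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def rms_error(values: Sequence[float], approx: Sequence[float]) -> float:
    """Root-mean-square difference between the original and quantized values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(values, approx)) / len(values))


def minimal_bit_width(values: Sequence[float], tolerance: float,
                      candidates: Sequence[int] = (8, 16, 32)) -> int:
    """Smallest candidate width whose RMS error, relative to the largest
    magnitude in the sample, does not exceed the (hypothetical) tolerance."""
    values = list(values)
    max_abs = max(abs(v) for v in values)
    for bits in sorted(candidates):
        if max_abs == 0:
            return bits
        if rms_error(values, quantize(values, bits)) / max_abs <= tolerance:
            return bits
    return max(candidates)              # fall back to the widest format


if __name__ == "__main__":
    sample = [0.001 * i for i in range(-1000, 1001)]
    print(minimal_bit_width(sample, tolerance=1e-2))
    print(minimal_bit_width(sample, tolerance=1e-4))
```

A looser tolerance lets the narrower format through, while a tighter one forces a wider format. That is the same tradeoff between precision and energy the hardware is described as managing during training.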

For example, using LeNet, a convolutional neural network used to recognize visual patterns, along with the MNIST dataset, the circuit technology was able to achieve a recognition accuracy of 98.31 percent using 8-bit data and 98.89 percent using 16-bit data. For comparison, the recognition accuracy is 98.90 percent when using 32-bit data, which is only 0.01 percentage points above the 16-bit result and, for all practical purposes, identical to it.

Additional details of the technology are being provided this week at xSIG 2017, the first cross-disciplinary Workshop on Computing Systems, Infrastructures, and Programming. The workshop is being held at the Toranomon Hills Forum in Minato-ku, Tokyo, from April 24 to 26.