Deep Learning Processor

Deep neural networks, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), are a promising machine-learning approach that can recognize data in a variety of forms, such as images, sound, temperature, and even heartbeats, with state-of-the-art accuracy. For example, recent convolutional neural networks can recognize objects in raw images with a top-5 accuracy of up to 96%, even better than humans. This versatility and high accuracy make deep neural networks a suitable candidate for the brain-like intelligent processor of the upcoming smart world.

However, a deep neural network usually consists of a large number of layers and weights, and therefore requires a massive amount of computation and memory bandwidth. As a result, the substantial power and computational cost of running deep neural networks have created a need for specialized processors and brand-new computational paradigms that allow deep neural networks to be applied to practical applications in a mobile environment.
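To give a rough sense of scale, the sketch below counts the weights and multiply-accumulate (MAC) operations of a single convolutional layer. The layer dimensions, the 4-byte weight size, and the function name `conv_layer_cost` are illustrative assumptions, not figures from any specific network or from our processor.

```python
def conv_layer_cost(in_ch, out_ch, kernel, out_h, out_w, bytes_per_weight=4):
    """Estimate the cost of one convolutional layer (bias terms ignored).

    All parameters are hypothetical example values chosen by the caller.
    """
    # One kernel x kernel filter per (input channel, output channel) pair.
    weights = out_ch * in_ch * kernel * kernel
    # Every output pixel applies the full set of filters once.
    macs = weights * out_h * out_w
    return {
        "weights": weights,
        "macs": macs,
        "weight_bytes": weights * bytes_per_weight,
    }


# Assumed example: a 3x3 layer mapping 64 to 128 channels on a 56x56 output.
cost = conv_layer_cost(in_ch=64, out_ch=128, kernel=3, out_h=56, out_w=56)
print(cost)
```

Under these assumptions a single layer already needs about 231 million MACs per input image, and a network stacks many such layers, which is why both compute and memory traffic dominate the energy budget on a mobile device.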

We design energy-efficient deep neural network processors for intelligent workloads on mobile devices such as smartphones, wearable devices, and IoT devices. Such a processor typically integrates an array of parallel processing cores that handle the key operations of deep neural networks, including dot products, pooling, and activation. To increase energy efficiency, we conduct intensive research on datapath blocks and processing architectures. To manage the massive memory requirements of deep neural networks, we also study highly optimized memory architectures, efficient data representations, and compression methods.
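The key operations named above can be illustrated in software. The following is a minimal sketch, not a model of our hardware: each list of weights stands in for the work one processing core might do, ReLU is assumed as the activation, max pooling as the pooling operation, and symmetric 8-bit quantization as one possible compact data representation. All function names are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors (one core's main operation)."""
    return sum(x * y for x, y in zip(a, b))


def relu(x):
    """ReLU activation, assumed here as the example activation function."""
    return max(0.0, x)


def max_pool_1d(values, window):
    """Non-overlapping max pooling over a 1-D sequence."""
    return [max(values[i:i + window])
            for i in range(0, len(values) - window + 1, window)]


def quantize_int8(values):
    """Symmetric 8-bit quantization: map floats to integers in [-127, 127]."""
    peak = max(abs(v) for v in values)
    scale = peak / 127 if peak > 0 else 1.0
    return [round(v / scale) for v in values], scale


# Each row of weights is handled by one (simulated) core on the same inputs.
inputs = [1.0, -2.0, 3.0]
core_weights = [
    [0.5, 0.5, 0.5],
    [-1.0, 0.0, 1.0],
    [1.0, 1.0, -1.0],
    [0.0, 1.0, 0.0],
]

activated = [relu(dot(w, inputs)) for w in core_weights]
pooled = max_pool_1d(activated, window=2)
quantized, scale = quantize_int8([1.27, -0.64, 0.0])

print(activated)   # [1.0, 2.0, 0.0, 0.0]
print(pooled)      # [2.0, 0.0]
print(quantized)   # [127, -64, 0]
```

Storing weights as 8-bit integers plus a single scale factor takes roughly a quarter of the memory of 32-bit floats, at the cost of some rounding error; whether that trade-off is acceptable depends on the network and the task.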