VWA: Hardware Efficient Vectorwise Accelerator for Convolutional Neural Network

05/02/2022
by   Kuo-Wei Chang, et al.
Hardware accelerators for convolutional neural networks (CNNs) enable real-time applications of artificial intelligence technology. However, most existing designs suffer from low hardware utilization or high area cost due to complex dataflow. This paper proposes a hardware-efficient vectorwise CNN accelerator that adopts a systolic array optimized for 3×3 filters and uses a 1-D broadcast dataflow to generate partial sums. This enables easy reconfiguration for different kinds of kernels with an interleaved-input or elementwise-input dataflow. The simple and regular dataflow results in low area cost while attaining high hardware utilization. The presented design achieves 99%, 97%, 93.7%, and 94% hardware utilization for VGG-16, ResNet-34, GoogLeNet, and MobileNet, respectively. A hardware implementation in TSMC 40nm technology takes a 266.9K NAND gate count and 191KB of SRAM to support 168GOPS throughput and consumes only 154.98mW when running at a 500MHz operating frequency, giving it better area and power efficiency than other designs.
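
The efficiency claim can be made concrete with a little arithmetic on the figures quoted above. The following is a minimal sketch, not something taken from the paper: the function names are hypothetical, and the choice of GOPS/W and GOPS per thousand gates as the metrics is an assumption about how area and power efficiency are being compared.

```python
# Figures quoted in the abstract.
THROUGHPUT_GOPS = 168.0   # peak throughput in GOPS
POWER_MW = 154.98         # power in mW at 500 MHz
GATE_COUNT_K = 266.9      # logic size in thousands of NAND gates
FREQ_MHZ = 500.0          # operating frequency in MHz


def power_efficiency_gops_per_w(gops: float, power_mw: float) -> float:
    """Throughput per watt (assumed metric: GOPS/W)."""
    return gops / (power_mw / 1000.0)


def area_efficiency_gops_per_kgate(gops: float, kgates: float) -> float:
    """Throughput per thousand gates (assumed metric: GOPS/Kgate)."""
    return gops / kgates


def ops_per_cycle(gops: float, freq_mhz: float) -> float:
    """Operations completed per clock cycle at the given frequency."""
    return (gops * 1e9) / (freq_mhz * 1e6)


if __name__ == "__main__":
    print(f"{power_efficiency_gops_per_w(THROUGHPUT_GOPS, POWER_MW):.1f} GOPS/W")
    print(f"{area_efficiency_gops_per_kgate(THROUGHPUT_GOPS, GATE_COUNT_K):.3f} GOPS/Kgate")
    print(f"{ops_per_cycle(THROUGHPUT_GOPS, FREQ_MHZ):.0f} ops/cycle")
```

Under these assumptions the quoted numbers work out to roughly 1084 GOPS/W and about 0.63 GOPS per thousand gates, with 336 operations per cycle. If a multiply-accumulate is counted as two operations (a common convention, but not stated in the abstract), that would correspond to 168 multiply-accumulates per cycle.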
