In-Network Accumulation: Extending the Role of NoC for DNN Acceleration
Network-on-Chip (NoC) plays a significant role in the performance of a DNN accelerator. The scalability and modular design property of the NoC help in improving the performance of a DNN execution by providing flexibility in running different kinds of workloads. Data movement in a DNN workload is still a challenging task for DNN accelerators and hence a novel approach is required. In this paper, we propose the In-Network Accumulation (INA) method to further accelerate a DNN workload execution on a many-core spatial DNN accelerator for the Weight Stationary (WS) dataflow model. The INA method expands the router's function to support partial sum accumulation. This method avoids the overhead of injecting and ejecting an incoming partial sum to the local processing element. The simulation results on AlexNet, ResNet-50, and VGG-16 workloads show that the proposed INA method achieves 1.22x improvement in latency and 2.16x improvement in power consumption on the WS dataflow model.
READ FULL TEXT