|Title||BitSystolic: A 26.7 TOPS/W 2b8b NPU with Configurable Data Flows for Edge Devices|
|Publication Type||Journal Article|
|Year of Publication||2021|
|Authors||Q Yang, and H Li|
|Journal||IEEE Transactions on Circuits and Systems I: Regular Papers|
|Pagination||1134 - 1145|
Efficient deployment of deep neural networks (DNNs) has become a pressing need with the exploding demand for artificial intelligence on edge devices. Mixed-precision inference, which combines model compression with reduced computation cost, offers a path toward accurate and efficient DNN deployment. However, although mixed-precision DNN models can be obtained at the algorithmic level, hardware support for them remains insufficient. In this work, we propose BitSystolic, a neural processing unit based on a systolic array structure. In BitSystolic, the numerical precision of both weights and activations can be configured in the range of 2-8 bits, fulfilling different requirements across mixed-precision models and tasks. Moreover, BitSystolic supports the various data flows found in different types of neural layers (e.g., convolutional, fully-connected, and recurrent layers) and adaptively optimizes data reuse by switching between a matrix-matrix mode and a vector-matrix mode. We designed and fabricated the proposed BitSystolic, composed of a 16×16 systolic array. Our measurement results show that BitSystolic achieves a unified power efficiency of up to 26.7 TOPS/W with 17.8 mW peak power consumption across various layer types.
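To illustrate the idea behind configurable 2-8 bit precision, the following sketch shows a bit-serial multiply-accumulate, a common way to build MAC units whose precision can be dialed per layer: low-precision weights are decomposed into 1-bit planes, and each plane contributes a shifted partial sum. This is only a conceptual model under assumed unsigned operands; the function name and structure are illustrative and do not reflect BitSystolic's actual microarchitecture.

```python
def bit_serial_dot(acts, weights, wgt_bits=2):
    """Dot product with `wgt_bits`-bit unsigned weights processed one
    bit plane at a time, as in bit-serial configurable-precision MACs."""
    total = 0
    for b in range(wgt_bits):            # iterate over weight bit planes
        plane_sum = 0
        for a, w in zip(acts, weights):
            if (w >> b) & 1:             # 1-bit weight: add or skip
                plane_sum += a
        total += plane_sum << b          # scale by this bit plane's weight
    return total

# Agrees with the full-precision dot product:
acts = [3, 7, 1, 5]       # 8-bit activations
weights = [2, 1, 3, 0]    # 2-bit weights
assert bit_serial_dot(acts, weights) == sum(a * w for a, w in zip(acts, weights))
```

Changing `wgt_bits` trades throughput for precision without altering the datapath, which is the property that lets one array serve mixed-precision models.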
|Short Title||IEEE Transactions on Circuits and Systems I: Regular Papers|