Title | BitSystolic: A 26.7 TOPS/W 2b8b NPU with Configurable Data Flows for Edge Devices |
Publication Type | Journal Article |
Year of Publication | 2021 |
Authors | Q. Yang and H. Li
Journal | IEEE Transactions on Circuits and Systems I: Regular Papers
Volume | 68 |
Start Page | 1134 |
Issue | 3 |
Pagination | 1134 - 1145 |
Date Published | 03/2021 |
Abstract | Efficient deployment of deep neural networks (DNNs) has become a pressing need with the exploding demand for artificial intelligence on edge devices. Mixed-precision inference, which combines a compressed model with reduced computation cost, points a way toward accurate and efficient DNN deployment. Although mixed-precision DNN models can be obtained at the algorithmic level, sufficient hardware support is still lacking. In this work, we propose BitSystolic, a neural processing unit based on a systolic array structure. In BitSystolic, the numerical precision of both weights and activations can be configured in the range of 2-8 bits, fulfilling different requirements across mixed-precision models and tasks. Moreover, BitSystolic can support the various data flows present in different types of neural layers (e.g., convolutional, fully-connected, and recurrent layers) and adaptively optimize data reuse by switching between the matrix-matrix mode and the vector-matrix mode. We designed and fabricated the proposed BitSystolic, which comprises a 16 × 16 systolic array. Our measurement results show that BitSystolic achieves a unified power efficiency of up to 26.7 TOPS/W with 17.8 mW peak power consumption across various layer types.
DOI | 10.1109/TCSI.2020.3043778 |
Short Title | IEEE Transactions on Circuits and Systems I: Regular Papers