Ula! - An Integrated DNN Acceleration Framework with Enhanced Unsupervised Learning Capability
In light of recent breakthroughs in unsupervised learning algorithms (e.g., generative adversarial networks and dual learning) and the emergence of their applications, a team of three PIs/co-PIs from Duke and UCSB proposes Ula! - an integrated DNN acceleration framework with enhanced unsupervised learning capability. The project revolutionizes DNN research by introducing an integrated unsupervised-learning computation framework with three vertically integrated components spanning software (algorithms), hardware (computing), and applications (realization). The developed techniques are demonstrated and evaluated on three representative computing platforms: GPUs, FPGAs, and emerging nanoscale computing systems.
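As a concrete illustration of the unsupervised-learning workloads such a framework must accelerate, the sketch below trains a toy generative adversarial network on synthetic 1-D Gaussian data. The network sizes, data distribution, and hyperparameters are illustrative assumptions, not details from the project.

```python
# Minimal GAN sketch (assumed toy setup): a generator learns to mimic
# a 1-D Gaussian with mean 2.0, while a discriminator tries to tell
# real samples from generated ones.
import torch
import torch.nn as nn

torch.manual_seed(0)

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))  # generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))  # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, 1) * 0.5 + 2.0   # samples from the target distribution
    fake = G(torch.randn(64, 8))             # generator output from random noise

    # Discriminator update: label real data 1, generated data 0.
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator update: try to make the discriminator output 1 on fakes.
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

print(f"generated mean ~ {G(torch.randn(1024, 8)).mean().item():.2f} (target mean 2.0)")
```

Note that one training step interleaves two adversarial objectives, which is precisely the kind of irregular, multi-network computation pattern that motivates a dedicated acceleration framework.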
Cross-Platform Solutions for Pruning and Accelerating Neural Network Models
Deep neural networks (DNNs) have achieved remarkable success in many applications because of their powerful data-processing capability. The objective of this project is to investigate a software-hardware co-design methodology for DNN acceleration that can be applied to both traditional von Neumann and emerging neuromorphic architectures. The project fits into the general area of "brain-inspired" energy-efficient computing paradigms, which has attracted much recent interest. From a more technical standpoint, a novel neural network sparsification process will be explored that preserves state-of-the-art accuracy while establishing hardware-friendly models of neural network computation. The result is expected to be a holistic methodology composed of neural network model sparsification, hardware acceleration, and integrated software/hardware co-design.
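To make the sparsification idea concrete, the sketch below applies global magnitude pruning, one standard way to sparsify a network: weights whose magnitudes fall below a global threshold are zeroed, leaving a sparse model that maps naturally onto hardware-friendly sparse kernels. The 90% sparsity target and layer shapes are illustrative assumptions, not values from the project.

```python
# Global magnitude pruning sketch (assumed toy setup): zero out the
# smallest-magnitude weights across all layers at once.
import numpy as np

rng = np.random.default_rng(0)
weights = [rng.standard_normal((256, 128)), rng.standard_normal((128, 10))]

sparsity = 0.90                                    # fraction of weights to remove
all_mags = np.concatenate([np.abs(w).ravel() for w in weights])
threshold = np.quantile(all_mags, sparsity)        # global magnitude cutoff

masks = [np.abs(w) > threshold for w in weights]   # keep only large weights
pruned = [w * m for w, m in zip(weights, masks)]

kept = sum(m.sum() for m in masks) / sum(m.size for m in masks)
print(f"kept {kept:.1%} of weights")               # roughly 10% remain nonzero
```

In practice the pruned model is then fine-tuned so that the remaining weights recover the accuracy lost to pruning, which is the step that lets sparsification preserve state-of-the-art accuracy.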
GAMBIT: Efficient Graph Processing on a Memristor-based Embedded Computing Platform
Graph processing has recently received intense interest in light of a wide range of needs to understand relationships. Graph analytics are widely used in key societal domains such as cyber security, social media, infrastructure monitoring (e.g., smart buildings), natural language processing, systems biology, and recommendation systems. These important applications all fall into fast-growing sectors of computer science and engineering research. Unfortunately, existing embedded systems equipped with conventional computing units such as CPUs and GPUs cannot efficiently process large graphs in real time. Instead, large data centers are required to perform the graph processing, either incurring extra latency and energy due to data communication or providing only forensic (offline) graph analysis. This research aims to effectively enable graph analytics in embedded systems with disruptive emerging technology.
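One reason memristor-based platforms suit graph analytics is that a memristor crossbar can evaluate a matrix-vector product in a single analog step, and many graph algorithms reduce to repeated matrix-vector products. The sketch below expresses PageRank in exactly that form; the toy four-node graph and damping factor are illustrative assumptions.

```python
# PageRank as repeated matrix-vector products (assumed toy graph).
# On a memristor platform, the M @ rank product inside the loop is the
# operation a crossbar would compute in the analog domain in one step.
import numpy as np

# Adjacency matrix of a small directed graph (A[i, j] = 1 means edge j -> i).
A = np.array([[0, 0, 1, 0],
              [1, 0, 0, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
M = A / A.sum(axis=0)              # column-normalize by out-degree

n, d = 4, 0.85                     # damping factor d = 0.85 (common choice)
rank = np.full(n, 1.0 / n)
for _ in range(50):
    rank = (1 - d) / n + d * (M @ rank)

print(np.round(rank, 3))           # stationary importance scores
```

Because the crossbar stores the matrix in place and computes over it directly, this mapping avoids the memory-to-processor data movement that dominates graph processing on conventional CPU/GPU embedded systems.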
SMALE: Enhancing Scalability of Machine Learning Algorithms on Extreme Scale Computing Platforms
Following technology advances in high-performance computing and data acquisition, machine learning, especially deep learning, has achieved remarkable success in many applications. This success, to a great extent, is enabled by large-scale deep neural networks executed in parallel on extreme-scale computing platforms, which are often composed of a large number of computing units. The increase in network depth and scale, however, greatly exacerbates the demand for computational resources and data storage on hardware platforms, and therefore introduces significant technical challenges in deployment and execution. The objective of our research project is to develop a holistic set of innovations that enhances the scalability of machine learning tasks on extreme-scale computing platforms. Techniques at the structure, assembly, and acceleration layers of machine learning algorithms will be investigated during the project period. These techniques will attack the fundamental problems of running machine learning algorithms on extreme-scale computing platforms by vertically integrating solutions at three closely entangled layers, paving a long-term scaling path for machine learning applications.
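As one concrete example of the scalability bottleneck, the sketch below simulates synchronous data-parallel SGD: each worker computes a gradient on its own data shard, and the gradients are then averaged across workers (the allreduce step whose communication cost typically limits scaling on real platforms). The linear-regression objective, four-worker setup, and hyperparameters are illustrative assumptions.

```python
# Synchronous data-parallel SGD sketch (assumed toy setup): a linear
# regression problem split across simulated workers.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4096, 8))
y = X @ np.arange(1.0, 9.0) + 0.1 * rng.standard_normal(4096)

n_workers = 4
shards = np.array_split(np.arange(4096), n_workers)  # one data shard per worker
w = np.zeros(8)

for step in range(200):
    # Each worker computes a local gradient on its own shard...
    grads = []
    for idx in shards:
        err = X[idx] @ w - y[idx]
        grads.append(X[idx].T @ err / len(idx))
    # ...then gradients are averaged across workers. On a real cluster
    # this is an allreduce, and its communication volume grows with
    # model size, which is the scaling problem the project targets.
    w -= 0.05 * np.mean(grads, axis=0)

print(np.round(w, 2))  # converges toward the true coefficients [1, 2, ..., 8]
```

The structure, assembly, and acceleration layers named above roughly correspond to shrinking the model itself, reorganizing how the distributed pieces are combined, and speeding up each worker's local computation, so improvements at all three layers compound.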