Machine Learning & Deep Neural Network

Large Language Models

A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models

Spiking Neural Networks

Prosperity: Accelerating Spiking Neural Networks via Product Sparsity

Federated Learning

LotteryFL: Personalized and Communication-Efficient Federated Learning with Lottery Ticket Hypothesis on Non-IID Datasets: Federated learning is a popular distributed machine learning paradigm with enhanced privacy. Its primary goal is to learn a global model that performs well for as many participants as possible. The technology is advancing rapidly but still faces many unsolved challenges, among which statistical heterogeneity (i.e., non-IID data) and communication efficiency are two critical ones that hinder the development of federated learning. In this work, we propose LotteryFL, a personalized and communication-efficient federated learning framework that exploits the Lottery Ticket Hypothesis. In LotteryFL, each client learns a lottery ticket network (i.e., a subnetwork of the base model) by applying the Lottery Ticket Hypothesis, and only these lottery networks are communicated between the server and the clients. Rather than learning a shared global model as in classic federated learning, each client learns a personalized model via LotteryFL; the communication cost is significantly reduced due to the compact size of the lottery networks. To support the training and evaluation of our framework, we construct non-IID datasets based on MNIST, CIFAR-10 and EMNIST by taking feature distribution skew, label distribution skew and quantity skew into consideration. Experiments on these non-IID datasets demonstrate that LotteryFL significantly outperforms existing solutions in terms of personalization and communication cost.
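As a rough illustration of what "a subnetwork of the base model" can look like, the sketch below builds a 0/1 mask over a flat weight list and applies it. It assumes magnitude-based pruning as the ticket-finding step, which the abstract does not specify; the function names `magnitude_mask` and `apply_mask` and the example values are hypothetical.

```python
def magnitude_mask(weights, prune_fraction):
    """Return a 0/1 mask that drops the smallest-magnitude weights.

    Assumption: magnitude-based pruning stands in for the ticket-finding
    step; the abstract does not state the pruning criterion.
    """
    n_prune = int(len(weights) * prune_fraction)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def apply_mask(weights, mask):
    """Zero out the pruned weights, leaving only the subnetwork."""
    return [w * m for w, m in zip(weights, mask)]


# Hypothetical client weights; half of them are pruned.
weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
mask = magnitude_mask(weights, prune_fraction=0.5)
ticket = apply_mask(weights, mask)
kept = sum(mask)
```

Only the surviving entries of `ticket` (and their positions) would need to be exchanged, which is the source of the communication savings the abstract describes.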
Hermes: An Efficient Federated Learning Framework for Heterogeneous Mobile Clients: Federated learning (FL) has been a popular method to achieve distributed machine learning among numerous devices without sharing their data with a cloud server. FL aims to learn a shared global model with the participation of massive numbers of devices under the orchestration of a central server. However, mobile devices usually have limited communication bandwidth to transfer local updates to the central server. In addition, the data residing across devices is intrinsically statistically heterogeneous (i.e., non-IID data distribution). Learning a single global model may not work well for all devices participating in FL under data heterogeneity. Such communication cost and data heterogeneity are two critical bottlenecks that hinder applying FL in practice. Moreover, mobile devices usually have limited computational resources. Improving the inference efficiency of the learned model is critical to deploying deep learning applications on mobile devices. We present Hermes, a communication- and inference-efficient FL framework under data heterogeneity. To this end, each device finds a small subnetwork by applying structured pruning; only the updates of these subnetworks are communicated between the server and the devices. Instead of averaging over all parameters of all devices as conventional FL frameworks do, the server averages only the parameters that overlap across the subnetworks. By applying Hermes, each device can learn a personalized and structured sparse deep neural network, which can run efficiently on devices. Experimental results show the remarkable advantages of Hermes over status quo approaches. Hermes achieves as much as a 32.17% increase in inference accuracy, a 3.48× reduction in communication cost, a 1.83× speedup in inference efficiency, and 1.8× savings in energy consumption.
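The overlap-only averaging step can be sketched on flat weight lists as follows. This is a minimal illustration, not the Hermes implementation: it assumes "overlapped" means the intersection of all client subnetworks, that non-overlapping positions keep each client's local value, and that masks are per-element rather than structured. The name `aggregate_overlap` and the example data are hypothetical.

```python
def aggregate_overlap(client_weights, client_masks):
    """Average only the positions kept by every client's mask.

    Assumption: "overlapped" is read as the intersection of all client
    subnetworks; positions outside it keep each client's local value.
    Returns one updated weight list per client.
    """
    n_params = len(client_weights[0])
    # A position overlaps when every client mask keeps it.
    overlap = [all(m[i] == 1 for m in client_masks) for i in range(n_params)]
    updated = [list(w) for w in client_weights]
    for i in range(n_params):
        if overlap[i]:
            mean = sum(w[i] for w in client_weights) / len(client_weights)
            for u in updated:
                u[i] = mean
    return updated


# Two hypothetical clients whose subnetworks share positions 0 and 3.
client_weights = [
    [1.0, 2.0, 0.0, 4.0],
    [3.0, 0.0, 5.0, 8.0],
]
client_masks = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
]
new_weights = aggregate_overlap(client_weights, client_masks)
```

Positions 1 and 2 are kept by only one client each, so they stay personalized, while positions 0 and 3 are shared through the average.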
FedMask: Joint Computation and Communication-Efficient Personalized Federated Learning via Heterogeneous Masking: Federated learning (FL) is a distributed machine learning paradigm that allows model training on decentralized data residing on devices without breaching data privacy. However, the data residing across devices is intrinsically statistically heterogeneous (i.e., non-IID data distribution), and mobile devices usually have limited communication bandwidth to transfer local updates. Such statistical heterogeneity and limited communication bandwidth are two major bottlenecks that hinder applying FL in practice. In addition, since mobile devices usually have limited computational resources, improving the computation efficiency of training and running DNNs is critical to developing on-device deep learning applications. We present FedMask, a communication- and computation-efficient FL framework. By applying FedMask, each device can learn a personalized and structured sparse DNN, which can run efficiently on devices. To achieve this, each device learns a sparse binary mask (i.e., 1 bit per network parameter) while keeping the parameters of each local model unchanged; only these binary masks are communicated between the server and the devices. Instead of learning a shared global model as in classic FL, each device obtains a personalized and structured sparse model that is composed by applying the learned binary mask to the fixed parameters of the local model. Our experiments show that, compared with status quo approaches, FedMask improves the inference accuracy by 28.47% and reduces the communication cost and the computation cost by 34.48× and 2.44×, respectively. FedMask also achieves a 1.56× inference speedup and reduces the energy consumption by 1.78×.
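To make the "1 bit per network parameter" idea concrete, the sketch below packs a binary mask into bytes, unpacks it, and applies it to fixed weights. It does not show how the mask is learned, and the bit-packing layout, the per-element (unstructured) mask, and the names `pack_mask`, `unpack_mask`, and `apply_binary_mask` are assumptions made for illustration only.

```python
def apply_binary_mask(fixed_weights, mask):
    """Compose the personalized model: kept weights pass, others become 0."""
    return [w if bit else 0.0 for w, bit in zip(fixed_weights, mask)]


def pack_mask(mask):
    """Pack 0/1 bits into bytes, 8 bits per byte (hypothetical wire format)."""
    packed = bytearray((len(mask) + 7) // 8)
    for i, bit in enumerate(mask):
        if bit:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def unpack_mask(packed, n_bits):
    """Inverse of pack_mask: recover the list of 0/1 bits."""
    return [(packed[i // 8] >> (i % 8)) & 1 for i in range(n_bits)]


# Hypothetical fixed local weights and an already-learned binary mask.
fixed_weights = [0.5, -1.2, 0.3, 0.8, -0.1, 0.9, 0.0, 1.1, -0.4, 0.7]
mask = [1, 0, 1, 1, 0, 1, 0, 1, 0, 0]

payload = pack_mask(mask)  # what would be sent instead of the weights
received = unpack_mask(payload, len(mask))
personalized = apply_binary_mask(fixed_weights, received)
```

Here ten parameters travel as two bytes, whereas sending them as 32-bit floats would take forty; the ratios reported in the abstract come from the authors' experiments, not from this toy calculation.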
Soteria: Provable Defense against Privacy Leakage in Federated Learning from Representation Perspective: Federated learning (FL) is a popular distributed learning framework that can reduce privacy risks by not explicitly sharing private data. However, recent works have demonstrated that sharing model updates makes FL vulnerable to inference attacks. In this work, we show our key observation that the data representation leakage from gradients is the essential cause of privacy leakage in FL. We also provide an analysis of this observation to explain how the data representation is leaked. Based on this observation, we propose a defense called Soteria against model inversion attacks in FL. The key idea of our defense is learning to perturb the data representation such that the quality of the reconstructed data is severely degraded, while FL performance is maintained. In addition, we derive a certified robustness guarantee for FL and a convergence guarantee for FedAvg after applying our defense. To evaluate our defense, we conduct experiments on MNIST and CIFAR10 for defending against the DLG attack and the GS attack. Without sacrificing accuracy, the results demonstrate that our proposed defense can increase the mean squared error between the reconstructed data and the raw data by as much as 160× for both the DLG attack and the GS attack, compared with baseline defense methods. Therefore, the privacy of the FL system is significantly improved.
FL-WBC: Enhancing Robustness against Model Poisoning Attacks in Federated Learning from a Client Perspective: Federated learning (FL) is a popular distributed learning framework that trains a global model through iterative communications between a central server and edge devices. Recent works have demonstrated that FL is vulnerable to model poisoning attacks. Several server-based defense approaches (e.g., robust aggregation) have been proposed to mitigate such attacks. However, we empirically show that under extremely strong attacks, these defensive methods fail to guarantee the robustness of FL. More importantly, we observe that as long as the global model is polluted, the impact of attacks on the global model will remain in subsequent rounds even if there are no subsequent attacks. In this work, we propose a client-based defense, named White Blood Cell for Federated Learning (FL-WBC), which can mitigate model poisoning attacks that have already polluted the global model. The key idea of FL-WBC is to identify the parameter space where the long-lasting attack effect on parameters resides and perturb that space during local training. Furthermore, we derive a certified robustness guarantee against model poisoning attacks and a convergence guarantee for FedAvg after applying our FL-WBC. We conduct experiments on FashionMNIST and CIFAR10 to evaluate the defense against state-of-the-art model poisoning attacks. The results demonstrate that our method can effectively mitigate the impact of model poisoning attacks on the global model within 5 communication rounds with nearly no accuracy drop under both IID and non-IID settings. Our defense is also complementary to existing server-based robust aggregation approaches and can further improve the robustness of FL under extremely strong attacks.

Previous Publications

Ula! - An Integrated DNN Acceleration Framework with Enhanced Unsupervised Learning Capability: In light of very recent revolutions in unsupervised learning algorithms (e.g., generative adversarial networks and dual-learning) and the emergence of their applications, three PIs/co-PIs from Duke and UCSB form a team to design Ula! - an integrated DNN acceleration framework with enhanced unsupervised learning capability. The project revolutionizes DNN research by introducing an integrated unsupervised learning computation framework with three vertically integrated components covering software (algorithm), hardware (computing), and application (realization). The developed techniques are demonstrated and evaluated on three representative computing platforms: GPU, FPGA, and emerging nanoscale computing systems.
Cross-Platform Solutions for Pruning and Accelerating Neural Network Models: Deep neural networks (DNNs) have achieved remarkable success in many applications because of their powerful capability for data processing. The objective of this project is to investigate a software-hardware co-design methodology for DNN acceleration that can be applied to both traditional von Neumann and emerging neuromorphic architectures. The project fits into the general area of "brain-inspired" energy-efficient computing paradigms, which has attracted much recent interest. From a more technical standpoint, a novel neural network sparsification process is to be explored to preserve state-of-the-art accuracy while establishing hardware-friendly models of neural network computations. The result is expected to lead to a holistic methodology composed of neural network model sparsification, hardware acceleration, and an integrated software/hardware co-design.
GAMBIT: Efficient Graph Processing on a Memristor-based Embedded Computing Platform: Recently, graph processing has received intense interest in light of a wide range of needs to understand relationships. Graph analytics is widely used in key domains of our society, such as cyber security, social media, infrastructure monitoring (e.g., smart buildings), natural language processing, systems biology, and recommendation systems. These important applications all fall into fast-growing sectors in computer science and engineering research. Unfortunately, existing embedded systems equipped with conventional computing units like CPUs/GPUs cannot efficiently process large graphs in real time. Instead, large data centers are required to perform the graph processing, either incurring extra latency and energy due to data communication or only providing forensic (offline) graph analysis. This research aims to effectively enable graph analytics in embedded systems with a disruptive emerging technology.
SMALE: Enhancing Scalability of Machine Learning Algorithms on Extreme Scale Computing Platforms: Following technology advances in high performance computing and data acquisition, machine learning, especially deep learning, has achieved remarkable success in many applications. This success, to a great extent, is enabled by introducing large-scale deep neural networks executed in parallel on extreme scale computing platforms, which are often composed of a large number of computing units. The growth of networks in depth and scale, however, greatly exacerbates the demand for computation resources and data storage on hardware platforms and therefore brings significant technical challenges in deployment and execution. The objective of our research project is to develop a holistic innovation set to enhance the scalability of machine learning tasks on extreme scale computing platforms. Techniques at the structure, assembly, and acceleration layers of machine learning algorithms will be investigated during the project period. These techniques will attack the fundamental problems in machine learning algorithms running on extreme scale computing platforms by vertically integrating the solutions at three closely entangled layers, paving the long-term scaling path of machine learning applications.