Robust Networks: Neural Networks Robust to Quantization Noise and Analog Computation Noise Based on Natural Gradient
Description
Deep neural networks (DNNs) have had tremendous success in a variety of
statistical learning applications due to their vast expressive power. Most
applications run DNNs on the cloud on parallelized architectures. There is a need
for for efficient DNN inference on edge with low precision hardware and analog
accelerators. To make trained models more robust for this setting, quantization and
analog compute noise are modeled as weight space perturbations to DNNs and an
information theoretic regularization scheme is used to penalize the KL-divergence
between perturbed and unperturbed models. This regularizer has similarities to
both natural gradient descent and knowledge distillation, but has the advantage of
explicitly promoting the network to and a broader minimum that is robust to
weight space perturbations. In addition to the proposed regularization,
KL-divergence is directly minimized using knowledge distillation. Initial validation
on FashionMNIST and CIFAR10 shows that the information theoretic regularizer
and knowledge distillation outperform existing quantization schemes based on the
straight through estimator or L2 constrained quantization.
statistical learning applications due to their vast expressive power. Most
applications run DNNs on the cloud on parallelized architectures. There is a need
for for efficient DNN inference on edge with low precision hardware and analog
accelerators. To make trained models more robust for this setting, quantization and
analog compute noise are modeled as weight space perturbations to DNNs and an
information theoretic regularization scheme is used to penalize the KL-divergence
between perturbed and unperturbed models. This regularizer has similarities to
both natural gradient descent and knowledge distillation, but has the advantage of
explicitly promoting the network to and a broader minimum that is robust to
weight space perturbations. In addition to the proposed regularization,
KL-divergence is directly minimized using knowledge distillation. Initial validation
on FashionMNIST and CIFAR10 shows that the information theoretic regularizer
and knowledge distillation outperform existing quantization schemes based on the
straight through estimator or L2 constrained quantization.
Date Created
The date the item was original created (prior to any relationship with the ASU Digital Repositories.)
2019
Agent
- Author (aut): Kadambi, Pradyumna
- Thesis advisor (ths): Berisha, Visar
- Committee member: Dasarathy, Gautam
- Committee member: Seo, Jae-Sun
- Committee member: Cao, Yu
- Publisher (pbl): Arizona State University