Primer: Enforcing Lipschitz constraints for neural networks

Dept. of Computer Science, University of Toronto; Vector Institute

We can understand a lot about a neural network by understanding the Jacobian of the function it computes, i.e. the derivatives of its outputs with respect to its inputs. I’ll explain what the Jacobian is, how it’s built up from the Jacobians of individual layers, and what it tells us about neural net optimization. I’ll then motivate why we might like to bound the matrix norm of the Jacobian, or equivalently, enforce a small Lipschitz constant for a neural net, i.e. ensure that a small change to the input makes a correspondingly small change to the output. This is useful for several reasons: (1) it lets us make a network’s predictions provably robust to small perturbations produced by an adversary, (2) it helps us estimate the Wasserstein distance between probability distributions, (3) it lets us bound the generalization error in terms of the Lipschitz constant, and (4) Lipschitz constraints prevent some optimization difficulties, most notably the problem of exploding gradients. To set the stage for the research talk, I’ll relate the Lipschitz bound of the network to the norms of individual layers’ Jacobians.
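As a minimal sketch of the layer-wise bound mentioned above: for a feed-forward network whose layers are linear maps interleaved with 1-Lipschitz activations (e.g. ReLU), the product of the layers’ spectral norms upper-bounds the Lipschitz constant of the whole network. The shapes and variable names below are illustrative assumptions, not taken from the talk.

```python
import numpy as np

# Hypothetical weight matrices for a 3-layer ReLU network
# (input dim 32 -> 64 -> 64 -> output dim 10); shapes chosen only for illustration.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((64, 32)),
           rng.standard_normal((64, 64)),
           rng.standard_normal((10, 64))]

def spectral_norm(W):
    """Largest singular value of W, i.e. the operator 2-norm of the layer map x -> W x."""
    return np.linalg.norm(W, ord=2)

# ReLU is 1-Lipschitz, so the Lipschitz constant of the composed network
# is at most the product of the per-layer spectral norms.
lipschitz_bound = np.prod([spectral_norm(W) for W in weights])
print(f"Upper bound on the network's Lipschitz constant: {lipschitz_bound:.2f}")
```

Methods such as spectral normalization enforce a target bound by dividing each weight matrix by (an estimate of) its spectral norm during training.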
