Stochastic gradient updates yield deep equilibrium kernels

Russell Tsuchida · Cheng Soon Ong

Video

Paper PDF


Abstract

Implicit deep learning allows one to compute with implicitly defined features, for example features that solve optimisation problems. We consider the problem of computing with implicitly defined features in a kernel regime. We call such a kernel a deep equilibrium kernel (DEKer). Specialising to a stochastic gradient descent (SGD) update rule applied to features (not weights) in a latent variable model, we find an exact deterministic update rule for the DEKer in a high-dimensional limit. The derived update rule resembles those of previously introduced infinitely wide neural network kernels. To perform our analysis, we describe an alternative parameterisation of the link function of exponential families, a result that may be of independent interest. This parameterisation allows us to draw new connections between a statistician's inverse link function and a machine learner's activation function. We describe an interesting property of SGD in this high-dimensional limit: even though individual iterates are random vectors, inner products of any two iterates are deterministic, and can converge to a unique fixed point as the number of iterates increases. We find that the DEKer empirically outperforms related neural network kernels on a series of benchmarks.
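The concentration phenomenon described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's SGD-on-features update or its DEKer recursion; it iterates a generic random ReLU feature map (the kind underlying infinitely wide neural network kernels) on two inputs and checks that, although each iterate is a random high-dimensional vector, the normalised inner product of the two iterates is nearly the same across random seeds, with the spread shrinking as the dimension d grows. The helper `relu_feature_iterates` and all constants are illustrative choices, not taken from the paper.

```python
# Illustrative only: a random ReLU feature iteration (not the paper's DEKer update).
# Each iterate is a random high-dimensional vector, yet the normalised inner
# product of the iterates for two inputs concentrates around a deterministic
# value as the dimension d grows.
import numpy as np

def relu_feature_iterates(x, xp, d=2048, steps=10, seed=0):
    """Return the normalised inner products <z_t(x), z_t(xp)> / d over steps t."""
    rng = np.random.default_rng(seed)
    x, xp = np.asarray(x, float), np.asarray(xp, float)
    U = rng.standard_normal((d, x.size)) / np.sqrt(x.size)  # random lift to d dimensions
    z, zp = U @ x, U @ xp
    inner = []
    for _ in range(steps):
        W = rng.standard_normal((d, d)) / np.sqrt(d)   # fresh random weights each step
        z = np.sqrt(2.0) * np.maximum(W @ z, 0.0)      # ReLU map, norm-preserving on average
        zp = np.sqrt(2.0) * np.maximum(W @ zp, 0.0)
        inner.append(z @ zp / d)
    return inner

runs = np.array([relu_feature_iterates([1.0, 0.0], [0.6, 0.8], seed=s) for s in range(5)])
print("final inner product per seed:", np.round(runs[:, -1], 3))
print("spread across seeds (std):   ", float(np.round(runs[:, -1].std(), 4)))
```

For this particular illustration, the deterministic limit of the inner-product sequence is the well-known arc-cosine (ReLU NNGP) kernel recursion, which is representative of the family of infinitely wide neural network kernels that the abstract says the DEKer update resembles; the paper's actual update rule and limit are derived from its latent variable model and should be consulted directly.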