Department of Mathematics,
University of California San Diego

****************************

Mathematics of Information, Data, and Signals Seminar

Boris Hanin

Princeton University

Random Fully Connected Neural Networks as Perturbatively Solvable Models

Abstract:

Fully connected networks are roughly described by two structural parameters: a depth L and a width n. It is well known that, with some important caveats about the scale at initialization, in the regime of fixed L and the limit of infinite n, a neural network at the start of training is a free (i.e., Gaussian) field and network optimization is kernel regression for the so-called neural tangent kernel (NTK). This is a striking and insightful simplification of infinitely overparameterized networks. However, in this particular infinite-width limit neural networks cannot learn data-dependent features, which is perhaps their most important empirical property. To understand feature learning one must therefore study networks at finite width. In this talk I will do just that. I will report on recent joint work with Dan Roberts and Sho Yaida (done at a physics level of rigor), as well as on ongoing, more mathematical work, which allows one to compute, perturbatively in 1/n and recursively in L, all correlation functions of the neural network function (and its derivatives) at initialization. An important upshot is the emergence of L/n, rather than simply L, as the effective network depth. This cutoff parameter provably measures the extent of feature learning and the distance at initialization from the large-n free theory.
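
For context (not part of the original abstract), here is the standard definition of the neural tangent kernel the abstract refers to, writing f(x; θ) for the scalar network output with parameters θ = (θ_1, ..., θ_P):

\[
  \Theta(x, x') \;=\; \sum_{p=1}^{P} \frac{\partial f(x;\theta)}{\partial \theta_p}\,
                      \frac{\partial f(x';\theta)}{\partial \theta_p}.
\]

At fixed depth L, as n → ∞ (with the usual initialization scaling), Θ concentrates around a deterministic kernel and remains constant during gradient-flow training, which is why optimization reduces to kernel regression with Θ; the finite-width effects discussed in the talk enter as corrections in powers of 1/n.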

March 3, 2022

11:30 AM

https://msu.zoom.us/j/96421373881

(the passcode is the first prime number > 100)

****************************