Department of Mathematics,
University of California San Diego
****************************
Math 278C: Optimization and Data Science
Prof. Tingting Tang
San Diego State University
On flat stationary points of deep neural networks
Abstract:
Understanding the loss landscape of deep networks provides insight into how these networks learn and why they work so well in practice. In this talk, starting from the observation that flat minima correspond to continuous symmetries of the loss function, two symmetry-breaking methods are proposed that provably remove all flat minima (and flat stationary points) from the loss landscape of any deep feedforward network, provided the activation function is smooth. Activation functions satisfying the assumptions include the sigmoid, hyperbolic tangent, softplus, and polynomials; admissible loss functions include cross-entropy and squared loss. The methods can essentially be viewed as generalized regularizations of the loss function. The proposed methods are then applied to polynomial neural networks, where the activation function is a polynomial of arbitrary degree, yielding a first estimate of the number of isolated stationary points and a first glimpse of the complexity of these loss landscapes even in the absence of flat minima.
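To make the symmetry picture concrete, here is a minimal illustration of our own (a sketch, not the speaker's construction). Consider a one-hidden-layer network
$$ f_\theta(x) = \sum_{i=1}^{m} v_i\,\sigma(w_i^\top x + b_i). $$
If a minimizer of the loss $L(\theta) = \sum_k \ell(f_\theta(x_k), y_k)$ has two identical hidden units, say $(w_1, b_1) = (w_2, b_2)$, then the one-parameter family $(v_1, v_2) \mapsto (v_1 + t,\, v_2 - t)$ leaves $f_\theta$, and hence $L$, unchanged for every $t$, so the minimizer lies on an entire line of minimizers: a flat minimum generated by a continuous symmetry. A symmetry-breaking term of the schematic form
$$ L_\varepsilon(\theta) = L(\theta) + \varepsilon \sum_i c_i\,\theta_i^2, $$
with generic coefficients $c_i$ (a hypothetical instance of a "generalized regularization," not necessarily one of the talk's two methods), is non-constant along such orbits, since $c_1(v_1+t)^2 + c_2(v_2-t)^2$ varies with $t$ whenever $c_1 + c_2 \neq 0$, so this particular flat direction is removed. When $\sigma$ is a polynomial of degree $d$ and $\ell$ is the squared loss, $\nabla L_\varepsilon = 0$ is a system of $N$ polynomial equations of degree at most $2d+1$ in the $N$ parameters, so a crude Bézout count bounds the number of isolated stationary points by $(2d+1)^N$; sharper estimates of this kind are the subject of the talk's results on polynomial networks.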
Host: Jiawang Nie
May 28, 2025
4:00 PM
APM 6402
Zoom option: ucsd.zoom.us/j/94146420185?pwd
Meeting ID: 941 4642 0185
Password: 278C2025
****************************