Department of Mathematics,
University of California San Diego

****************************

Math 278C: Optimization and Data Science

Prof. Tingting Tang

San Diego State University

On flat stationary points of deep neural networks

Abstract:

Understanding the loss landscape of deep networks can provide many insights into how the networks learn and why they work so well in practice. In this talk, starting from the observation that flat minima correspond to continuous symmetries of the loss function, two symmetry-breaking methods are proposed that provably remove all flat minima (and flat stationary points) from the loss landscape of any deep feedforward network whose activation function is smooth. Activation functions satisfying the assumptions include the sigmoid, hyperbolic tangent, softplus, and polynomials; loss functions include the cross-entropy and squared loss. The methods can essentially be viewed as generalized regularizations of the loss function. The proposed methods are applied to polynomial neural networks, where the activation function is a polynomial of arbitrary degree; a first result estimating the number of isolated solutions is provided, giving a first glimpse of the complexity of the loss landscape even in the absence of flat minima.
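For orientation, the "generalized regularization" viewpoint amounts to minimizing L(theta) + lambda * r(theta) instead of L(theta) alone. Below is a minimal Python sketch of that form; the two-layer network, the placeholder regularizer r, and the parameter lambda are illustrative assumptions, since the abstract does not specify the talk's actual symmetry-breaking constructions.

    # Illustrative sketch only: the abstract does not give the exact
    # symmetry-breaking terms, so r(theta) below is a hypothetical
    # placeholder showing the general "loss + regularizer" structure.
    import numpy as np

    def forward(params, x):
        """Two-layer feedforward net with tanh activation (smooth, per the abstract)."""
        W1, b1, W2, b2 = params
        return W2 @ np.tanh(W1 @ x + b1) + b2

    def loss(params, x, y):
        """Squared loss, one of the loss functions named in the abstract."""
        return 0.5 * np.sum((forward(params, x) - y) ** 2)

    def regularized_loss(params, x, y, lam=1e-3):
        """Generalized regularization L(theta) + lam * r(theta).
        r here is a simple stand-in, not the talk's actual terms."""
        r = sum(np.sum(p ** 2) for p in params)  # placeholder r(theta)
        return loss(params, x, y) + lam * r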

Host: Jiawang Nie

May 28, 2025

4:00 PM

APM 6402

Zoom option: ucsd.zoom.us/j/94146420185?pwd=NxQmWxd8bIadUB6bKaHFbzHSYZbqQ6.1
Meeting ID: 941 4642 0185
Password: 278C2025

****************************