Maryam Fazel, University of Washington

Flat Minima and Generalization in Deep Learning: from Matrix Sensing to Neural Nets
Date
Sep 24, 2024, 4:30 pm – 5:30 pm

Event Description

Many behaviors empirically observed in deep neural networks lack a satisfactory explanation. Consider the core question: when do overparameterized neural networks avoid overfitting and generalize to unseen data? Empirical evidence suggests that the shape of the training loss function near the solution matters: minima where the loss is "flatter" tend to lead to better generalization. Yet quantifying flatness and analyzing it rigorously, even in simple models, has remained elusive.

In this talk, we examine well-known nonconvex models such as low-rank matrix recovery, matrix completion, robust PCA, and a two-layer neural network as test cases. We prove that, under standard statistical assumptions, "flat" minima (those with the smallest local average curvature) generalize in all of these cases. These algorithm-agnostic results suggest a theoretical basis for favoring methods that bias iterates toward flat solutions, and they help inform the design of better training algorithms.
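As a point of reference (and not necessarily the exact measure used in the talk), one standard way to quantify the "local average curvature" of a training loss L at a minimizer theta* in R^d is the scaled trace of its Hessian:

$$
\frac{1}{d}\,\operatorname{tr}\,\nabla^2 L(\theta^\star) \;=\; \frac{1}{d}\sum_{i=1}^{d} \lambda_i\!\left(\nabla^2 L(\theta^\star)\right),
$$

where the lambda_i are the Hessian eigenvalues; smaller values correspond to flatter minima.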

Bio: Maryam Fazel is the Moorthy Family Professor of Electrical and Computer Engineering at the University of Washington, with adjunct appointments in Computer Science and Engineering, Mathematics, and Statistics. Maryam received a PhD from Stanford University and a BS from Sharif University of Technology in Iran. She is a recipient of the NSF CAREER Award, the UWEE Outstanding Teaching Award, and the UAI Conference Best Student Paper Award, and her paper on low-rank matrix estimation was selected by ScienceWatch as a Fast Breaking Paper (based on the number of citations). She directs the Institute for Foundations of Data Science (IFDS), a multi-site NSF TRIPODS institute. She serves on the program committee of ICML 2025, on the editorial board of the MOS-SIAM Book Series on Optimization, and as an Action Editor for the Journal of Machine Learning Research. Her current research interests are in optimization in machine learning and control.

Event Category
Distinguished Lecture Series
Optimization Seminar