Details
Abstract: Non-convex optimization is the main algorithmic technique behind many state-of-the-art results in machine learning. It has been conjectured that the landscapes of many training objectives have the nice geometric property that “all local minima are global minima,” and thus admit efficient optimization algorithms.
In the first part of the talk, I will show that the optimization landscape of matrix completion — a famous problem in ML with wide applications in recommender systems and collaborative filtering — does have this property. This implies that (stochastic) gradient descent from arbitrary initialization can solve matrix completion in polynomial time.
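To make the setting concrete, here is a minimal sketch (my notation, not the speaker's code) of gradient descent from random initialization on a factorized matrix completion objective of the form f(X) = (1/2) Σ_{(i,j)∈Ω} (M_ij − (XXᵀ)_ij)². The dimensions, rank, sampling rate, step size, and iteration count are illustrative choices, and the formal guarantee involves additional standard assumptions (such as incoherence and sufficiently many observed entries).

    # A minimal sketch (not the speaker's code) of gradient descent on a
    # factorized matrix completion objective:
    #     f(X) = (1/2) * sum over observed (i, j) of (M_ij - (X X^T)_ij)^2
    # All sizes, the sampling rate, the step size, and the iteration count
    # are illustrative choices.
    import numpy as np

    rng = np.random.default_rng(0)
    n, r = 50, 3

    # Ground-truth low-rank matrix M = U U^T and a symmetric observation mask Omega.
    U_true = rng.standard_normal((n, r))
    M = U_true @ U_true.T
    upper = np.triu(rng.random((n, n)) < 0.3)
    mask = upper | upper.T                      # observe roughly 30% of the entries

    # Gradient descent from an arbitrary (random) initialization.
    X = rng.standard_normal((n, r))
    step = 0.25 / np.linalg.norm(M, 2)          # conservative, data-dependent step size
    for _ in range(3000):
        residual = mask * (X @ X.T - M)         # zero on unobserved entries
        X -= step * 2 * residual @ X            # gradient of f (mask and residual are symmetric)

    print("relative error:", np.linalg.norm(X @ X.T - M) / np.linalg.norm(M))

The point of the result discussed in the talk is that, for this kind of non-convex objective, such plain gradient descent cannot get stuck at a spurious local minimum.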
Next, we will discuss linear residual networks as a simplified model toward a first-cut understanding of residual networks. We will give a strikingly simple proof that arbitrarily deep linear residual networks have no spurious critical points (i.e., points with vanishing gradient that are not global minima). In contrast, the landscape of standard linear neural networks does have spurious critical points. This demonstrates that re-parameterization using the identity shortcut connection can make optimization easier.
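For comparison, the two parameterizations of a depth-L linear model can be written as follows (the notation is mine and illustrative, not necessarily the talk's):

    \[
      f_{\mathrm{lin}}(x) \;=\; W_L W_{L-1} \cdots W_1 x,
      \qquad
      f_{\mathrm{res}}(x) \;=\; (I + A_L)(I + A_{L-1}) \cdots (I + A_1)\, x .
    \]

Setting A_ℓ = W_ℓ − I shows that the two families express the same linear maps, so the difference is purely one of coordinates. Yet the landscapes differ: for L ≥ 2 and a nonzero target, the all-zero point W_1 = … = W_L = 0 is a spurious critical point of the standard squared loss, whereas the talk's claim is that the residual parameterization has none.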
Based on joint works with Rong Ge and Jason D. Lee, and with Moritz Hardt.
Bio: Tengyu Ma is currently a fifth-year graduate student at Princeton University, advised by Prof. Sanjeev Arora. He has received the Simons Award for Graduate Students in Theoretical Computer Science, an IBM Ph.D. Fellowship, the Princeton Honorific Fellowship, a Siebel Scholarship, and the NIPS'16 Best Student Paper Award.
Ma’s research spans topics including non-convex optimization, deep learning, natural language processing, distributed optimization, and convex relaxations (e.g., the sum-of-squares hierarchy).