Modern machine learning models, such as neural networks, have a number of theoretically puzzling but empirically robust properties. Chief among them are: (a) neural networks are trained on datasets containing far fewer examples than the model has parameters; (b) training proceeds by empirical risk minimization via a first-order method from a random starting point and, despite the non-convexity of the risk, typically returns a global minimizer; (c) this minimizer of the risk not only interpolates the training data exactly but also performs well on unseen data (i.e. generalizes). The purpose of this talk is to introduce these fascinating properties and give some basic intuitions for why they might be possible. The emphasis will be on heuristics rather than on precise theorems.
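As a toy illustration of properties (a) and (b) — not drawn from the talk itself — the sketch below trains an overparameterized linear model (far more parameters than data points) by plain gradient descent from a random start; despite the abundance of parameters, gradient descent drives the empirical risk to essentially zero, i.e. it finds an interpolating global minimizer. All names and dimensions here are illustrative choices, not anything specified by the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 50                        # n data points, p >> n parameters
X = rng.normal(size=(n, p))         # random features
y = rng.normal(size=n)              # random targets

w = rng.normal(size=p) * 0.01       # random initialization
lr = 0.01                           # step size for gradient descent
for _ in range(2000):
    residual = X @ w - y
    w -= lr * (X.T @ residual) / n  # gradient of the mean-squared risk

train_loss = np.mean((X @ w - y) ** 2)
print(f"final training loss: {train_loss:.2e}")  # essentially zero
```

For linear models this behavior is provable (the risk, though possibly degenerate, has no spurious local minima); the surprise highlighted in the talk is that the same phenomenon appears empirically for non-convex neural network training.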
Bio: Boris Hanin has been an Assistant Professor at Princeton ORFE since Fall 2020, and his research is on machine learning, probability, and mathematical physics. Prior to Princeton, he was an Assistant Professor in Mathematics at Texas A&M. He has also held visiting positions at Google, Facebook AI, and the Simons Institute.