ESE & Statistics Seminar: “Large Neural Networks: Insights from Linearized Models”
September 17 at 11:00 AM - 12:00 PM
Abstract: Modern machine learning models, and in particular multilayer neural networks, exhibit a broad range of puzzling phenomena. Training them requires minimizing a highly non-convex, high-dimensional cost function, and yet this is done efficiently with simple gradient descent (GD) or stochastic gradient descent (SGD) algorithms. These models contain more parameters than training samples, and indeed they are often able to achieve zero training error, i.e. to perfectly interpolate or classify the training data. In fact, they can achieve zero training error even if the true labels are replaced by random ones. Despite this, they can generalize well beyond the training set. Finally, far from being a nuisance or limitation, this massive overparameterization appears to play an important role in explaining the power of these models.
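The interpolation phenomenon mentioned above can be illustrated in the simplest possible setting: a linear model with more parameters than samples can fit any labels exactly, including purely random ones. The sketch below (not from the talk; names and dimensions are illustrative) uses NumPy's minimum-norm least-squares solver to interpolate random labels:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100  # far more parameters (d) than samples (n)

X = rng.standard_normal((n, d))          # random features
y = rng.choice([-1.0, 1.0], size=n)      # "random labels"

# With d > n, X has full row rank (almost surely), so the
# minimum-norm least-squares solution interpolates any labels.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

train_error = np.max(np.abs(X @ w - y))  # zero up to floating point
```

Among all interpolating solutions, `lstsq` returns the one of minimum Euclidean norm; this implicit regularization is one of the mechanisms linearized analyses use to explain why overparameterized models can still generalize.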
I will discuss these phenomena and how we can make sense of them using some simple linear models. I will conclude with the limitations of these 'linear explanations' and some open challenges.
[Based on joint work with: Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, and with Ryan Tibshirani, Saharon Rosset, Trevor Hastie]
Professor of Electrical Engineering and Statistics, Stanford University
Andrea Montanari received a Laurea degree in Physics in 1997 and a Ph.D. in Theoretical Physics in 2001, both from Scuola Normale Superiore in Pisa, Italy. He was a postdoctoral fellow at the Laboratoire de Physique Théorique de l'Ecole Normale Supérieure (LPTENS) in Paris, France, and at the Mathematical Sciences Research Institute in Berkeley, USA. In 2002 he became Chargé de Recherche (with the Centre National de la Recherche Scientifique, CNRS) at LPTENS. In September 2006 he joined the faculty of Stanford University, and since 2015 he has been a full professor in the Departments of Electrical Engineering and Statistics.
He was co-awarded the ACM SIGMETRICS Best Paper Award in 2008. He received the CNRS Bronze Medal for theoretical physics in 2006, the National Science Foundation CAREER Award in 2008, the Okawa Foundation Research Grant in 2013, and the Applied Probability Society Best Publication Award in 2015. He was an Information Theory Society Distinguished Lecturer for 2015-2016. In 2016 he received the James L. Massey Research & Teaching Award of the Information Theory Society for young scholars. In 2018 he was an invited sectional speaker at the International Congress of Mathematicians.