ASSET Seminar: “When do spectral gradient updates help in deep learning?”
February 4 at 12:00 PM - 1:15 PM
Spectral gradient methods, such as the recently popularized Muon algorithm, are a promising alternative to standard Euclidean gradient descent for training deep neural networks and transformers, but it is still unclear in which regimes they are expected to perform better. In this talk, I’ll argue that spectral gradient methods perform well because they are less sensitive to a pervasive type of ill-conditioning in deep learning optimization problems. This ill-conditioning is induced by the activation functions and the data. I’ll support this argument with synthetic regression experiments and NanoGPT-scale language model training.
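For readers unfamiliar with the method, here is a minimal sketch of the core spectral update: the gradient matrix is replaced by its nearest semi-orthogonal matrix, which equalizes all singular values and thereby counteracts the ill-conditioning the talk describes. This is an illustrative simplification via an exact SVD; practical implementations such as Muon approximate this step with Newton–Schulz iterations and combine it with momentum.

```python
import numpy as np

def spectral_update(G):
    """Map a gradient matrix G to the nearest semi-orthogonal matrix.

    If G = U @ S @ Vt is the (thin) SVD, the update returns U @ Vt,
    i.e., G with every nonzero singular value rescaled to 1.
    """
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

# Toy usage: one descent step on a weight matrix (names are illustrative).
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))   # weight matrix
G = rng.standard_normal((4, 3))   # its gradient (stand-in)
lr = 0.1
W -= lr * spectral_update(G)      # spectral step instead of W -= lr * G
```

Because the update's singular values are all 1, the step size along every singular direction is the same, unlike a Euclidean gradient step, whose per-direction magnitudes inherit the conditioning of the data and activations.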
Damek Davis
Associate Professor of Statistics and Data Science
Damek Davis is an Associate Professor in Wharton’s Department of Statistics and Data Science. He was previously an Associate Professor at Cornell ORIE, an NSF Postdoctoral Fellow, and a PhD student in Math at UCLA under Wotao Yin (Alibaba) and Stefano Soatto (AWS AI). He was a long-term visitor at the Simons Institute in Fall 2017 (Bridging Continuous and Discrete Optimization program) and Fall 2024 (LLM program). Damek is currently an associate editor at Mathematical Programming and Foundations of Computational Mathematics.