  • This event has passed.

ASSET Seminar: “When do spectral gradient updates help in deep learning?”

February 4 at 12:00 PM - 1:15 PM

Spectral gradient methods, such as the recently popularized Muon algorithm, are a promising alternative to standard Euclidean gradient descent for training deep neural networks and transformers, but it is still unclear in which regimes they are expected to perform better. In this talk, I’ll argue that spectral gradient methods perform well because they are less sensitive to a pervasive type of ill-conditioning in deep learning optimization problems. This ill-conditioning is induced by the activation functions and the data. I’ll support this argument with synthetic regression experiments and NanoGPT-scale language model training.
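As a rough illustration of what a spectral gradient update is, the sketch below replaces a gradient matrix G = U S V^T with its orthogonal polar factor U V^T, so every singular direction gets the same step size. It is not code from the talk. The helper names (matmul, transpose, spectral_direction), the 2x2 example gradient, and the use of the basic cubic Newton-Schulz iteration are all assumptions made for illustration. Muon, as publicly described, also uses momentum and a tuned variant of this kind of iteration.

```python
# Hypothetical helpers; plain Python lists keep the sketch dependency-free.
# A real implementation would use a numerical library instead.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def spectral_direction(grad, steps=40):
    """Approximate the polar factor U V^T of a nonzero matrix grad = U S V^T.

    Uses the cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X,
    which pushes every nonzero singular value of X toward 1.
    """
    # Scale by the Frobenius norm so all singular values start in (0, 1].
    fro = sum(v * v for row in grad for v in row) ** 0.5
    x = [[v / fro for v in row] for row in grad]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * p - 0.5 * q for p, q in zip(r1, r2)] for r1, r2 in zip(x, xxtx)]
    return x


# An ill-conditioned gradient with singular values 1.0 and 0.01 (made-up data).
grad = [[1.0, 0.0], [0.0, 0.01]]
direction = spectral_direction(grad)

# A Euclidean step W - lr * grad moves 100x less along the second direction.
# The spectral step W - lr * direction moves about equally along both.
print(direction)
```

In this toy case the Euclidean update barely moves along the direction with the small singular value, while the spectral update treats both directions alike. This is one simple way to picture reduced sensitivity to ill-conditioning, though the talk's argument concerns ill-conditioning induced by the activation functions and the data.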

Damek Davis

Associate Professor of Statistics and Data Science

Damek Davis is an Associate Professor in Wharton’s Department of Statistics and Data Science. He was previously an Associate Professor at Cornell ORIE, an NSF Postdoctoral Fellow, and a PhD student in Math at UCLA under Wotao Yin (Alibaba) and Stefano Soatto (AWS AI). He was a long-term visitor at the Simons Institute in Fall 2017 (bridging discrete and continuous optimization) and Fall 2024 (LLM program). Damek is currently an associate editor at Mathematical Programming and Foundations of Computational Mathematics.

 

Details

Organizer

  • AI-enabled Systems: Safe, Explainable, and Trustworthy (ASSET) Center
  • Email asset-info@seas.upenn.edu

Venue

  • Amy Gutmann Hall, Room 414
  • 3333 Chestnut Street
    Philadelphia, PA 19104 United States