FOLDS SEMINAR: The Hidden Width of Deep ResNets
November 10 at 2:30 PM - 3:30 PM
Zoom link: https://upenn.zoom.us/j/6130182858
We present a mathematical framework to analyze the training dynamics of deep ResNets that rigorously captures practical architectures (including Transformers) trained from standard random initializations. Our approach combines stochastic approximation of ODEs with propagation-of-chaos arguments to obtain tight convergence rates to the “infinite size” limit of the dynamics. It yields the following insights:
1/ Depth begets width: infinite-depth ResNets of any hidden width behave throughout training as if they were infinitely wide (a minimal sketch of such a depth-scaled architecture follows this list);
2/ Phase diagram: we derive the phase diagram of the training dynamics, which singles out an “ideal” scaling of hyper-parameters (initialization scale and learning rates), extending “CompleteP” to more general architectures;
3/ Optimal shape scaling: our analysis suggests how to scale the depth, hidden width, and embedding dimension of a ResNet when scaling up parameter count. With the optimal shape and a parameter budget P, we argue that the model converges to its limiting dynamics at rate P^{-1/6} (a back-of-the-envelope illustration of this rate follows the sketch below).
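For a concrete picture of the setting, the sketch below shows a depth-scaled ResNet of the kind infinite-depth analyses typically study: L residual blocks whose updates are scaled by 1/L, so that the forward pass approximates an ODE as depth grows, trained from PyTorch's default random initialization. The 1/depth scaling, the two-layer block structure, and all names (DepthScaledResNet, hidden_width, ...) are our illustrative assumptions, not necessarily the parameterization used in the talk; the initialization-scale and learning-rate choices selected by the phase diagram (point 2/) would sit on top of this skeleton.

```python
import torch
import torch.nn as nn

class DepthScaledResNet(nn.Module):
    """ResNet with 1/depth-scaled residual branches (illustrative sketch)."""

    def __init__(self, depth: int, embed_dim: int, hidden_width: int):
        super().__init__()
        self.depth = depth
        # Each block maps the embedding through a small hidden layer and back.
        self.blocks = nn.ModuleList([
            nn.Sequential(
                nn.Linear(embed_dim, hidden_width),
                nn.ReLU(),
                nn.Linear(hidden_width, embed_dim),
            )
            for _ in range(depth)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            # Scaling each residual update by 1/depth keeps the output O(1)
            # as depth grows, so the forward pass approximates an ODE.
            x = x + block(x) / self.depth
        return x

# "Depth begets width": the hidden width can be kept small (here 4) while
# the depth is large; per point 1/, the infinite-depth model behaves
# throughout training as if that width were infinite.
model = DepthScaledResNet(depth=256, embed_dim=64, hidden_width=4)
out = model(torch.randn(32, 64))  # PyTorch's default (standard) random init
```

As a purely arithmetic illustration of how a P^{-1/6} rate can emerge from balancing the shape, consider the following back-of-the-envelope; the error exponents and the parameter-count model are our assumptions for illustration, not statements from the talk.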
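```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Hypothetical back-of-the-envelope (assumed exponents, not the talk's
% derivation): suppose the discrepancy to the limiting dynamics decays
% like $L^{-1/2} + d^{-1/2}$ in the depth $L$ and embedding dimension
% $d$, and the parameter count satisfies $P \approx L d^2$ (hidden width
% proportional to $d$). Balancing the two error terms gives the optimal
% shape $L \propto d$, hence $P \approx d^3$ and an overall rate
\[
  L^{-1/2} + d^{-1/2} \;\asymp\; d^{-1/2}
  = \bigl(P^{1/3}\bigr)^{-1/2}
  = P^{-1/6}.
\]
\end{document}
```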
Lénaïc Chizat
Assistant Professor at EPFL (Switzerland)
Lénaïc Chizat is a computational mathematician working at the intersection of deep learning optimization and optimal transport.