PICS Colloquium with David Schwab: “Out-of-distribution generalization in context”
April 17, 2:00 PM – 3:00 PM
In-context learning (ICL) is an emergent capability of pretrained transformers that allows models to generalize to previously unseen tasks after seeing only a few examples. We empirically investigate the conditions on the pretraining distribution necessary for ICL to emerge and generalize out of distribution. We find that as task diversity increases, transformers undergo a transition from a specialized solution, which exhibits ICL only within the pretraining task distribution, to a solution that generalizes out of distribution to the entire task space.
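To make the task-diversity axis concrete, here is a minimal sketch of the kind of synthetic setup commonly used in such studies: ICL sequences of (x, y) demonstrations generated by tasks drawn from a finite pool of size K, with out-of-distribution evaluation on fresh tasks from the full task space. The linear-regression task family, dimensions, and pool construction are illustrative assumptions, not the talk's exact protocol.

```python
# Hypothetical sketch of a finite-task-diversity pretraining distribution.
import numpy as np

rng = np.random.default_rng(0)
d, n_ctx = 8, 16          # input dimension, in-context examples per sequence

def make_sequence(w):
    """One ICL sequence: n_ctx (x, y) demonstration pairs plus a query pair."""
    X = rng.standard_normal((n_ctx + 1, d))
    y = X @ w
    return X, y

def task_pool(K):
    """A pretraining pool of K fixed tasks (unit weight vectors)."""
    W = rng.standard_normal((K, d))
    return W / np.linalg.norm(W, axis=1, keepdims=True)

K = 4                      # task diversity of the pretraining distribution
pool = task_pool(K)

# In-distribution pretraining sequence: task drawn from the finite pool.
X_id, y_id = make_sequence(pool[rng.integers(K)])

# Out-of-distribution evaluation sequence: a fresh task from the full space.
w_new = rng.standard_normal(d)
w_new /= np.linalg.norm(w_new)
X_ood, y_ood = make_sequence(w_new)
```

Sweeping K is the experiment's key knob: small pools should favor a specialized solution that only handles pool tasks, while large pools should favor a solution that also handles fresh tasks like `w_new`.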
Next, we analyze ridge regression under concept shift, a form of distribution shift in which the input-label relationship changes at test time. We derive an exact expression for the prediction risk in the thermodynamic limit. Our results reveal a phase transition between weak and strong concept-shift regimes, and a nonmonotonic dependence of test performance on the amount of data even when double descent is absent. Our theoretical results agree well with transformer-based experiments: under concept shift, an overly long context can degrade the generalization performance of next-token prediction.
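The concept-shift setting can likewise be sketched with a small Monte Carlo simulation rather than the exact thermodynamic-limit formula: fit ridge regression to data from one teacher vector, then evaluate it under a rotated teacher. The shift angle, ridge penalty, and noise level below are illustrative assumptions.

```python
# Hypothetical Monte Carlo sketch of ridge regression under concept shift.
import numpy as np

rng = np.random.default_rng(1)
d, lam, sigma = 50, 1e-2, 0.1   # dimension, ridge penalty, label-noise scale

def risk_under_shift(n, angle, trials=200):
    """Average test risk of ridge trained on teacher w, tested on a shifted teacher."""
    risks = []
    for _ in range(trials):
        w = rng.standard_normal(d)
        w /= np.linalg.norm(w)
        # Build a unit teacher at a fixed angle to w: this is the concept shift.
        u = rng.standard_normal(d)
        u -= (u @ w) * w
        u /= np.linalg.norm(u)
        w_shift = np.cos(angle) * w + np.sin(angle) * u
        # Train ridge on data generated by the original teacher w.
        X = rng.standard_normal((n, d))
        y = X @ w + sigma * rng.standard_normal(n)
        w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
        # Population risk under the shifted rule (isotropic Gaussian inputs).
        risks.append(np.sum((w_hat - w_shift) ** 2) + sigma ** 2)
    return np.mean(risks)

for n in (10, 25, 50, 100, 200):
    print(n, risk_under_shift(n, angle=np.pi / 3))
```

In this toy setting, the large-sample risk approaches 2(1 − cos θ) + σ², which exceeds the risk of the null predictor (1 + σ²) once the shift angle θ passes 60°, so more training data can actually hurt, a simple analogue of the strong-shift regime described in the abstract.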
David Schwab
Professor in Biology, Physics, and the Initiative for the Theoretical Sciences at the CUNY Graduate Center
David J. Schwab is a Professor in Biology, Physics, and the Initiative for the Theoretical Sciences at the CUNY Graduate Center. Following a PhD in theoretical condensed matter physics and biophysics from UCLA, he did a postdoc in biophysics at Princeton. He has also been an Assistant Professor of Physics at Northwestern, a Visiting Research Scientist at Meta’s FAIR lab, and a Research Scientist in Meta’s Reality Labs. He received a Simons Fellowship in the Mathematical Modeling of Living Systems in 2017 and a Sloan Fellowship in Physics in 2020.