FOLDS seminar: Transformers Meet In-Context Learning: A Universal Approximation Theory

February 19 at 12:00 PM - 1:00 PM

Zoom link: https://upenn.zoom.us/j/98220304722

Large language models are capable of in-context learning, the ability to perform new tasks at test time using a handful of input-output examples, without parameter updates. We develop a universal approximation theory to elucidate how transformers enable in-context learning. For a general class of functions (each representing a distinct task), we demonstrate how to construct a transformer that, without any further weight updates, can predict based on a few noisy in-context examples with vanishingly small risk. Unlike prior work that frames transformers as approximators of optimization algorithms (e.g., gradient descent) for statistical learning tasks, we integrate Barron’s universal function approximation theory with the algorithm approximator viewpoint. Our approach yields approximation guarantees that are not constrained by the effectiveness of the optimization algorithms being mimicked, extending far beyond convex problems like linear regression. The key is to show that (i) any target function can be nearly linearly represented, with small ℓ1-norm, over a set of universal features, and (ii) a transformer can be constructed to find the linear representation — akin to solving Lasso — at test time.
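
As a rough illustration of point (ii), the sketch below fits a sparse linear combination of fixed features to a few noisy examples by solving a Lasso problem with proximal gradient descent (ISTA). It is not the construction from the talk: the random cosine features, the target sin(3x), the penalty value, and all function names are assumptions chosen for illustration, and it runs an explicit solver rather than a transformer.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal step for the l1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def make_features(num_features, rng):
    """Draw fixed random cosine features (a hypothetical stand-in for universal features)."""
    params = [(rng.gauss(0.0, 2.0), rng.uniform(0.0, 2.0 * math.pi))
              for _ in range(num_features)]
    return lambda x: [math.cos(w * x + b) for w, b in params]


def lasso_objective(rows, targets, coef, penalty):
    """(1 / 2n) * squared error + penalty * l1 norm of the coefficients."""
    n = len(rows)
    squared_error = sum((sum(p * c for p, c in zip(row, coef)) - y) ** 2
                        for row, y in zip(rows, targets))
    return squared_error / (2.0 * n) + penalty * sum(abs(c) for c in coef)


def ista(rows, targets, penalty, num_steps=300):
    """Proximal gradient descent (ISTA) on the Lasso objective above."""
    n, m = len(rows), len(rows[0])
    # The squared Frobenius norm divided by n upper-bounds the Lipschitz
    # constant of the smooth part, so this step size never increases the objective.
    step = n / sum(p * p for row in rows for p in row)
    coef = [0.0] * m
    for _ in range(num_steps):
        residuals = [sum(p * c for p, c in zip(row, coef)) - y
                     for row, y in zip(rows, targets)]
        grad = [sum(row[j] * r for row, r in zip(rows, residuals)) / n
                for j in range(m)]
        coef = [soft_threshold(c - step * g, step * penalty)
                for c, g in zip(coef, grad)]
    return coef


# Assumed toy task: 20 noisy in-context examples of f(x) = sin(3x).
rng = random.Random(0)
featurize = make_features(40, rng)
xs = [rng.uniform(-1.0, 1.0) for _ in range(20)]
ys = [math.sin(3.0 * x) + rng.gauss(0.0, 0.05) for x in xs]
rows = [featurize(x) for x in xs]
coef = ista(rows, ys, penalty=0.01)


def predict(x):
    """Predict at a new query point using the fitted sparse representation."""
    return sum(p * c for p, c in zip(featurize(x), coef))
```

Calling `predict(0.2)` evaluates the fitted representation at a new query point. The l1 penalty is what encourages a small-norm coefficient vector, mirroring the "small ℓ1-norm" condition in point (i).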

Yuxin Chen

Professor of statistics and data science and of electrical and systems engineering at the University of Pennsylvania

Yuxin Chen is currently a professor of statistics and data science and of electrical and systems engineering at the University of Pennsylvania. Before joining UPenn, he was an assistant professor of electrical and computer engineering at Princeton University. He completed his Ph.D. in Electrical Engineering at Stanford University and was also a postdoctoral scholar at Stanford Statistics. His current research interests include machine learning theory, high-dimensional statistics, and optimization. He has received the Alfred P. Sloan Research Fellowship, the SIAM Activity Group on Imaging Science Best Paper Prize, the ICCM Best Paper Award (gold medal), and the Princeton Graduate Mentoring Award, and was selected as a finalist for the Best Paper Prize for Young Researchers in Continuous Optimization.

Details

Organizers

  • IDEAS Center
  • Penn AI
  • Wharton Statistics and Data Science Department

Venue

  • Amy Gutmann Hall, Room 414
  • 3333 Chestnut Street
    Philadelphia, 19104 United States