ASSET Seminar: Thinking fast with Transformers – Algorithmic Reasoning via Shortcuts (Surbhi Goel, University of Pennsylvania)
April 19, 2023 at 12:00 PM - 1:30 PM
PRESENTATION ABSTRACT:
In this new era of deep learning, the emergent algorithmic reasoning capabilities of Transformer models have led to significant advancements in natural language processing, program synthesis, and theorem proving. Despite their widespread success, the underlying reasons for their efficacy and the nature of their internal representations remain elusive. In this talk, we adopt learning the dynamics of finite-state machines (automata) as the underlying algorithmic reasoning task and shed light on how shallow, non-recurrent Transformer models emulate these recurrent dynamics. By employing tools from circuit complexity and semigroup theory, we characterize “shortcut” solutions that allow a shallow Transformer to precisely replicate $T$ computational steps of an automaton with only $o(T)$ layers. We show that Transformers can represent these “shortcuts” efficiently, using their parameter-efficient ability to compute sparse functions and averages. Furthermore, through synthetic experiments, we confirm that standard training successfully discovers these shortcuts. We conclude by highlighting the brittleness of these “shortcuts” in out-of-distribution scenarios.
This talk is based on joint work with Bingbin Liu, Jordan T. Ash, Akshay Krishnamurthy, and Cyril Zhang.
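To give a flavor of the $o(T)$-layer claim, the sketch below illustrates the core mathematical fact the abstract alludes to: because an automaton’s transition maps compose associatively, all $T$ prefix states can be computed by a parallel prefix (scan) of depth $O(\log T)$ rather than by $T$ sequential steps. This is only a plain-Python illustration of that idea under assumed toy definitions (the automaton, function names, and scan layout are hypothetical), not the Transformer construction presented in the talk.

```python
# Minimal sketch (not the paper's construction): an automaton's transition maps
# compose associatively, so all T prefix states can be obtained by a parallel
# prefix of depth O(log T) instead of T sequential steps.
# Transition maps are tuples indexed by state: map[s] = next state.

def compose(f, g):
    """Return the map 'apply f, then g'."""
    return tuple(g[f[s]] for s in range(len(f)))

def sequential_run(delta, q0, word):
    """Baseline: T sequential steps, one per input symbol."""
    q = q0
    for a in word:
        q = delta[a][q]
    return q

def parallel_prefix_states(delta, q0, word):
    """All prefix states via a log-depth prefix composition of transition maps."""
    prefix = [delta[a] for a in word]        # one transition map per input symbol
    n = len(prefix)
    step, depth = 1, 0
    while step < n:                          # each iteration plays the role of one "layer"
        for i in range(n - 1, step - 1, -1): # combine with the map 'step' positions earlier
            prefix[i] = compose(prefix[i - step], prefix[i])
        step *= 2
        depth += 1
    states = [f[q0] for f in prefix]         # apply each prefix map to the start state
    return states, depth

if __name__ == "__main__":
    # Hypothetical 2-state parity automaton over {0, 1}: symbol 0 keeps the state,
    # symbol 1 flips it.
    delta = {0: (0, 1), 1: (1, 0)}
    word = [1, 0, 1, 1, 0, 1, 0, 1]
    states, depth = parallel_prefix_states(delta, q0=0, word=word)
    assert states[-1] == sequential_run(delta, 0, word)
    print(f"T = {len(word)} steps simulated in {depth} parallel layers; states = {states}")
```

Running the toy example simulates $T = 8$ automaton steps in 3 parallel layers; the talk concerns how (and when) shallow Transformers learn to represent such low-depth compositions.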
Surbhi Goel
Magerman Term Assistant Professor of Computer and Information Science at the University of Pennsylvania
BIO:
Surbhi Goel is the Magerman Term Assistant Professor of Computer and Information Science at the University of Pennsylvania. She is also a member of the ASSET Center on Safe, Explainable, and Trustworthy AI Systems and the Warren Center for Network and Data Sciences.
Dr. Goel’s research interests lie at the intersection of theoretical computer science and machine learning, with a focus on developing theoretical foundations for modern machine learning paradigms, especially deep learning.
Prior to this, she was a postdoctoral researcher in the Machine Learning group at Microsoft Research NYC. She obtained her Ph.D. from the Computer Science department at the University of Texas at Austin, advised by Adam Klivans. Her dissertation received UTCS’s Bert Kay Dissertation Award. Her Ph.D. research was generously supported by the JP Morgan AI Fellowship and several fellowships from UT Austin. During her Ph.D., she visited the IAS for the Theoretical Machine Learning program and the Simons Institute for the Theory of Computing at UC Berkeley for the Foundations of Deep Learning program (supported by the Simons-Berkeley Research Fellowship). Before that, she received her Bachelor’s degree in Computer Science and Engineering from the Indian Institute of Technology (IIT) Delhi.