
CIS Seminar: “Pareto-efficient AI systems: Expanding the quality and efficiency frontier of AI”

March 6 at 3:30 PM - 4:30 PM

We have made exciting progress in AI by training massive models on massive amounts of data-center compute. However, the demands on AI are rapidly expanding. I identify how to maximize performance under any compute constraint, expanding the Pareto frontier of AI capabilities.

This talk builds up to an efficient language model architecture that expands the Pareto frontier between quality and throughput efficiency. As motivation: the Transformer, AI's current workhorse architecture, is memory hungry, which severely limits its throughput, the amount of text it can process per second. This has led to a Cambrian explosion of alternative efficient architecture candidates in prior work, which paints an exciting picture: there exist architectures that are asymptotically faster than Transformers while also matching their quality. However, I ask: if we use asymptotically faster building blocks, are we giving something up in quality?
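
To make the memory bottleneck concrete before the talk outline, here is a back-of-envelope sketch of the Transformer's key-value (KV) cache, the inference-time state that grows linearly with sequence length and batch size. Every model dimension below is an illustrative assumption, not a figure from the talk.

```python
# Back-of-envelope KV-cache size for a decoder-only Transformer.
# All model dimensions here are hypothetical, chosen only for illustration.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """Memory for cached keys and values: two tensors (K and V) per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# A hypothetical 8B-class model in fp16, serving a batch of 32 at 8k context:
gb = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128,
                    seq_len=8192, batch=32) / 1e9
print(f"KV cache: {gb:.1f} GB")  # ~34.4 GB -- memory, not compute, caps throughput
```

The cache grows with every token of context, so serving throughput is bounded by memory capacity and bandwidth long before the arithmetic units are saturated.
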
  1. In part one, we build understanding. Indeed, there's no free lunch! I present my work identifying and explaining the fundamental quality-efficiency tradeoffs between different classes of architectures. Methods I developed for this analysis are now ubiquitous in the development of language models.
  2. In part two, we measure how candidate AI architectures fare on the tradeoff space. A major hurdle, however, is that we lack implementations of these architectures that run at peak efficiency on modern hardware. Further, many proposed architectures are asymptotically fast, but not wall-clock fast. I present ThunderKittens, a new programming library I built to help AI researchers write simple, hardware-efficient algorithms across hardware platforms.
  3. In part three, we expand the Pareto frontier of the tradeoff space. I present the BASED architecture, which is built from simple, hardware-efficient components (a minimal sketch of the underlying linear-attention primitive follows this list). I released state-of-the-art 8B-405B Transformer-free language models, per standard evaluations, all on an academic budget.
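
To make part three concrete, below is a minimal numpy sketch of causal linear attention, the family of primitives that BASED-style architectures build on. The feature map here is a placeholder (BASED itself uses a Taylor approximation of the softmax's exponential, which is not reproduced here), and all shapes are illustrative; the point is that the recurrent state is a fixed d x d matrix, so there is no KV cache growing with sequence length.

```python
import numpy as np

def phi(x):
    # Placeholder positive feature map; BASED's actual map (a Taylor
    # expansion of exp) is more involved and is not reproduced here.
    return np.maximum(x, 0.0) + 1e-6

def linear_attention(Q, K, V):
    """Causal linear attention via a running state.

    O(N * d^2) time and O(d^2) state, versus O(N^2 * d) time and an
    O(N * d) KV cache for standard softmax attention.
    """
    N, d = Q.shape
    S = np.zeros((d, d))      # running sum of phi(k) v^T
    z = np.zeros(d)           # running sum of phi(k), for normalization
    out = np.zeros_like(V)
    for t in range(N):
        q, k = phi(Q[t]), phi(K[t])
        S += np.outer(k, V[t])
        z += k
        out[t] = (q @ S) / (q @ z)
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((16, 8)) for _ in range(3))
print(linear_attention(Q, K, V).shape)  # (16, 8)
```

Because the state is constant-size, each decoding step costs the same regardless of context length, which is what makes this class of architecture asymptotically, and with careful kernels wall-clock, faster than attention.
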
Given the massive investment in language models, this work has had significant impact and adoption in research, open source, and industry.

Simran Arora

Computer Science Dept., Stanford University

Simran Arora is a PhD student at Stanford University advised by Chris Ré. Her research blends machine learning and systems to expand the Pareto frontier between AI quality and efficiency. Her machine learning research has appeared as Oral and Spotlight presentations at NeurIPS, ICML, and ICLR, including an Outstanding Paper award at NeurIPS and a Best Paper award at ICML ES-FoMo. Her systems work has appeared at VLDB, SIGMOD, CIDR, and CHI, and her systems artifacts are widely used in open source and industry. In 2023, Simran created and taught the CS229s Systems for Machine Learning course at Stanford. She has also been supported by an SGF Sequoia Fellowship.

Details

Date:
March 6
Time:
3:30 PM - 4:30 PM
Website:
https://www.cis.upenn.edu/events/

Organizer

Computer and Information Science
Phone
215-898-8560
Email
cherylh@cis.upenn.edu

Venue

Wu and Chen Auditorium (Room 101), Levine Hall
3330 Walnut Street
Philadelphia, PA 19104 United States