FOLDS seminar: Theory and practice of LLM quantization

October 2, 2025 at 12:00 PM - 1:00 PM

Zoom link: https://upenn.zoom.us/j/98220304722

 

Modern LLMs process information by repeatedly applying one basic primitive: matrix multiplication. Estimates show that about 60-84% of the energy consumed by LLMs goes into memory load/store operations. How can we reduce this power consumption? Tokens start as roughly 16-bit integers but are mapped to vectors of floats with lengths in the thousands, suggesting very low information density per dimension. Unsurprisingly, there has been much success in reducing the precision of both weights and activations without much loss in LLM performance. In this talk we will present an information-theoretic analysis of quantized representations and show how it led us to create NestQuant, a new SOTA algorithm for quantizing weights, KV cache, and activations (ICML 2025).
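The information-density remark in the abstract can be made concrete with a little arithmetic. The sketch below is only an illustration: the vocabulary size and embedding length are hypothetical values chosen to match "about 16-bit" tokens and vectors with "length in the 1000s", and it counts only the bits needed to identify the input token, not whatever context a model later adds to its activations. It is not taken from the talk or from NestQuant.

```python
import math

# Hypothetical sizes for illustration only (assumptions, not from the talk).
VOCAB_SIZE = 2 ** 16   # token ids fit in about 16 bits
EMBED_DIM = 4096       # embedding vector length "in the 1000s"


def token_bits_per_dimension(vocab_size: int, embed_dim: int) -> float:
    """Bits of token identity spread over each embedding coordinate."""
    return math.log2(vocab_size) / embed_dim


def storage_bits(embed_dim: int, bits_per_value: int) -> int:
    """Bits needed to store one vector at a given precision per coordinate."""
    return embed_dim * bits_per_value


density = token_bits_per_dimension(VOCAB_SIZE, EMBED_DIM)
fp16_bits = storage_bits(EMBED_DIM, 16)
int4_bits = storage_bits(EMBED_DIM, 4)

print(f"token identity per dimension: {density:.6f} bits")
print(f"16-bit float storage per vector: {fp16_bits} bits")
print(f"4-bit storage per vector: {int4_bits} bits")
```

Under these assumed sizes, a 16-bit token id occupies a vector stored in tens of thousands of bits, which is the gap that lower-precision representations try to narrow.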

 

Yury Polyanskiy

Cutten Professor of Electrical Engineering and Computer Science, a member of IDSS and LIDS at MIT, and an IEEE Fellow

Yury Polyanskiy is a Cutten Professor of Electrical Engineering and Computer Science, a member of IDSS and LIDS at MIT, and an IEEE Fellow. He received his Ph.D. degree in electrical engineering from Princeton University, Princeton, NJ in 2010. His research interests span information theory, machine learning, and statistics. Dr. Polyanskiy won the 2020 IEEE Information Theory Society James Massey Award, the 2013 NSF CAREER Award, and the 2011 IEEE Information Theory Society Paper Award. He is a co-author of the recent textbook “Information Theory: From Coding to Learning” (Cambridge University Press, 2025).

Details

Organizers

  • IDEAS Center
  • PennAI
  • Wharton Statistics and Data Science Department

Venue

  • Amy Gutmann Hall, Room 306
  • 3317 Chestnut Street
    Philadelphia, PA 19104 United States