CIS Seminar: “Towards Flexible, Scalable, and Knowledgeable Generative Intelligence”
February 20 at 3:30 PM - 4:30 PM
From language modeling to 3D vision, generative AI has revolutionized nearly every aspect of machine learning. In this talk, I will examine the limitations of the foundation behind many generative AI techniques: autoregressive models. Despite their impressive successes, these token-by-token models face several challenges, including (1) inflexible computation during generation, (2) a lack of rich inner structures for scalable modeling, and (3) a limited understanding of the real world.
To address these three issues, I propose to strategically predict “latents” in the design of new generative models, where latents refer to the model’s intermediate representations during the generation process. First, I will demonstrate how integrating latents allows flexible architecture designs that enhance both efficiency and adaptability, such as in the first non-autoregressive model for sequence generation. Next, I will show how to use latents to incorporate useful data structures for improved model scalability, especially for high-resolution images and videos. Moreover, I will demonstrate how to use latents to infuse world knowledge, such as 3D, for tasks like consistent view synthesis. Throughout the talk, I will cover various modalities, including text, images, and 3D. Finally, I will conclude with a discussion of the prevailing challenges and envision future paths that could lead to more flexible, scalable, and knowledgeable next-generation generative models.
Jiatao Gu, Staff Research Scientist, Apple Machine Learning Research
Jiatao Gu is a staff research scientist at Apple Machine Learning Research (MLR). Prior to Apple, Jiatao was a senior research scientist at Facebook AI Research (FAIR). He received his Ph.D. from the University of Hong Kong after earning his Bachelor’s degree from Tsinghua University. He is the recipient of the Hong Kong PhD Fellowship. His research stands at the intersection of machine learning, natural language processing, and computer vision, with a special focus on generative modeling. His papers have received more than 12,000 citations.