Spring 2024 GRASP Seminar: Yutong Bai, Johns Hopkins University, “Listening to the Data: Visual Learning from the Bottom Up”

March 8 at 2:00 PM - 3:00 PM

*This seminar will be held in person in Levine 307, with virtual attendance via Zoom.

ABSTRACT

We introduce a novel sequential modeling approach that enables learning a Large Vision Model (LVM) without any linguistic data. To do this, we define a common format, “visual sentences”, in which raw images and videos, as well as annotated data sources such as semantic segmentations and depth reconstructions, can be represented without any meta-knowledge beyond the pixels. Once this wide variety of visual data (comprising 420 billion tokens) is represented as sequences, the model can be trained to minimize a cross-entropy loss for next-token prediction. By training across various scales of model architecture and data diversity, we provide empirical evidence that our models scale effectively. Many different vision tasks can be solved by designing suitable visual prompts at test time.
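To make the training objective in the abstract concrete, here is a minimal sketch of a next-token cross-entropy loss over a single token sequence. It is an illustration only, not the speaker's implementation: the function name `next_token_cross_entropy`, the toy vocabulary of 4 tokens, and the hand-written logits are all assumptions, and the abstract does not describe the tokenizer or the model that would produce real logits.

```python
import math


def next_token_cross_entropy(logits, tokens):
    """Mean cross-entropy of predicting tokens[t + 1] from logits[t].

    logits: list with len(tokens) - 1 entries; logits[t] holds unnormalized
            scores over the vocabulary after the model has seen tokens[:t + 1].
    tokens: list of integer token ids (a hypothetical "visual sentence").
    """
    if len(logits) != len(tokens) - 1:
        raise ValueError("need exactly one logit vector per predicted token")
    total = 0.0
    for scores, target in zip(logits, tokens[1:]):
        # Numerically stable log-sum-exp: subtract the max before exponentiating.
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-probability assigned to the true next token.
        total += log_norm - scores[target]
    return total / len(logits)


# Toy example (assumed values): 4-token vocabulary, uniform scores everywhere.
tokens = [0, 3, 1, 2]
uniform_logits = [[0.0] * 4 for _ in range(len(tokens) - 1)]
loss = next_token_cross_entropy(uniform_logits, tokens)
```

With uniform scores the model assigns probability 1/4 to every token, so the loss equals the natural log of the vocabulary size. Training would lower this value by raising the score of each true next token.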

Yutong Bai

Johns Hopkins University

Yutong Bai is a 5th-year CS PhD student at Johns Hopkins University advised by Prof. Alan Yuille, and is currently a visiting student at UC Berkeley advised by Prof. Alyosha Efros. She has interned at Meta AI (FAIR Labs) and Google Brain, and she was selected as a 2023 Apple Scholar and an EECS Rising Star.

Details

Date: March 8
Time: 2:00 PM - 3:00 PM
Website: https://www.grasp.upenn.edu/events/spring-2024-grasp-seminar-yutong-bai/

Organizer

General Robotics, Automation, Sensing and Perception (GRASP) Lab
Email: grasplab@seas.upenn.edu

Venue

Levine 307
3330 Walnut Street
Philadelphia, PA 19104 United States