Fall 2024 GRASP on Robotics: Ruslan Salakhutdinov, Carnegie Mellon University, “Multimodal AI Agents”
October 25 at 10:30 AM - 11:45 AM
This will be a hybrid event with in-person attendance in Wu and Chen and virtual attendance on Zoom.
ABSTRACT
In recent years, the rise of Large Language Models (LLMs) with advanced general capabilities has paved the way towards building language-guided agents that can perform complex, multi-step tasks on behalf of users, much like human assistants. Building agents that can perceive, plan, and act autonomously has long been a central goal of artificial intelligence research. In this talk, I will introduce multimodal AI agents capable of planning, reasoning, and executing actions on the web. These agents not only comprehend textual information but also effectively navigate and interact with visual settings. I will next present an inference-time search algorithm that lets agents explicitly perform exploration and multi-step planning in interactive web environments. Our approach is a form of best-first tree search that operates within the actual environment space and is complementary to most existing state-of-the-art agents. Finally, I will introduce VisualWebArena, a novel framework for evaluating multimodal autonomous language agents, and offer insights towards building stronger autonomous agents for both digital and physical environments.
Ruslan Salakhutdinov
Carnegie Mellon University
Russ Salakhutdinov earned his PhD in computer science from the University of Toronto, where he was advised by Nobel Laureate Geoffrey Hinton. After spending two post-doctoral years at MIT, he joined the University of Toronto and later moved to CMU. He also served as a director of AI research at Apple. Russ’s primary interests lie in deep learning, machine learning, and generative AI. He is an action editor of the Journal of Machine Learning Research, has served on the senior program committees of several top-tier machine learning conferences including NeurIPS, ICLR, and ICML, and was a program co-chair for ICML 2019 and general chair for ICML 2024. He has authored over 250 research papers, and his work has received over 200,000 citations according to Google Scholar. He is an Alfred P. Sloan Research Fellow, a Microsoft Research Faculty Fellow, and a recipient of the Early Researcher Award, the Google Faculty Award, and Nvidia’s Pioneers of AI award.