Spring 2022 GRASP SFI: Jason Ma, University of Pennsylvania, “Beyond Expected Reward in Offline Reinforcement Learning”
April 6, 2022 at 3:00 PM - 4:00 PM
*This will be a hybrid event, with in-person attendance in Levine 512 and virtual attendance via Zoom.
Offline reinforcement learning (RL), which learns from pre-collected, reusable data without further interaction with the environment, enables sample-efficient, scalable, and practical decision-making. However, most of the existing literature (1) focuses on improving algorithms that maximize the expected cumulative reward, and (2) assumes that the reward function is given. This limits the applicability of offline RL in many realistic settings: there are often safety or risk constraints that must be satisfied, and the reward function is often difficult to specify. In this talk, we will explore how to (1) train a broad class of risk-sensitive agents using purely risk-neutral offline data while provably preventing out-of-distribution extrapolation, and (2) bootstrap offline RL from flexible forms of expert demonstrations, significantly expanding the scope of valid supervision for offline policy learning. With these advances, we aim to bring offline RL closer to real-world applications.
University of Pennsylvania
Jason Ma is a second-year PhD student in the Department of Computer and Information Science and the GRASP Laboratory at the University of Pennsylvania, where he is jointly advised by Profs. Osbert Bastani and Dinesh Jayaraman. Before coming to Penn, Jason received his bachelor's degree in Computer Science from Harvard University, where he won the Thomas T. Hoopes Prize. His research interests include machine learning, reinforcement learning, and robotics.