ASSET Seminar: “Getting Lost in ML Safety Vibes”
Amy Gutmann Hall, Room 414, 3333 Chestnut Street, Philadelphia, United States

Abstract: Machine learning applications are increasingly reliant on black-box pretrained models. To ensure safe use of these models, techniques such as unlearning, guardrails, and watermarking have been proposed to curb model behavior and audit usage. Unfortunately, while these post-hoc approaches give positive safety ‘vibes’ when evaluated in isolation, our work shows that existing techniques are quite brittle when deployed […]