ASSET Seminar: “Getting Lost in ML Safety Vibes”
Abstract: Machine learning applications are increasingly reliant on black-box pretrained models. To ensure safe use of these models, techniques such as unlearning, guardrails, and watermarking have been proposed to curb model behavior and audit usage. Unfortunately, while these post-hoc approaches give positive safety ‘vibes’ when evaluated in isolation, our work shows that existing techniques are quite brittle when deployed […]