ASSET Seminar: “Alignment and Control with Representation Engineering”

Amy Gutmann Hall, Room 414, 3333 Chestnut Street, Philadelphia

Abstract: Large Language Models (LLMs) are vulnerable to adversarial attacks, which bypass common safeguards put in place to prevent these models from generating harmful output. Notably, these attacks can be […]