ASSET Seminar: “Robustness in the Era of LLMs: Jailbreaking Attacks and Defenses”
Raisler Lounge (Room 225), Towne Building, 220 South 33rd Street, Philadelphia, PA, United States

Abstract: Despite efforts to align large language models (LLMs) with human intentions, popular LLMs such as ChatGPT, Llama, Claude, and Gemini are susceptible to jailbreaking attacks, wherein an adversary fools a targeted LLM into generating objectionable content. For this reason, interest has grown in improving the robustness of LLMs against such attacks. In this talk, we review the current state of […]