

ASSET Seminar: “Alignment and Control with Representation Engineering”

April 9 at 12:00 PM - 1:15 PM

Abstract:

Large Language Models (LLMs) are vulnerable to adversarial attacks that bypass the common safeguards put in place to prevent these models from generating harmful output. Notably, these attacks can transfer to other models, even proprietary ones, potentially compromising a wide range of AI systems with a single exploit. This surprising fragility underscores a critical weakness in current AI safeguards.

In this talk, we illustrate how these attacks are discovered and present several recent advances that take advantage of models’ internal representations to thwart them. Unlike much prior work that relies on adversarial training, this approach directly controls the neural representations responsible for harmful and unwanted behaviors while remaining agnostic to particular attacks. Notably, in stark contrast with prior work, we show that these methods can remain effective while preserving the model’s performance on non-adversarial inputs. Our findings suggest that robust safety in generative models may be an attainable goal.

Zoom Link: https://upenn.zoom.us/j/95869536469

Matt Fredrikson

Associate Professor

Matt Fredrikson is an Associate Professor at Carnegie Mellon University’s School of Computer Science, where his research focuses on achieving safety and security objectives in systems that rely on artificial intelligence components. He has worked on methods for uncovering vulnerabilities and privacy breaches in deep learning models and on developing methods to defend them against adversarial threats in real-world deployments. His work on these problems has earned several best paper awards at security and privacy conferences, as well as the USENIX Test of Time Award in 2024. Matt is also the co-founder and CEO of Gray Swan AI, a startup dedicated to providing robust assessments and safeguards for organizations deploying AI in diverse and demanding environments.

Details

Date:
April 9
Time:
12:00 PM - 1:15 PM

Venue

Amy Gutmann Hall, Room 414
3333 Chestnut Street
Philadelphia, PA 19104, United States