
ASSET Seminar: “From kernel machines to the linear representation hypothesis for monitoring and steering LLMs”

April 15 at 12:00 PM - 1:15 PM
Details
Date: April 15, 2026
Time: 12:00 PM - 1:15 PM
Event Category: Seminar
Organizer: AI-enabled Systems: Safe, Explainable, and Trustworthy (ASSET) Center
Venue: Amy Gutmann Hall, Room 414, 3333 Chestnut Street, Philadelphia 19104

A trained Large Language Model (LLM) contains much of human knowledge. Yet it is difficult to gauge the extent or accuracy of that knowledge, as LLMs do not always “know what they know” and may even be unintentionally or actively misleading. In this talk I will discuss feature learning and introduce Recursive Feature Machines, a powerful generalization of classical kernel methods designed for extracting relevant features from tabular data. I will demonstrate how this technique enables us to detect LLM behaviors and precisely steer them toward almost any desired concept by manipulating a fixed vector in the LLM activation space. I will also discuss how the same method allows probing whether an LLM exhibits motivated reasoning.

     

    Seminar Recording