Feature Learning and "the linear representation hypothesis" for monitoring and steering LLMs


Department of Mathematics,
University of California San Diego

****************************

Math 278B: Mathematics of Information, Data, and Signals

Misha Belkin

UCSD

Feature Learning and "the linear representation hypothesis" for monitoring and steering LLMs

Abstract:

A trained Large Language Model (LLM) contains much of human knowledge. Yet it is difficult to gauge the extent or accuracy of that knowledge, as LLMs do not always “know what they know” and may even be unintentionally or actively misleading. In this talk I will discuss feature learning and introduce Recursive Feature Machines, a powerful method originally designed for extracting relevant features from tabular data. I will demonstrate how this technique enables us to detect and precisely guide LLM behaviors toward almost any desired concept by manipulating a single fixed vector in the LLM activation space.

May 16, 2025

11:00 AM

APM 5829

Research Areas

Mathematics of Information, Data, and Signals
