Podcasts

Paul, Weiss Waking Up With AI

Evaluation Faking and Group Think

This week on “Paul, Weiss Waking Up With AI,” Katherine Forrest introduces two recent studies that examine the concepts of evaluation faking and Group Think as they pertain to highly capable AI models, and what these studies might mean for future AI development.

Stream here or subscribe on your preferred podcast app.

Episode Transcript

Katherine Forrest: Hello everyone, and so glad to be with you for today's episode of “Paul, Weiss Waking Up With AI.” I'm Katherine Forrest, and I am still solo. Anna is off doing the last of her roundtables in Abu Dhabi, and I'm sitting here in my little farmhouse upstate. And I'm really enjoying this short moment of good weather without rain, because we have had so much rain. Everything is very lush, and so while Anna's in Abu Dhabi, where it's always sunny, I'm here just enjoying my quick moment of sun.
 
So today, what I am again going to choose to talk about—because as you all know, I get to choose what I want to talk about when Anna's not around—is some new studies that have come out on highly capable AI models. And these are particularly useful studies for anyone in the AI world, anyone who's interested in AI and anyone who's giving advice, particularly to model developers or tool developers.
 
So we've got two papers that we're going to talk about today. The first one I want to discuss is a study that just came out a week ago, actually exactly a week ago today from the taping of this episode, and it's called “Evaluation Faking: Unveiling Observer Effects in Safety Evaluation of AI Frontier Systems.” So that's a mouthful. That's the way these scientific papers are—they've always got a mouthful of a title—but I'm going to explain it to you.