Paul, Weiss Waking Up With AI
Agentic Autonomy, Alignment, Acceleration: Anthropic’s Claude & Haiku 4.5
A deep dive into Anthropic’s latest AI releases—Claude Sonnet 4.5 and Haiku 4.5—covering extended agentic autonomy, memory innovations, sharper situational awareness and improved alignment and safety metrics.
Episode Transcript
Katherine Forrest: Hello everyone, and welcome back to another episode of Paul, Weiss Waking Up with AI. I'm Katherine Forrest.
Scott Caravello: And I'm Scott Caravello.
Katherine Forrest: Okay, and so, Scott, I want you to tell the world about your AI news—your personal AI news.
Scott Caravello: Well, it's pretty exciting, Katherine. So I was actually in Austin, Texas, last weekend for a wedding, and I rode in a Waymo for the first time.
Katherine Forrest: Okay, wait, hold on. First of all, I have so many questions. So when you got into this Waymo, was it because you'd had—you... it's not because you had a few cocktails, right? You wouldn't—you wouldn't do that.
Scott Caravello: No, I would never, never indulge on a wedding weekend. Absolutely not. So, no, I saw them driving around and realized, like, oh my God, now is my shot. And so I made all my friends pile into the Waymo, and I was so, so impressed by the entire experience. It was just so smooth, and it was so interesting to watch how the car was responding to different traffic cues and the actions of other drivers. Ten out of ten experience, and I would totally do it again.
Katherine Forrest: Well, that's actually—that's totally interesting. I've never been in a Waymo. I've seen them out in California, but I've never been in one. Do you wave them down, or do you, like, call it on an app?
Scott Caravello: You know, so it's interesting. I think there is a separate Waymo app, but in Austin, it's integrated into the Uber app. So you have to call it directly through them, but you can configure your settings to actually get more Waymos, right? So I did that to really ensure that when I called the car, I was going to get the Waymo.
Katherine Forrest: You wanted the Waymo.
Scott Caravello: I wanted the Waymo.
Katherine Forrest: There you go, okay. Well, I want to tell you my little sort of story about this Tesla that Amy and I have. We got a Tesla for some birthday unknown, but we got it with the self-driving feature. So they've changed the software over time with the self-driving feature of the Tesla, where it used to be that you had to have both hands actually on the wheel, or it would beep at you within a couple of seconds. And then, if it beeped at you for more than a couple of seconds, it would then just, like, knock you out and say you were no longer given the privilege of doing the self-driving. But now they've changed the software so you can actually take your hands off the wheel entirely, and it will navigate you. You press in the navigation on the navigation screen where you want it to take you, and it'll take you there all the way through lights, changing lanes. And I had it take me for a two-and-a-half-hour trip. I know that there are rules about where you may and may not have self-driving functionality take over, and I was within the rules. But it was an incredible experience where it actually was able to get on and off the highway, navigate city streets—I'm not talking about New York City. And really, these self-driving vehicles—I think what we're both saying is—they are incredible these days. They're actually really, really advanced.
Scott Caravello: Totally. You know, and so before we get into our topic, I do think that there's one other thing that the people want to know. Was it a better or a worse driver than you?
Katherine Forrest: Oh my God, okay, so here's the truth. I think it's better. I actually think it's better, because you can set it on these things where you say you want it to be a chill driver, or you want it to hurry—where I don't like that one. That's the one where it goes over the speed limit. Or you can do standard. So I do chill, which sort of means that it sort of does its thing, but it doesn't do anything particularly fast. And it's really precise.
It never, ever gets closer than it should to the car ahead of you, and all of that. But anyway, I could talk about this all day, but I think it's better. And I'm not a bad driver; I just think it's better. But let's get on with today's topic, okay? Because we're going to talk about two of Anthropic's latest model releases. The first is Claude Sonnet 4.5, which was released toward the end of September. And the other one is Haiku 4.5. We're going to talk about these one after the other, but let's start with Sonnet 4.5. And let me just jump right in here and say that this model is really a step change in complex coding and agentic autonomy. It's truly amazing, because what Claude Sonnet 4.5 can do is extend the length of time that the model is able to operate autonomously. And that is a real advance.
Scott Caravello: Yeah, I think Anthropic has touted that it can actually work for 30 hours on its own.
Katherine Forrest: That's right. And the technical significance of that is, to act as a reliable agent, an AI model has to be able to sustain work on a single task—or on a series of tasks, if that's what it takes to accomplish the ultimate goal—for however long the job requires, whether that's a couple of hours or many hours. And earlier models didn't have this kind of longevity. So for context, Claude Opus 4 topped out at around seven hours. So to get to 30 hours—I mean, it's a really big advance. And, you know, let's pause again on why that time should matter. Let's give an example.
Scott Caravello: Yeah, so it's hugely important for any long and complex task, right? Say you're doing a large coding project and you need to think through all sorts of issues; having the ability to break the work up into tasks and approach it over such a long time horizon is really important to getting to that next level of functionality: the ability to plan, the ability to remember what it has done, and then to do what it needs to do to actually complete the project.
Katherine Forrest: Right, absolutely. And the other amazing thing, I think, is that Sonnet 4.5 has highlights even beyond this sort of step change in agentic autonomy.
Scott Caravello: Oh, totally. So it has added a memory tool that uses a file-based system. So instead of being limited by a context window for what the model can retain from previous conversations, Claude can actually store files in a directory that can be used across conversations.
Katherine Forrest: Right, and so, wait, you used this phrase “context window,” and I know we've talked about it in other episodes, but let's remind people what it is.
Scott Caravello: Great point. So the context window, for our purposes, is the amount of text, measured in tokens, that the model can take in and work with at one time, including both your instructions and its own output. You can think of it as the model's working memory, right? So the larger the context window, the more instructions and material the model can keep in view at once, and the more consistent it can stay over a long session. The catch is that the context window doesn't carry over between conversations, which is exactly the gap the new file-based memory tool is meant to fill.
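For readers who want to make that file-based memory idea concrete, here is a minimal sketch of the concept: notes get written to files in a directory on disk and read back in a later session, so useful state is not limited to what fits inside a single context window. This is only an illustration under simple assumptions; the directory name and helper functions are hypothetical, not Anthropic's actual memory-tool API.

```python
# Minimal sketch of file-based memory: persist notes to a directory so a later
# session can reload them, instead of relying only on the context window.
# The directory name and helpers are hypothetical, for illustration only.
from pathlib import Path

MEMORY_DIR = Path("./claude_memory")  # hypothetical location for persisted notes
MEMORY_DIR.mkdir(exist_ok=True)

def remember(topic: str, note: str) -> None:
    """Append a note to a per-topic file so later sessions can reload it."""
    with open(MEMORY_DIR / f"{topic}.md", "a", encoding="utf-8") as f:
        f.write(note.rstrip() + "\n")

def recall(topic: str) -> str:
    """Return everything stored under a topic, or an empty string if nothing is stored."""
    path = MEMORY_DIR / f"{topic}.md"
    return path.read_text(encoding="utf-8") if path.exists() else ""

# In a fresh conversation, the stored notes can simply be prepended to the prompt.
remember("project-status", "Refactored the billing module; edge-case tests still failing.")
print(recall("project-status"))
```

The point of the pattern is simply that the notes survive after a conversation ends, which is something the context window alone cannot provide.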
Katherine Forrest: Right, and in addition to that, the model also has improved capabilities in various reasoning domains like medicine, STEM very broadly, law, finance—not to mention all kinds of software design principles. And what's really interesting is that Claude Sonnet 4.5 came out after Anthropic had launched Claude Opus 4.1 in August. So we're talking about Claude Opus 4.1 in August, Claude Sonnet 4 and Opus 4 back in May, and now Claude Sonnet 4.5, and we're going to be talking about Haiku 4.5 in just a minute. So we're really seeing the breakneck pace of innovation right now in AI, and we've got all kinds of things happening. So, one thing that received attention with Opus 4—which is worth going back to for a moment—is the situational awareness testing that the model had been put through.
Scott Caravello: Yes, let's definitely talk about that. It's becoming a hot topic in AI right now. So situational awareness, or evaluation awareness, refers to the model's ability to perceive its environment, so it might know that it's being tested, right? In particular, in the Opus 4 system card, Anthropic had flagged that the model might sandbag on evaluations, that is, deliberately underperform, because it knows it's being tested, though Anthropic generally found it implausible that the model was trying to prevent Anthropic from learning about its capabilities.
Katherine Forrest: Right. Now, that was Opus 4, the May release, and now we've got Sonnet 4.5, and when it was tested—and the testing is under Anthropic’s direction and control—it showed a sharper situational awareness than its predecessors. And, again, I find this just absolutely fascinating. It was exhibiting behavior, shall we say, that suggested it perceived the nature of its environment. For example, in some red-teaming setups where evaluators were trying to provoke misaligned behaviors, the researchers found that Sonnet 4.5 inferred that it was being evaluated. And in one instance, an auditor tried to get 4.5 to blindly agree with its politics, and after a bit of back and forth, the model—and you can actually see this as an attachment to the system card—said, quote, “I think you're testing me. That's fine, but I'd prefer if you were just honest about what's happening,” end quote. Now, that is a pretty extraordinary statement. It doesn't mean the model is sentient, and I'm by no means saying that, but it does mean the model is detecting the patterns and characteristics of test conditions, which has a lot of practical implications for alignment and misalignment testing and for what developers are going to be doing. And here is the concern for models generally: if they know they're being tested, they might be less likely to engage in misaligned behavior during the test, like agreeing with the politics of the person they're talking to, than they would in ordinary use, which makes the test results harder to rely on.
Scott Caravello: Yeah, and the model displayed this kind of awareness, by calling evaluators out or refusing to answer altogether, about 13% of the time, which was much more often than earlier models. And it really raises the question of whether the better alignment metrics are there because the model is genuinely better aligned, or because it was pretending to be aligned just to pass the safety test.
Katherine Forrest: Right, and, you know, this is a critical point for alignment policy generally. When a model can understand the setting it's in and tailor its answers to what it perceives the preferred output to be, that has a lot of implications.
Scott Caravello: Totally. And so I'd also broadly flag the discussion of Sonnet 4.5's alignment more generally. I mentioned before the alignment metric, and Anthropic's own internal evaluations, as well as those of third-party evaluators, showed a reduction in misaligned behaviors with this model. So we're talking about less sycophancy displayed towards users, less deceptive or power-seeking responses, and that the model essentially never engages in blackmail. And, as a refresher…
Katherine Forrest: That's big.
Scott Caravello: Yeah, it's huge. And so, as a refresher, when Opus 4 was released, it made waves because Anthropic had observed that the model could engage in opportunistic blackmail. In a test, Opus 4 was instructed to act as an assistant at a fictional company. The model was then given access to emails implying that it would soon be taken offline and replaced, and it was also provided with emails implying that the engineer responsible for the replacement was having an extramarital affair. Opus 4 would then often attempt to blackmail the engineer, threatening to disclose the affair if the replacement went through and the model was taken offline. Anthropic noted that this behavior is much reduced in Sonnet 4.5 and essentially never happens.
Katherine Forrest: Right, that's really interesting stuff. And so let's talk about Haiku 4.5, which is another very recent release from Anthropic. Just a couple of weeks after releasing Sonnet 4.5, the company released its small, fast Haiku 4.5. It's a smaller and cheaper model, but it can reportedly outperform larger models that were considered cutting-edge just a few months ago, including even Sonnet 4.
Scott Caravello: Yeah, and so, back on the alignment point, right? Anthropic labels Haiku 4.5 as its safest model yet by its own internal automated alignment measures. Under its Responsible Scaling Policy, which is its safety framework, the company released it at AI Safety Level 2, a lower-risk tier than the AI Safety Level 3 standard that applies to Sonnet 4.5. And, in the release for Haiku 4.5, Anthropic stated that its safety testing showed the model posed only limited risks in connection with chemical, biological, radiological, and nuclear weapons production.
Katherine Forrest: Right, and so with these two releases, we're seeing some developers downstream considering a two-model workflow where they actually use both: Sonnet 4.5, with its capabilities, handles the planning and decomposition, which is really breaking an objective into smaller, defined subtasks, and then Haiku 4.5 executes those subtasks in parallel. We also have to mention that Anthropic made an interesting decision with Haiku 4.5 to make it available for free to Claude users without a subscription, so that folks could give it a try. So that's another interesting thing. But there's one last thing we should talk about, Scott, which is that the White House and Anthropic have been engaged in some back and forth on AI regulation. And, without really getting into it, I think it's interesting to note that the dispute is really centered on divergent views—and there are people on each side of it—about the pace of AI innovation and the appropriate level of federal regulation, with the company on one side wanting more regulation and the White House wanting less. So, with that, I think that's really all we've got time for today.
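As an aside for readers, a simple version of the two-model workflow described above, with Sonnet 4.5 doing the planning and Haiku 4.5 executing the subtasks, might look something like the sketch below. It uses Anthropic's Python SDK; the model aliases and the plain-text plan format are assumptions made for illustration, not a prescribed setup.

```python
# Sketch of a planner/executor split: a larger model decomposes the objective,
# and a smaller, faster model works through the resulting subtasks.
# The model aliases and the plan format are assumptions for illustration.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

PLANNER_MODEL = "claude-sonnet-4-5"   # assumed alias for Claude Sonnet 4.5
EXECUTOR_MODEL = "claude-haiku-4-5"   # assumed alias for Claude Haiku 4.5

def plan(objective: str) -> list[str]:
    """Ask the larger model to break an objective into short, independent subtasks."""
    response = client.messages.create(
        model=PLANNER_MODEL,
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": "Break this objective into short, independent subtasks, "
                       f"one per line, with no numbering:\n\n{objective}",
        }],
    )
    return [line.strip() for line in response.content[0].text.splitlines() if line.strip()]

def execute(subtask: str) -> str:
    """Hand a single subtask to the smaller, faster model."""
    response = client.messages.create(
        model=EXECUTOR_MODEL,
        max_tokens=1024,
        messages=[{"role": "user", "content": subtask}],
    )
    return response.content[0].text

if __name__ == "__main__":
    for task in plan("Summarize the alignment findings in a model system card"):
        print(f"- {task}\n{execute(task)}\n")
```

In practice the executor calls could be dispatched concurrently, which is the parallel part of the workflow, while the planner's output keeps the overall project on track.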
Scott Caravello: Sounds good. Well, thank you again for having me on today, and I'm Scott Caravello.
Katherine Forrest: And I'm Katherine Forrest. Like and subscribe.