
Podcasts
Paul, Weiss Waking Up With AI
OpenAI's Next Moves
In this week’s episode of “Paul, Weiss Waking Up With AI,” Katherine Forrest and Anna Gressel break down a week of major developments for OpenAI, including the company’s entry into open source, its new government contract and the debut of GPT-5.
Episode Transcript
Katherine Forrest: Well, good morning, everyone, and welcome to another episode of the “Paul, Weiss Waking Up With AI” podcast. I'm Katherine Forrest.
Anna Gressel: And I'm Anna Gressel.
Katherine Forrest: And here we are, Anna, bright and early, coffee in hand, although I have to say, even through the camera, you look a little tired.
Anna Gressel: I have my coffee, Katherine, in my Waking Up With AI mug today. There's nothing better.
Katherine Forrest: You're branded. All right. All right. And people don't realize that we actually gave out towels this summer that also had the “Paul, Weiss Waking Up With AI” branding.
Anna Gressel: That's true. Our summer associates love them. You weren't there for the giveaway, but they were thrilled, so…
Katherine Forrest: Right, did you save one for me?
Anna Gressel: Yeah, yeah, yeah. You have one on your desk. And I've heard that people have already gone to the beach and used it. So, thumbs up.
Katherine Forrest: All right. There you go, there you go. Well, we have an interesting episode based upon just one week's worth of announcements, and that really speaks to the velocity of change in this space.
Anna Gressel: Yep, it's super exciting.
Katherine Forrest: We'll be talking about some of the OpenAI developments this week, and then we're going to be talking about some other developments with other companies next week. But there are a couple of OpenAI developments that we really need to sort of dive right into because today, when we're taping, it's August 8th. This will be released next week. And already these developments will be a week old. Who knows what's going to happen between now and then, Anna? We could have like embodied humanoid robots running around, you know, talking to us by next week, who knows.
Anna Gressel: I know it's the pace of change. It's what keeps our world really interesting, I think.
Katherine Forrest: Yeah, it really does. So let's just go ahead and get ourselves sort of started. But I wanted to sort of step back and contextualize some of these developments.
Anna Gressel: Yep. Let's get to your kind of biggest front burner point for folks.
Katherine Forrest: Well, the big point, I think, of some of what we're seeing right now—when we talk today about some of the OpenAI developments, and then next week about developments from other companies as well—is that we're really seeing the push toward, and the realization of, the White House's AI Action Plan. There's been a lot of talk about what that plan means and how it's going to be executed, and we're seeing some of that right now.
Anna Gressel: Yeah, I think that's a really important point, particularly given that the plan prioritizes the dissemination of American AI technology in this race for adoption. And that really matters when other developers begin to rely on these models to build other AI tools, because then they're working within an ecosystem. I think folks in the AI space know this. They can work with things like open source AI models and build on top of those models to create bigger AI-based systems and products.
Katherine Forrest: Right. And so, you know, the action plan talked about disseminating U.S. AI models, both domestically and internationally, so that there would be an overall uptake in U.S.-developed models, hopefully leading to an exponential increase in worldwide adoption. And part of what we're seeing this week with some of these announcements really goes to adoption rates.
Anna Gressel: Yeah. And I think one of the things we'll talk about today that's important from that perspective is OpenAI's jump into the open source game, as well as its government contract. And, you know, there's a government contract with a few different big companies right now. But Katherine, you know, for folks who also want to situate a little more deeply on open source, I think it bears mentioning we've done a number of episodes that touch on open source, including around the debate on open source safety. We did a bunch of episodes this past winter on DeepSeek and the debate around Chinese open source and then global adoption of open source. So if you're feeling like you want some more grounding in all of those topics, you can go back and listen to those.
Katherine Forrest: Yeah, you know, actually, it's a good point. We should just sort of remind people that you can get an entire sort of list of all of the episodes that we've done, which is like, I don't know what number it is. It's like 70 or something. And pick and choose some of our prior episodes to listen to, they're all still up. But, one of the things in the action plan—the AI Action Plan, which I'm going to go back to for just a minute—does relate to open source because the concept is exactly what you were talking about, which is that open source can lead to additional model development, and that can then push adoption and dissemination of the technology.
Anna Gressel: All right, do you want to give us an overview of the—what do we want to call it? Like three big headlines with respect to OpenAI.
Katherine Forrest: Yeah, and they each have some standalone importance, and we can take them in the relative order in which they were publicly talked about. The first relates to the open source releases we're going to discuss: OpenAI has now made its first open source releases. Then it signed an agreement with the U.S. government. As you mentioned, some other companies have done that as well—Anthropic and Google, by the way, had similar agreements—but OpenAI's agreement made a bit of a splash in the news, and that's the second big event this week for OpenAI. And then to top it off, we had yesterday's release of OpenAI's most powerful model yet, which is GPT-5.
Anna Gressel: So let's jump in. I think the open source release is really, really interesting because it's a whole new strategy for OpenAI in terms of pushing adoption of its technology, and it adds ways people can use its models to do their own development—including, and I think it's worth noting this, in sovereign deployments where you might really want to keep an open source model completely on premises. That's part of the benefit of open source: it doesn't have to be run through an API, and it can be run in all different kinds of ways—on device, in a dedicated, secure environment. So this really opens up the option space for their customers.
Katherine Forrest: Right, and the OpenAI open source models are worth naming, because the naming moves away from some of the GPT-4o or, you know, o3, o4-mini, all of these. But they also are not names that roll off the tongue. The first one is gpt-oss-120b, and the second one is gpt-oss-20b.
Anna Gressel: Yep, exactly. And let's remind our listeners what open source means here. It basically means that the model's weights—the parameters that make the model work—are released publicly, so anyone can download the model and run it locally, for example. And that's in contrast to closed models like OpenAI's other GPTs, where the weights are proprietary and kept secret.
Katherine Forrest: Right, right. And so for the first of those two models with that long name: it's GPT, the “oss” stands for open source, and the 120b stands for 120 billion parameters. And the second one, same thing: GPT, open source, 20 billion parameters. So that's the unpacking of the names.
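To make “download it and run it locally” concrete, here is a minimal sketch of what that can look like with the Hugging Face transformers library. The model id openai/gpt-oss-20b and the hardware setup are assumptions for illustration; check the actual model card for requirements and license terms.

```python
# Minimal sketch of running an open-weight model locally with the
# Hugging Face transformers library. The model id "openai/gpt-oss-20b"
# and the hardware assumptions are illustrative; check the actual
# model card for requirements and license terms.
from transformers import pipeline

generate = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    device_map="auto",  # requires the accelerate package; spreads weights across GPUs
)

messages = [{"role": "user", "content": "Explain what an open-weight model is."}]
result = generate(messages, max_new_tokens=200)
print(result[0]["generated_text"])
```

Because everything runs on your own machine, nothing in this sketch calls out to an external API—which is exactly the sovereign, on-premises deployment point made above.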
Anna Gressel: Yeah, and both of these models are designed for agentic tasks, and they have full chain-of-thought reasoning available, and they can be customized with the company's own data to focus on specific tasks.
Katherine Forrest: Right, like fine-tuning. And the OpenAI website—which is actually a really good source of information; they post a lot of their research papers, and you can get easy access to the system cards for a bunch of their models—lists the test results for these gpt-oss-120b and gpt-oss-20b models, and they perform really well. OpenAI measured them against its o3 and o4-mini models, and the performance was really pretty good.
Anna Gressel: Yep, and it definitely speaks to OpenAI wanting to be in the open source game. I would also say it speaks a little bit to an acceptance of open source in light of how much other countries have been open sourcing models, and so there's less pressure against open sourcing AI models across the board.
Katherine Forrest: Yeah, you know what? I had actually predicted that there'd be more restrictions on open source rather than this opening up, because when you release the model parameters, you're really releasing the guts of the brain of the AI tool. But instead, what we're seeing now is this open source release—the increased and continued democratization of these models that allows for further development.
Anna Gressel: Although, the one thing I'd say on that, Katherine, is we'll talk about GPT-5 next. It's not like that got open source. So these are smaller models. And I think we're seeing kind of a strategy, at least by some developers, of openly releasing their smaller parameter models, maybe not their largest models. And there's probably some thinking and strategy behind that.
Katherine Forrest: Yeah, you're absolutely right. It's not as if everything has now gone open source. And let's just be clear, some of these more highly capable models now have a trillion parameters. And so we know we're talking about very capable models, but it's not everything yet. So let's go on to GPT-5, and let's talk about what GPT-5 is all about because, I don't know about you, but the moment it came out I downloaded the system card and read it and highlighted it and everything else.
Anna Gressel: Oh, I have no doubt of that. Actually, it's quite a nice read for folks. It's not that dense, and I would definitely recommend picking it up if you're interested in how OpenAI went about the evaluation piece. But we'll talk about it.
Katherine Forrest: But I bet you don't print it out. My guess is that you work on it online. Tell me, tell us.
Anna Gressel: I have a soft copy PDF which I have highlighted because I don't even think I have a printer at this apartment, so I definitely did not print it.
Katherine Forrest: I’m a big paper user.
Anna Gressel: Oh I know, I know, which I love. So what is GPT-5? Well, it is, as you might imagine, actually a family of models. There's a mini version, a nano version, a pro version, and what they call “thinking” versions: thinking, thinking mini, thinking nano and thinking pro.
Katherine Forrest: Right, I mean, there is a lot going on with these, because they're all oriented toward slightly different things. And it's not just another chatbot—I think that's really important for people to understand. It's a unified model that blends the deep reasoning of OpenAI's o-series with the speed and usability of the GPT line. And, you know, one of the things I find fascinating about this GPT-5 release is that it's going to be free to all ChatGPT users, even if you're not a paid subscriber. Again, it's pushing that adoption we've been talking about. So it's a highly capable model that will be available for free, with some other bells and whistles for subscribers. But the basic model is there for free.
Anna Gressel: Yeah, and one thing I really quite enjoyed from the demo—which folks can find on YouTube, or on OpenAI's site—is that they described GPT-3 as kind of like having a really smart teenager next to you, who was great but kind of annoying sometimes. And then GPT-4 was like having a really smart college student working with you. And now they've described GPT-5 as having a PhD student, or an army of PhD students, working with you. So you can see they're trying to describe a step change in what it can do. Like, it can spin up entire software applications with hundreds of lines of code—which, I will admit, I'm not a coder, I have not yet tried. But at some point, maybe I need to dip my toes into vibe coding, so…
Katherine Forrest: No, it's really fun, actually—you don't have to be a coder. Okay, I'm not a coder, but I have done things like this just to see how it works. You just give it a prompt and say, “can you write the code for a ping pong game in Python for me?”
Anna Gressel: And have you executed the code? Have you been like, now I have a…
Katherine Forrest: Yes, yes. Well, I mean, it's not hard—it's a ping pong game. But I haven't exactly done, like, my dream app. I don't even know what my dream app is.
Anna Gressel: It’s time to start thinking of it.
Katherine Forrest: I know, because you can actually take a text-based idea, put it into the query where you prompt the model—“write the code for the following application for me”—and then you can cut and paste the code and run it, the whole thing.
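For readers who want to try the same workflow outside the ChatGPT interface, here is a minimal sketch using the OpenAI Python SDK. The model identifier "gpt-5" is assumed here for illustration; the exact name exposed by the API may differ.

```python
# Minimal sketch of the "write me a game" workflow described above,
# using the OpenAI Python SDK. The model name "gpt-5" is assumed for
# illustration; use whatever identifier the API actually exposes.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "user",
         "content": "Can you write the code for a ping pong game in Python for me?"},
    ],
)

# Paste the returned code into a file (e.g., pong.py) and run it.
print(response.choices[0].message.content)
```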
Anna Gressel: Well, I think the point with GPT-5 is that it’s going to be even faster and even easier. And it can do other things like navigate your calendar, the drafting is really strong. So why don't I pause there? I think there's a lot more to say on what it can do, but let's talk a little bit more on the details.
Katherine Forrest: Yeah, I mean, one of the things the system card talks about is what they call this “real-time router.” That decides whether to answer a particular query quickly or whether to put it through more of a thinking mode. And you don't now have to toggle the settings and ask it, you know, “will you think hard about this?”—although you can still do that. The router will look at the question and determine whether or not the model needs to focus on it in a particular way. So it's very interesting. It's not really what we've called Mixture of Experts in the past—where different aspects of a query get sent to whichever part of the model can best handle them—but it's Mixture of Experts-like. The router is accomplishing something similar, but in a different way. I'm not actually sure how, and I'm going to dig into it, but in a different way.
Anna Gressel: Yeah, I think it's really, really helpful because it essentially lessens the burden on the user, which is always great from a usability perspective. You can tell the model, “think hard about this.” I think that's kind of cute—you can give it a prompt to think harder, and that puts it in thinking mode, apparently. And I think pro users, some users, can still select thinking mode manually. But really, it's just good at automatically evaluating the complexity of a request and figuring out how much time it needs to reason. It just means it's adapting on the fly, which is always helpful. Again, if you don't quite know how to prompt the model, it's making prompt architecture or prompt engineering a little less important, and the model can do a little more of that for you.
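OpenAI has not published the internals of its real-time router, but a toy heuristic makes the dispatch concept concrete: cheap signals in the prompt decide whether a query goes to a fast path or a slower reasoning path. This is purely an illustration, not OpenAI's method, and the model names are hypothetical labels.

```python
# Toy illustration of the routing idea only — OpenAI has not published
# how its real-time router actually works. A cheap heuristic inspects
# the prompt and picks a fast path or a slower "thinking" path.
def route(prompt: str) -> str:
    hard_signals = ("prove", "step by step", "debug", "analyze", "think hard")
    looks_hard = (
        len(prompt.split()) > 80  # long prompts tend to need more reasoning
        or any(signal in prompt.lower() for signal in hard_signals)
    )
    # Hypothetical model names, used only to label the two paths.
    return "gpt-5-thinking" if looks_hard else "gpt-5-mini"

print(route("What's the capital of France?"))              # -> gpt-5-mini
print(route("Think hard and debug this race condition."))  # -> gpt-5-thinking
```

A production router would presumably use a learned classifier and far richer signals, but the basic idea—triage first, then spend reasoning effort only where it's needed—is the same.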
Katherine Forrest: Right, right. And between the router—which determines how much attention or thought or reasoning to give a particular query—and the different sizes, you've got a lot of optionality as a developer or a user. You've got GPT-5, as we mentioned, and you've got GPT-5 mini and GPT-5 nano. Those all do slightly different things and have slightly different cost profiles. And that, I think, is very appealing to business users who may want to cut down on certain costs when they know their use cases don't demand the full capabilities of a particular model.
Anna Gressel: Yep. And we've seen some really interesting points highlighted by OpenAI, in particular improvements in accuracy and reductions in hallucination rates. So, for example, GPT-5's hallucination rate is reported at 4.8% on general prompts, down from around 20% in previous models. And on health-related benchmarks, the hallucination rate is as low as 1.6%. These are evaluated by measuring the factual correctness of model outputs against curated data sets and expert-validated benchmarks. And one thing I think is really interesting, Katherine, just to pause on this: it turns out they did that evaluation not only by looking at model calls to the internet—that is, when the internet is used for grounding on factual knowledge out in the real world—but also at hallucination rates on internal knowledge. That might be super relevant to a company that has a bespoke data set and really wants the model to be accurate as to its own internal knowledge universe. That could be something like a law firm or a pharmaceutical company that has tons and tons of data that's not really out in the real world, and they're using the model to understand that data internally.
Katherine Forrest: Right, right. And, you know, it's also interesting, I was just looking back, I got the system card right here.
Anna Gressel: Mm-hmm.
Katherine Forrest: You know, the hallucination rates you've given are absolutely correct, but they almost make it sound like there are more hallucinations than there are, because the rates have come down so significantly. You can look at some of the hallucination rates for things like the number or percentage of incorrect claims, or on benchmarks like LongFact, and it can get below a percent. The average hallucination rate—I'm looking at page 12, for the GPT-5 thinking model—is only 1.1%. I mean, that is really extraordinary. Each model in the family has its own hallucination rate. But what we're talking about now is that we've gone from late 2022, 2023, when people talked about hallucinations all the time as one of the biggest impediments to AI adoption, down to a rate that's got to be—I mean, I don't really know, but it's got to be—below human error in terms of just misreading. Think about people doing reading comprehension tests: humans read certain facts, you then test them on comprehension, and we just get it wrong sometimes, for no good reason. So I would think the hallucination rate, or the error rate, is now lower than a lot of people's.
Anna Gressel: I think that is entirely possible, and it's still an area for organizations to really look at in practice when this is working on their own data sets. So it's a reminder as well: there's performance on the benchmarks, and then there's performance on your data. Just always something to think about as you're internally reasoning through what your governance should look like for these models.
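As a sketch of what "performance on your data" can mean in practice, the snippet below runs a model over a small curated question set with known answers and reports the share it gets wrong. Everything here is hypothetical and deliberately naive: real evaluations grade individual factual claims against expert-validated benchmarks rather than doing string matching.

```python
# Naive sketch of measuring an error / hallucination rate on your own
# curated data set. The questions, answers, and ask_model stub are all
# hypothetical; a real evaluation would grade claims far more carefully.
curated_set = [
    {"question": "What year was the company founded?", "answer": "1902"},
    {"question": "How many offices does the company have?", "answer": "14"},
]

def ask_model(question: str) -> str:
    # Placeholder: swap in a real model call (see the SDK sketch above).
    return "stub answer"

def error_rate(dataset) -> float:
    wrong = sum(
        1 for item in dataset
        if item["answer"].lower() not in ask_model(item["question"]).lower()
    )
    return wrong / len(dataset)

print(f"error rate: {error_rate(curated_set):.1%}")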
Katherine Forrest: Right. And so let's go on to, I think, something else that's really important for our listeners, which is some of the safety aspects discussed in the system card. And I'm going to turn that over to you. But the first thing I want to mention before we get there—and it does have safety implications—is sycophancy. That's the way a model can try to please the user and sometimes end up giving a particular answer, or acting in a particular way, that is less than desirable. That sycophancy rate has now gone down. They've worked on it, they've worked on it hard, and it's been reduced by a lot. You can get those numbers in the system card. But go on to safety, because there are some really interesting safety aspects of this.
Anna Gressel: Yeah, definitely. And I think it's worth quickly pausing on sycophancy. It's a super interesting area, as you mentioned. They actually talk in the system card about the fact that they rolled back some prior versions—for example, a GPT-4o update—and adjusted the system prompt. But there's a new method they used here to reduce sycophancy, which was doing some post-training around it. They do flag that they're actively researching related areas—situations that may involve emotional dependency or emotional or mental distress. So this is an area to watch, particularly if you're in the safety space and thinking about your own techniques. We have folks who are on the technical side, not the legal side, working on their own red teaming and evaluation techniques, so this is an area I think we're all looking at closely going forward. A few other things are worth mentioning. One, which I think is quite interesting, is that they introduced what they're calling safe completions. OpenAI calls it—I'm going to quote—“a safety training approach that centers on the safety of the assistant's output,” (that means GPT-5's output) “rather than a binary classification of the user's intent.” And what does this mean in practice? It basically means that…
Katherine Forrest: Or in English.
Anna Gressel: Or in English.
Katherine Forrest: Let's do both—in practice and in English. Whenever you use the word binary, we have to, like, break that down.
Anna Gressel: We'll break it down. It basically means that instead of refusing to answer a user's prompt—saying, “I'm not going to answer that question”—when it perceives the user to be asking something that might be dangerous, the model is now more often going to answer, but answer in a way that isn't malicious and can't be misused. It's going to give a safe answer. That's the concept behind safe completions. And the reason they think this is so helpful is, I think, that in their view it's just better at reducing the rates of harmful outputs.
And you can see that in how they do some of the testing later in the system card—it's worth reading the sections around biological weapons risk and other risks, where they look at how safe completions affect the overall rates of risk across different interesting domains. Another one is deceptiveness. If you listen to the podcast, you know Katherine and I have done episodes on emergent behavior, including deceptiveness by models. Sometimes that can be monitored through the model's chain-of-thought reasoning—that little scratch pad on the side we've talked about before. And very interestingly, OpenAI says they've reduced deceptiveness rates, including by training the model—I'm going to quote here again—“to fail gracefully when posed with tasks that it cannot solve.” And that's super interesting, right? Because if you remember some of the safety research around deceptiveness, sometimes the model really wants to achieve its task, and it will even resort to things like deceptive behavior to achieve its objectives. Part of what I'm interpreting this to mean—and I could be wrong—is that by allowing the model to fail gracefully, it's almost like releasing a pressure valve: the model doesn't need to be deceptive, because it's allowed to fail. That's just an interesting construct overall, and we'll see how it plays out in additional safety testing. So those are just a few of the safety points I think are worth putting a pin in. Katherine, you may have others worth mentioning.
Katherine Forrest: I would just add one more, which is that you can see on page four of the system card that OpenAI has made a decision to treat GPT-5 thinking—that particular version of the model—as high-capability in the biological and chemical domain under its preparedness framework, which you can actually download; they've got a hyperlink to it in the system card, so you can see what it is. And they say they don't have evidence yet that the model could meaningfully help a novice create severe biological harm, but they think there are certain benefits to triggering that preparedness framework, which outlines what they do when it gets triggered. But moving on to other things GPT-5 does: with the agentic capabilities, it's got new abilities to work with your Gmail and your Google Calendar. It's got an ability to take actions, to generate different kinds of text and multi-step plans. It can use tools, it can execute sequences of actions. This model can do a lot. I haven't had an opportunity to really put it through its paces, but I'm looking forward to it.
Anna Gressel: Definitely. I mean, I will say I think we're going to see a lot of interest in one of the capabilities—and Katherine, you may want to talk about this more—which is users' ability to use the model to parse health information and understand health concerns. I'm sure many people have had the experience of wading out of their depth in the medical arena. Even as someone who has a lot of layperson scientific and medical knowledge—and I will say this is true for me—sometimes you reach the limits of your own understanding. So it's super interesting to think about empowering people to better understand their own health and their healthcare. That's something they flagged very heavily in the demo.
Katherine Forrest: Yeah, yeah. I mean, it's always great, and important, to consult a professional healthcare provider. But thinking about translation: this model, like many of the GPT models and other highly capable models right now, can translate into a variety of different languages. And certain domains can have their own specific language. So you can feed in technical computer material, and it will tell you in layperson's language what it means; you can feed in medical records, and it can give you a sense of what they mean. But again, it's always, always useful to consult a health professional—I feel like I have to say that, because OpenAI advertises this as a real capability, and it's impressive. Anyway, if we zoom out a little and look at what's happening globally with the release of GPT-5, we really see the United States still leading by the number of AI models, with China as number two. So GPT-5 is a big deal, but it's also part of—going back to the AI Action Plan—the fulfillment of trying to disseminate highly capable U.S. technology globally.
Anna Gressel: Yeah. And of course, it bears mentioning that we've seen some really notable releases from China, including DeepSeek and Alibaba's Qwen models. The Asian model landscape is pretty much exploding right now, so it probably means, Katherine, that at some point we should do an episode focused on that. But it's quite interesting to see those performance gains, and again, there's a real open source strategy there too.
Katherine Forrest: Right, right. So just to summarize, we've had a big week: a series of open source releases by OpenAI, so they've gotten into that game; their government contract—we're seeing different models now working with the U.S. government, which has a large user base, so it's useful for everybody to try to get some traction there—and then, of course, GPT-5 being a big one. We'll be following these developments, but I really want to talk next week about Genie 3. Have you heard about Genie 3?
Anna Gressel: Yeah, happy to talk about it. Well, there's so much on our list. I feel like every week our list of potential topics just gets longer and longer and longer and longer.
Katherine Forrest: I know. It gets longer and longer and longer. I know. And I'm trying to finish this book, and every time I think I have...you know, I should just do a series of articles. A book is sort of...I don't even know why I did a book because the problem is that when you do a book it takes so long. But a series of articles, you might have some hope of getting it out before it's been overtaken by events. But here I am and I was writing the chapter, or rewriting the chapter, on capabilities of models and then GPT-5 comes out. And so it's a whole thing. And now I’ve got to go back and redo it. Anyway, but I want to talk about Genie 3. We'll talk about that next week or sometime soon. In the meantime, thanks for joining us, and I'm Katherine Forrest.
Anna Gressel: I'm Anna Gressel. Like and subscribe, and send us some questions and feedback.