
Paul, Weiss Waking Up With AI
The Embodied AI Trifecta
On this week’s episode of “Paul, Weiss Waking Up With AI,” Katherine Forrest examines how AI generalization, world models and physical embodiment converge, highlighting 2025 advances in humanoid robotics and what they mean for AGI, common sense and real-world autonomy.
Episode Transcript
Katherine Forrest: Hello, everyone, and welcome to today’s episode of “Paul, Weiss Waking Up With AI.” I’m Katherine Forrest, and I am here with you from sunny New York. And I am about to go to an incredibly rainy place in the South to hole myself up in a hotel room with the exclusive purpose of finishing this book, where if I don’t press send pretty soon, I don’t know what happens, but it’s not good things. Anyway, we talked a little bit about this last week, when I discussed superintelligence and how a superintelligent AI might actually require humans to engage in a sort of negotiation, or renegotiation, of our social contract. But I want to talk to you about another set of thoughts that come out of the same book, with which I’m currently obsessed because I get up every morning at 5 a.m. and work on it, then do my regular day job, then come back and work on it some more. And I’ll remind you that the book is called “Of Another Mind,” so that when it comes out you can look for it on Amazon and all the places you get books. I’m co-authoring it with Amy Zimmerman.
But in this obsession, okay, I have this trifecta that I want to talk about. And I want to talk about it because when I was putting some pieces of information into the book, I ran across some things that are actually extraordinary relating to humanoid robotics, which we’ve talked about in at least one previous episode. So I want to go back to that for a second, update everybody a little bit on what’s happening, and talk about this trifecta of issues that comes together with AI embodiment. It combines three concepts. The first is the ability of AI to obtain intelligence and to actually learn to generalize, which we’ll talk about in a moment: to take discrete pieces of knowledge and extend them to a broader and more widespread area. The second, which we’ve also talked about before, is world models. A world model is considered to be the way a “mind,” a human mind, for instance, or an AI mind, organizes thoughts or concepts about the environment it’s in, whether that’s a narrow world or a broader one. So that’s the second part: organizing information into a world model. The third part is AI embodiment, which is the physical manifestation of an AI in an actual world, our world, for instance. So let’s put these three things together, obtaining intelligence, a world model that organizes that intelligence, and then a physical embodiment, and we’ll describe how they come together. And let’s baseline everybody on why this trifecta comes together at all. There are a number of AI thought leaders and engineers, and some of those are actually the same people, who believe that embodiment and world models are necessary for AI to achieve artificial general intelligence, let alone superintelligence. Yann LeCun from Meta is famously one of those. And of course, he is a brilliant force of nature. But let’s dig into this trifecta of concepts, because I think it’s a little bit complicated. And I think we’ll find, I know we’ll find, based upon the research that I’ve done for this book, that we’re already getting to certain places people never thought we would reach in terms of world models and obtaining knowledge without embodiment.
So first we’re going to talk about the acquisition of knowledge and how it comes into play here. The idea is that one way humans learn about the world is from our experiences within it. Our senses help take in information that gives us an intuitive sense of, say, physics: that things that go up come down, that eggs break when dropped, that a ball at the top of an incline will roll downwards, that you can topple off a cliff, that kind of thing. All of this information about our movements through the world gets embedded within us and forms a basic common sense about how the world works. And that allows us to generalize from one thing we’ve experienced to other things we haven’t directly experienced but can analogize to the thing we have experienced. This is the concept of generalization, and it’s a key one; it’s actually the “general” in artificial general intelligence. Generalization is a kind of cognitive ability that allows an AI model to take, again, just to repeat, knowledge about one thing and then, just like a human would, apply it to something entirely different; it is able to use knowledge broadly. People think you need generalization in order to actually have AGI. So the question is, if you’ve got an AI model whose existence, so to speak, is within a server, right, it’s actually located physically inside a server, does it have a sufficient basis of knowledge about the actual external world to form this ability to generalize and to have the kind of common sense about how the world works? For instance, that a ceramic plate, when it falls on the ground, is likely to break. Does it have the ability to generalize in a way that would enable it to reach the highest levels of intelligence?
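To make the distinction concrete, here is a minimal toy sketch, with entirely synthetic data, contrasting a system that merely memorizes its training examples with one that has captured an underlying rule and can apply it to inputs it has never seen. The numbers and the linear rule below are invented purely for illustration; they are not a claim about how any particular model learns.

```python
# Toy contrast between memorizing and generalizing.
# Synthetic example; the rule y = 2x + 1 is invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

# Training data: the system only ever sees inputs between 0 and 10.
X_train = np.arange(0, 11).reshape(-1, 1).astype(float)
y_train = 2 * X_train.ravel() + 1                 # the underlying rule

# A "memorizer": a lookup table of exactly what it saw.
lookup = {float(x): y for x, y in zip(X_train.ravel(), y_train)}
print(lookup.get(100.0))                          # None: never seen this input

# A model that captured the rule can extrapolate to unseen inputs.
model = LinearRegression().fit(X_train, y_train)
print(model.predict([[100.0]]))                   # approximately [201.]
```

The lookup table is perfect on what it has seen and useless beyond it; the fitted model handles the unseen input because it learned the rule rather than the examples, which is the spirit of the “general” in AGI.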
So now this brings us to world models, the second part of the trifecta. We’ve talked about this before on other podcasts, but I want to bring it into this three-part framework. On a previous episode, I talked about a world model as the model of the world that, for instance, a human develops about how to navigate an environment. Our embodiment in a human form, and now you see where we’re going to get to embodiment in a minute, helps us understand the outside world: what it means, for instance, to walk down the street or to look up at a sky. We gain information and knowledge by being able to do those things. Through information that comes in through our senses, we develop a knowledge of the world that is distinct and that is important to our overall ability to think. So an important question for us is whether AI models can form world models in the absence of physical embodiment. It’s actually a really fascinating question, because there’s research that says they do. And there are people who say, “well, you can’t, because if you don’t actually experience the world, you’re never going to understand it.” But there’s understanding, and then there’s understanding, right? Before we get to AI models, let’s think about the different ways humans form views about the world and how much those views can differ. First of all, we humans are not all part of the same local environment, let alone the same larger environment, let alone sharing the same concept of a global environment. We actually differ in how we think about the world. We don’t all live in the same kinds of buildings. We don’t all navigate the same kinds of streets. We don’t all share precisely similar ways of navigating our world, or even the same sense of what’s appropriate to do or not do. We have ways of using our senses that are culturally specific. We have different kinds of physical mobility. And we also, as humans, have a large variety of levels of cognitive awareness; we vary a lot. So there’s huge variability in the human experience, and we have to keep that in mind as we turn to this next point.
So now we turn to AI, and we ask, “where would AI get a world model if it doesn’t have physical embodiment? Can it do that if it doesn’t actually navigate itself outside of a server that’s sitting in a lab someplace?” Well, here’s how it would do that, if it were going to do that. Number one, the largest models today have ingested trillions of tokens; they’ve taken in basically all the digital information we’ve been able to feed them, and that is all kinds of information about the world. The model takes that information and creates relationships between its different parts. And the theory is, and now I’m going to jump ahead and give you a little preview, there are studies which say a model forms a world model just by being in the server, taking the information that’s fed to it about the world and organizing it. It’s a little bit analogous to living in a landlocked place and never having been to the ocean, but reading about the ocean, seeing books about the ocean and watching a bit of a movie about the ocean. You know that there’s an ocean, you’ll know that there are tides, perhaps, and you might even know that there are rip currents. You’ve got that information about the way the world works even though you’ve never experienced it firsthand. So it’s sort of like that. If the AI model has been trained on all the scientific works we’ve been able to put into it, and that information has informed it about the nature of the physical environment, about how things work mechanically and about social interactions, you can imagine the AI forming a sort of world model. It’s not going to be just like ours. It’s not going to be like a human world model built from direct experience. But now also imagine that, unlike a human, the AI, to the extent it’s got sufficient memory capacity, and there are lots of constraints on memory capacity, but let’s imagine for a moment that it does, can access the concepts it’s learned long term. It doesn’t have the same kind of forgetfulness a human might have. And I’m not saying it’s memorized things. I’m saying that it’s actually taken in concepts, worked with those concepts, and those concepts exist within the vector space inside the neural network. That is more along the lines of what AI transformer models do today.
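What “concepts existing in vector space” means can be sketched very roughly in a few lines. In the toy example below the vectors are invented by hand for illustration; in a real transformer the representations are learned and have thousands of dimensions, but the geometric idea, that related concepts point in similar directions, is the same.

```python
# Toy illustration of concepts living in a vector space.
# These three-dimensional vectors are hand-invented for illustration;
# real learned embeddings are high-dimensional and not set by hand.
import numpy as np

concepts = {
    "ocean": np.array([0.9, 0.8, 0.1]),
    "tide":  np.array([0.8, 0.9, 0.2]),
    "desk":  np.array([0.1, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means 'related'."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Related concepts point in similar directions; unrelated ones do not.
print(cosine_similarity(concepts["ocean"], concepts["tide"]))  # high, ~0.99
print(cosine_similarity(concepts["ocean"], concepts["desk"]))  # low, ~0.30
```

The geometry is what lets the model work with concepts rather than memorized strings: “ocean” and “tide” end up near each other even if the exact sentences pairing them were never stored.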
And so there have been these studies, which I’ve already alluded to, that have found that AI models today display a kind of emergent world model within themselves. And that’s fascinating, right? Because we didn’t necessarily think that an AI sitting in a server, putting aside physical embodiment for the moment because we’re about to get to that, could actually have a world model. But there have been tests to see, for instance, whether an AI model that has learned about planetary orbits has an inductive bias towards Newtonian physics, because learning about planetary orbits doesn’t necessarily teach it the rules, if you will, of Newtonian physics. And it’s been found that it does: it has actually learned something about the world when it’s learned about planetary orbits, something about physics, and Newtonian physics in particular, and it’s able to use that to generalize to other areas. Then there are other theories of world models that focus on language itself being representational. And there are still other theories saying that if the information the model has taken in describes the world in sufficient detail, and from a sufficient number of diverse angles, the model will form a world model from that alone.
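Studies like these typically rely on what researchers call a probing methodology: train a small, simple classifier on the model’s internal activations and see whether some fact about the world can be read off them. Here is a minimal sketch of that idea using synthetic stand-in data rather than activations from any real model; the dimensions, the hidden “encoding direction” and the task are all invented for illustration.

```python
# Sketch of the linear-probe methodology used in emergent world-model
# studies. Everything here is synthetic stand-in data, not activations
# from a real model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Pretend each row is a model's hidden state (64 dims) and world_state
# is some binary fact about the world the model was never told directly.
n, d = 2000, 64
world_state = rng.integers(0, 2, size=n)
direction = rng.normal(size=d)              # hypothetical encoding direction
hidden = rng.normal(size=(n, d)) + np.outer(world_state, direction)

X_train, X_test, y_train, y_test = train_test_split(
    hidden, world_state, test_size=0.25, random_state=0)

probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# If a simple linear probe can recover the world state from activations,
# that is taken as evidence the model represents that state internally.
print(f"probe accuracy: {probe.score(X_test, y_test):.2f}")
```

The key design point is that the probe is deliberately weak: if even a linear classifier can decode the fact, the information must already be laid out explicitly in the model’s internal representation rather than computed by the probe itself.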
So now let’s turn to embodiment, because this is the last part of our trifecta. At the very least, there’s a question about whether embodiment is necessary for a world model. And I’ve just told you that there are studies suggesting it’s not, which is actually really fascinating. On the other hand, there’s still the concept that, even though a model can learn a world model just from how it’s taken in information about the world, embodiment can enhance the world model of an AI system and make it more sophisticated. So what kinds of robotic embodiments are we seeing for AI models and AI systems today? It’s really fascinating to see. Here in the third quarter of 2025, we’re seeing a couple of different kinds. We’re seeing stationary embodiments that have AI in them, which are, let’s just think of them as being bolted to a table or a wall; those are things you find, for instance, in laboratories. We’re also seeing wheeled and tracked embodiments, which are like a vehicle of some sort, small or large, and let’s add in a flying version of that, like a little drone. We’ll imagine all of these have AI in them, because you can have robotic embodiments with no AI at all, but we’re talking about embodiments that have AI. Those embodiments, the tracked, the flying and the wheeled, are designed to cover all kinds of terrain, and they can have a bunch of different applications: disaster recovery applications doing things autonomously, military applications, and information-gathering applications all over, including on Mars. Then there are the quadruped robotics that look like animals; they can look like dogs or just little four-legged creatures. All of these, again, have AI built into them. And then we have the last kind, the one science fiction writers spend so much time on, which is humanoid robotics. Humanoid robotics in 2025 have progressed enormously. If you have not spent some time looking at YouTube videos of the most recent humanoid robotics, or at the websites of companies that make humanoid or even quadruped robotics, you really should, because there have been extraordinary developments in the dexterity of these robots, as well as in the cognitive abilities of the LLMs that are put into them. We’ve now got robots that are able, with a combination of their dexterity and the LLMs within them, to maneuver around an environment autonomously, sometimes even agentically, and to take in additional information, which they can then process, keep within their neural networks, and use to create additional concepts alongside all the other information already there. And this is happening right now.
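The basic control pattern behind an embodied agent like this is often described as a sense-plan-act loop: perceive the environment, fold the observation into the agent’s world model, decide, act, repeat. Below is a deliberately simplified sketch of that loop; every class and rule in it is a hypothetical placeholder for illustration, not any vendor’s API, and real robotics stacks (ROS 2, for example) are far more involved.

```python
# Schematic sense-plan-act loop for an embodied AI agent.
# All classes here are hypothetical toy stand-ins, not a real robotics SDK.

class ToySensors:
    def read(self):
        return {"obstacle_ahead": False}           # stand-in observation

class ToyPlanner:
    def decide(self, observation, memory):
        # An LLM/VLM-backed planner would go here; we hard-code one rule.
        return "stop" if observation["obstacle_ahead"] else "move_forward"

class ToyActuators:
    def execute(self, action):
        print(f"executing: {action}")

class ToyMemory:
    def __init__(self):
        self.log = []                              # persistent world knowledge

    def update(self, observation):
        self.log.append(observation)

def sense_plan_act(sensors, planner, actuators, memory, steps=3):
    """The core embodied-AI loop: perceive, update the world model, act."""
    for _ in range(steps):
        obs = sensors.read()                       # perceive
        memory.update(obs)                         # fold into world model
        action = planner.decide(obs, memory)       # plan
        actuators.execute(action)                  # act

sense_plan_act(ToySensors(), ToyPlanner(), ToyActuators(), ToyMemory())
```

The point of the sketch is the memory step: because each observation is folded into a persistent store, the agent’s world model keeps growing as it moves through the world, which is exactly the enhancement embodiment is said to offer.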
So I’m going to recommend two places to go just to get a sense of this. There are many, many companies right now working on very sophisticated robotics, so look broadly, but I’m going to mention two. One is called Unitree, U-N-I-T-R-E-E, a Chinese company that sells a variety of robot types, both for industrial use and actually for domestic use, though I’m not sure how many people are really buying them for their homes just yet. What’s really impressive about the Unitree robots is that they’re incredibly resilient and incredibly dexterous. It used to be that you’d see videos of robots running across a street, tripping and falling down. Now they have one that’s almost anti-gravity: it falls down and immediately, like immediately, jumps back up and is back on its feet. It can be whacked and hit and knocked around, and it withstands all of that and stays standing. They also have real dexterity with manual tasks. They can pick things up and do all kinds of household tasks, something like 80 to 100 of them or thereabouts. They can cook meals and wash floors and fold laundry, all things that actually require them to think. Think about cooking: you open up the refrigerator and have to pick out the ingredients. The eggs might be in a different place every day, unless you’re like me, in which case the eggs are always in the same place and you have all your nice, neat cans of Diet Coke lined up. But imagine a different refrigerator, not my refrigerator, somebody else’s refrigerator, where things are all over the place. The robot has to pick out the right ingredients, wash them, actually crack the eggs, tell what temperature on the stove is appropriate and determine whether things are cooking right. So the Unitree videos showing their progress are really impressive, and I recommend taking a look at them. And then there’s the U.S.-built Tesla Gen 3 robot, Optimus, which has the Grok 5 LLM in it. So it’s got a highly capable reasoning model inside, and they’re being manufactured now virtually at scale; there are lots and lots of these things coming off the line. They also exhibit incredibly dexterous movements and the same resilience to being pushed down and pushed around and getting back up again.
So now we’ve got robots with dexterity that can literally be anywhere and everywhere, learning about the world, with highly capable LLMs already embedded inside them. This is happening. They’re not actually everywhere yet, but you can see them being demoed now and again, even in the streets of New York, and we’re going to be seeing them in more and more places. So we’ve got this infrastructure being built around these AI models. And it’s absolutely clear that, given the need for embodied AI models to navigate the world, they’re going to have some form of world model built into their LLMs. The LLMs are getting updates pushed to them wirelessly. And the physical embodiments are going to be as smart as the LLMs or even smarter, because they’re going to be learning as they go. So we’re going to have a different and additional kind of intelligence, I think, that will come with embodiment, adding to the existing world model in the LLM, which will in turn add to the kind of knowledge the LLM actually has. So the trifecta comes together. We already have extraordinarily capable models in servers with world models. And now, with embodied AI, given how it’s progressing and how fast it’s progressing in 2025, we’re going to have the ability to push it to a new level. So watch out for advancements in this area. It’s going to be such a huge thing, and I’m really watching this space closely. All right, that’s all we’ve got time for today. I’m Katherine Forrest. Thanks for joining us, and listen, if you like this podcast, please do tell your friends, spread the word, get people to sign up and become subscribers, and we’ll talk to you next week.