Paul, Weiss Waking Up With AI
Memory: Market Rates and Model Weights
In this episode Katherine Forrest and Scott Caravello take us down “memory lane” to explain the importance of high bandwidth memory (HBM) and RAM to AI development. Our hosts also give us a rundown of potential challenges ahead, unpacking developments in the market for memory, including plans for additional capacity and lobster-style RAM pricing.
Episode Transcript
Katherine Forrest: Good morning, everyone, and welcome to what is the 100th episode of Paul, Weiss, Waking Up with AI. I'm Katherine Forrest.
Scott Caravello: And I'm Scott Caravello. Katherine, can you believe it?
Katherine Forrest: One hundred, and Scott, did you dress up for— I think I see you in a tie. Like, nobody wears a tie.
Scott Caravello: Exactly, exactly. I–I had to do something special. I'm happy to be a part of this.
Katherine Forrest: And by the way, you know, after 100 episodes—and I think this is the first episode for which you've actually worn a tie—I'm not sure that I'm seeing the coffee cups that are like the symbol of this whole beautiful thing, Waking Up with AI. Like, I've got my coffee cup in my hand.
Scott Caravello: I have to admit, I'm a bit embarrassed. I don't have any mugs in the office. I think after this, I'll come by to get one of the Waking Up with AI mugs, and I will have you sign it to commemorate that this is the 100th episode.
Katherine Forrest: Right. Right? Okay, I'll sign it. And then it'll probably just wipe right off. But, uh, in the meantime, I'm also impressed that not only did you come here for the 100th episode in a shirt and tie, but that you wore a shirt and tie when there's like, I don't know, 14, 18—however many gazillion inches there are outside—of snow on the ground.
Scott Caravello: I know, I know. Fortunately, I made it through unscathed on the subway.
Katherine Forrest: Yeah, it's a—yeah, it's a—yeah. You know what I do, actually, in the wintertime when it's like this? I just resort to the tried-and-true meal that everyone should have. There are actually two tried-and-true meals. One is, of course, chili. And so we made a white chili this weekend that was phenomenal. It's with chicken, but then also lasagna. And, so—yeah, things are a little tighter than they were before last week. But anyway, it's time to move on to our tech now. What is our special 100th episode tech talk today, Scott?
Scott Caravello: Well, it is something that has gotten a lot of buzz lately, and that is memory, which I don't know if we've talked about on the podcast before, but is a really important part of the AI tech stack and the hardware side specifically. And so we'll talk a little bit about those chips and how they play a role in the compute pillar and what is happening around memory at the moment.
Katherine Forrest: Yeah, you gotta love the phrase “compute pillar.” There are so many lovely phrases that only exist in the tech world—the compute pillar. But anyway, we have talked about chips in the past, but we haven't really talked about memory in a dedicated way. So let's just sort of dive right into it, or jump right into it, or walk our way into it. So we, I think, are all familiar with the concept of compute. And that's, you know, really a combination of things that has to do with computational power, the hardware, the chips, all of the things that make the processing occur for—here in our world—the AI tools and the AI models. But we've been getting a lot of interesting information recently on how memory is dealt with in AI.
Scott Caravello: Right, and so that's because to run a model, you have to actually be able to store the model, including its weights. And so that's the role of the memory chips—at least as we're going to talk about it today. And so those storage demands, along with other important functions that are performed by those chips, are driving a big challenge in the AI supply chain, which is shaping up to be an important topic this year, and, you know, maybe stretching into next year. And so it's also, therefore, probably going to play a very important role in shaping the AI investment landscape this year.
Katherine Forrest: Exactly, exactly. And, so, just to sort of pause for a moment and sort of talk about memory and the types of memory in AI, let's sort of lay them out. First, we've got a certain amount of memory in the training process, and that is learned before deployment. So it's what the AI model has learned. It's not about recalling specific documents or specific information that's been fed in, in the way it's been fed in. It's the memory that is used to store the training in the model's weights. Then there's the context memory, which is akin to working, or short-term, memory that's actively holding inside, for instance, the context window that we've talked about in other episodes. It's holding that information inside. Then there's a kind of temporary storage that's session or tool memory storage. And then there's a relatively straightforward, very small amount of memory for user profile memory. So it remembers, sort of, who we are. And there's a different amount of that that's actually retained with different models. And then there's something called, really, a model state, which is not a classic kind of memory, but it's the electrical activity that's sort of the brain thinking. So it's not persistent memory. But with all of that, let's sort of go back to talk about two things, which is RAM—which a lot of people who are familiar with computers are familiar with—and then we're going to get to something called high-bandwidth memory, or HBM. But let's talk about RAM first and sort of set the stage there, and then move into the AI-specific high-bandwidth memory, which is really what makes everything move so quickly. And so with RAM—we know that the way that computers temporarily store information that's going to be processed is with RAM, and that's, sort of, the most commonly thought-of kind of memory. And, you know, RAM is needed to hold the model weights, but it doesn't actually run the computation. That's going to be run by something that we're going to call the high-bandwidth memory—the H-B-M—HBM that we're going to get to in a moment. So, you know, we've got this RAM that is able to pull on or pull up past information, which is also often occurring in the context of agents and developers and deployers storing information about the models, like the weights, to run the model. And so we've got a, sort of, an initial piece that, sort of, is our starter pack of memory, which is our RAM.
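To put rough numbers on Katherine's point that the memory has to hold the model weights, here is a minimal back-of-the-envelope sketch. The parameter counts and precisions below are illustrative assumptions, not figures from the episode.

```python
# Approximate memory needed just to store a model's weights.
# Parameter counts are hypothetical illustration values.

BYTES_PER_PARAM = {
    "fp32": 4,       # full precision
    "fp16/bf16": 2,  # half precision, common for inference
    "int8": 1,       # 8-bit quantized
}

def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate memory, in GB, needed to hold the weights alone."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

# Hypothetical 7B- and 70B-parameter models (illustration only).
for params in (7e9, 70e9):
    for dtype in BYTES_PER_PARAM:
        print(f"{params / 1e9:.0f}B params @ {dtype}: "
              f"~{weight_memory_gb(params, dtype):.0f} GB of weights")
```

A 70-billion-parameter model at half precision works out to roughly 140 GB of weights before any working memory is counted, which is why the storage demand alone is significant.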
Scott Caravello: Right, and so then, just circling back really quickly, Katherine, on one type of memory that you had mentioned when you were talking about the context window—and, sort of, hitting on why that's so important and why memory is so important right now—it's because, besides the fact that you need to be able to access what the model had learned in training, it's also the fact that AI agents require so much context to run and to understand what they're doing and how they've planned. And so, as agents continue to get rolled out, this memory is playing an increasingly important role. And, so, then, going to what you had mentioned about the high-bandwidth memory, you actually need to be able to access that memory. And so that high-bandwidth memory, or HBM, it helps make that access happen fast and answer your queries fast.
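As a rough illustration of why long agent contexts are so memory-hungry, the sketch below uses the standard key/value-cache estimate for a transformer. The layer counts, head counts and precision are hypothetical stand-ins, not the specs of any model discussed here.

```python
# Rough estimate of the "working memory" (KV cache) a transformer needs
# to hold a long context. Architecture numbers are hypothetical.

def kv_cache_gb(seq_len: int,
                num_layers: int = 80,      # hypothetical
                num_kv_heads: int = 8,     # hypothetical
                head_dim: int = 128,       # hypothetical
                bytes_per_value: int = 2,  # fp16
                batch_size: int = 1) -> float:
    """Memory for cached keys and values, in GB; the factor of 2 is K + V."""
    total = (2 * num_layers * num_kv_heads * head_dim
             * seq_len * bytes_per_value * batch_size)
    return total / 1e9

for ctx in (8_000, 128_000, 1_000_000):
    print(f"{ctx:>9,} tokens of context -> ~{kv_cache_gb(ctx):.1f} GB of KV cache")
```

Under these assumptions, the working memory grows linearly with context length, from a couple of gigabytes at 8,000 tokens to hundreds of gigabytes at a million tokens, on top of the weights themselves.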
Katherine Forrest: Right, and so I think we'll go on a little but important tangent right here on something we've mentioned in prior episodes, called the mixture of experts, or MOE, and how MOE models fit into this picture in explaining memory. And, so, we know that, for instance, Gemini 3.0 is an MOE model, and the MOE models are composed of a large set of total parameters, but the point of a mixture of experts is that only a smaller set, or a subset, of those parameters are activated at any particular time. So in an MOE model, you're just running parts of the model that are useful for an appropriate, you know, particular query or input. But there's a catch, which is even though a smaller number of parameters get activated, and there's a lower, you know, sort of pure computational cost, all of the parameters get loaded into or stored in the memory.
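To make the MOE arithmetic concrete, here is a minimal sketch of the gap between the parameters a mixture-of-experts model activates per token and the parameters it must keep resident in memory. All of the expert counts and sizes are made-up illustration values, not details of Gemini, DeepSeek or Llama.

```python
# Mixture-of-experts arithmetic: compute scales with the *active*
# parameters, but memory must hold *all* of them.
# Every number below is a hypothetical illustration value.

total_experts = 64        # experts per MoE layer
experts_per_token = 2     # experts the router picks for each token
params_per_expert = 5e9   # parameters per expert
shared_params = 20e9      # attention and other shared layers
bytes_per_param = 2       # fp16

total_params = shared_params + total_experts * params_per_expert
active_params = shared_params + experts_per_token * params_per_expert

print(f"Active per token:   {active_params / 1e9:.0f}B params")
print(f"Resident in memory: {total_params / 1e9:.0f}B params "
      f"(~{total_params * bytes_per_param / 1e9:.0f} GB just for weights)")
```

In this made-up example only about 30 billion parameters do work on any given token, but all 340 billion still have to be loaded into memory.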
Scott Caravello: Right, so whatever way you slice it and dice it, there is no getting around the need for memory, especially as the MOE—mixture of experts—models become more popular.
Katherine Forrest: Right, right, right. And so there's a takeaway here, which is obviously that a lot of models have become—or are just—closed source, and developers don't always disclose the specific details about their architecture. But there are some models, like DeepSeek's models and Llama models, that are MOE, and because they're open source, you can actually learn a lot more about them. So, you know, as I've said, you know, we think that Gemini is an MOE. It's actually mentioned as an MOE, and the MOE architecture has become an industry standard for frontier models. So it's utilizing memory in the way that we've just talked about.
Scott Caravello: Yeah, and so as these cutting-edge models continue to get larger and as the demand grows in the inference market—you know, that is, the more people are sending model queries and using these models—we are starting to see some major headwinds in the memory market.
Katherine Forrest: Right, so let's just now circle back to the memory market, which is this: there's a really interesting consumer electronics show, which everybody probably knows about if you read Wired magazine or any number of business publications. And that consumer electronics show is called CES. And at CES this year, the business chief of Micron—and Micron, by the way, as we know, is one of the major memory suppliers—said, quote, "We have seen a very sharp, significant surge in demand for memory, and it has far outpaced our ability to supply that memory and, in our estimation, the supply capability of the whole memory industry," end quote.
Scott Caravello: And that is a pretty big statement, but, you know, notwithstanding that crunch, the industry is taking steps to address the memory capacity. And, so, last month, SK Hynix—I think? I think that's how it's pronounced—which is one of the other major memory suppliers, just announced a $13 billion investment in a new plant while citing figures that showed a 33% compound annual growth rate in the high-bandwidth memory market. And they're also building facilities too, right? Micron is building two factories, also known as fabs: one in Boise, Idaho, and another in Clay, New York. But even despite all of that, Micron is saying that it is sold out for 2026. And, you know, that's–that's because these fabs do take time to be built and come online.
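For a sense of what a 33% compound annual growth rate implies, here is a quick worked calculation. The starting market size is an arbitrary index, not a figure cited by SK Hynix.

```python
# What a 33% compound annual growth rate looks like over a few years.
# The starting value is a placeholder index, not a real market figure.

cagr = 0.33
market = 100.0  # index today's HBM market at 100

for year in range(1, 6):
    market *= 1 + cagr
    print(f"Year {year}: index {market:.0f}")

# At 33% a year, the market roughly doubles in under three years
# (1.33 ** 3 is about 2.35) and more than quadruples in five (about 4.2).
```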
Katherine Forrest: All right, now let's go back to that word, or the phrase, that you used there, which was the 33% compound annual growth rate for the high-bandwidth memory market. That's the HBM that you and I were talking about before. Those are the memory chips that are attached directly to the GPU—the graphics processing unit. They are ultra-fast, and they are designed for massive parallel processing. They are not RAM, but they're sort of RAM-adjacent. AI models use both HBM—this high-bandwidth memory—and RAM. RAM sort of manages the HBM. It's sort of at the beginning and at the end. The CPU prepares the user input in RAM, and the model weights and parameters then go—they move—to the high-bandwidth memory, to the HBM, and that's where all the magic occurs, inside and with the HBM, and utilizing the HBM. Then the output gets passed back to the RAM at the end. So the thinking is happening, and the speed of the thinking is happening, inside of the HBM.
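The round trip Katherine describes (input prepared in CPU RAM, weights and activations sitting in GPU memory, output handed back) can be sketched in a few lines of PyTorch, assuming PyTorch and a CUDA-capable GPU are available. The stand-in "model" here is a single linear layer, purely for illustration.

```python
# Minimal PyTorch sketch of the RAM <-> HBM round trip:
# tensors start in CPU RAM, are moved to GPU memory (HBM on modern
# accelerators), the math runs there, and the result comes back.

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

model = torch.nn.Linear(4096, 4096)  # stand-in "model": weights created in CPU RAM
model = model.to(device)             # weights copied into GPU memory (HBM)

x = torch.randn(1, 4096)             # user input, prepared in RAM by the CPU
x = x.to(device)                     # input moved over to the GPU

with torch.no_grad():
    y = model(x)                     # the actual compute runs against HBM

y = y.cpu()                          # output handed back to CPU RAM
print(y.shape)
```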
Scott Caravello: Right, and, you know, the reason that everyone knows about RAM already is because it is part of our consumer devices. And so I want to make another note on RAM, which is that RAM and HBM in the AI context are really interconnected with RAM in the consumer context too, as we talk about this memory shortage. Micron, for example, has stated that for every unit of HBM it produces, it has to forgo producing three units of regular memory. So basically the entire memory market, not just in AI, is affected by this surge in demand.
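Scott's one-to-three trade-off is easy to put into plain arithmetic. Only the 1:3 ratio comes from the episode; the capacity figure below is a made-up placeholder.

```python
# Supply trade-off: each unit of HBM produced displaces several units of
# regular DRAM. The total capacity number is hypothetical.

TRADE_OFF = 3           # units of regular DRAM forgone per unit of HBM
capacity_units = 1_000  # hypothetical total output, in regular-DRAM units

for hbm_units in (0, 100, 200, 300):
    regular_units = capacity_units - TRADE_OFF * hbm_units
    print(f"HBM produced: {hbm_units:>3} -> regular DRAM left: {regular_units}")
```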
Katherine Forrest: Right, so that means if you're going to go off and— Did you ever build your own PC?
Scott Caravello: No, no, I did not. I did not.
Katherine Forrest: You did not? I didn't either, but I have friends whose kids have built their own PCs. Actually, my stepson built his own.
Scott Caravello: It’s pretty amazing.
Katherine Forrest: Yeah, it is pretty amazing, but, you know, if you're going to be doing that in the near future, you're going to find that your cost for RAM may have gone up a little bit. So if this is not relevant to you, Scott, we can just go ahead and move on.
Scott Caravello: No, no, no, no—yeah, it's not super relevant to me, but I think it's really interesting to talk about because some stores are now selling RAM at market prices, like exactly how you buy lobster at a seafood restaurant. And that's an analogy that The Verge used, so I'm not going to take credit for it, but it really does work. And, you know, according to their reporting, a message in one store display case reads: "Costs are fluctuating daily as manufacturers and distributors adjust to limited supply and high demand. Because of this, we can't display fixed prices at this time," which really, really is pretty crazy. So, practically speaking, prices for that consumer RAM were already tripling or quadrupling as of a few months ago, and so now we're in this kind of dynamic pricing environment. And then one other note, and then I'll turn it back over to you, Katherine. But that's just that these memory chip manufacturers have, because of this, become a hot area for investment. And Micron's stock was up 247% year over year in January.
Katherine Forrest: Okay, so I just have to say, at first I did not have any idea where you were going with that lobster analogy. And then I realized what you're talking about is that lobster is not sold typically at, like, X, Y, or Z per pound—it's sold at market rates. Okay, and that's what you're saying.
Scott Caravello: Yeah.
Katherine Forrest: You're saying that the prices are moving so quickly in different directions. They're relatively volatile, so to speak—okay, so it's like lobster. All right. All right.
Scott Caravello: So, do you think we can say safely that for episode 100, it's the first time that lobster has been brought up on this podcast?
Katherine Forrest: I don't know. You know, I've done a lot of these episodes from Maine, so it's possible. I don't remember a lobster episode, so anyway, let's talk about how, you know, the research is looking and how AI is changing the dynamics of the memory market. So, for decades now, the memory industry—because we've needed memory for computers for decades, right, forever—has been driven by demand in the consumer market, as you were saying before. And you were talking before about how you need memory for smartphones and personal computers and all of that. You know, we just hadn't had as memory-hungry a set of applications until AI came along just a couple of years ago. And now that AI is ubiquitous, it's also resulting in enormous numbers of new consumer products. So we've really got a whole change in the demand landscape for the AI tools that use memory. It's not just the models and the tools that are being used in business, but there are a whole load of toys and consumer devices and smart home devices, all of which now are utilizing memory.
Scott Caravello: And so, just transitioning off of that point, Katherine, right, because of that market impact, it's probably going to implicate other pillars of AI development too, like the algorithms themselves. Because AI developers need all this memory, there's certainly R&D being done to reduce the amount that's required by major models. So we might see other downstream shifts. And, so, what’s our takeaway, Katherine?
Katherine Forrest: Well, what's our takeaway, Scott?
Scott Caravello: It's that memory is going to be a significant bottleneck for AI in 2026, and we have not heard the last of this. We're going to see if there are tech developments—again, like I mentioned, with algorithms—to try and address it. But this memory wall that's been talked about for years is a key issue to keep on the radar.
Katherine Forrest: You see, I like your articulation of that takeaway. That's a good articulation of that takeaway. And that's all we've got time for today. Ending our 100th episode, I'm Katherine Forrest.
Scott Caravello: And I'm Scott Caravello. If you're enjoying the podcast, please like and subscribe.