Paul, Weiss Waking Up With AI

The Increasing Importance of Knowing Your Model’s Provenance

In this episode of “Paul, Weiss Waking Up With AI,” Katherine Forrest explores the importance of understanding AI model provenance—where models come from, how they're built and why tracking their history is essential for safety, liability and regulatory compliance, especially as agentic AI systems become more prevalent.


Episode Transcript

Katherine Forrest: Hello, everyone, and welcome back to “Paul, Weiss Waking Up With AI.” I’m Katherine Forrest and we don’t have Anna Gressel with us today, so you’re going to have to again just suffer with me. But she’ll be back. She’s coming back. She’s just got some things that she’s taking care of and we’ll have her back very soon and that’ll make things even more exciting. But in the meantime, what you’ve got—and you folks, you know, you’re sort of just listening to me, you can’t actually see it—but my background is now no longer Maine. I’m actually in Woodstock, New York, which is my sort of non-Maine, non-New York City place that I go. So I’m here often when I’m recording. And so that’s where I am. So you’ve got me in Woodstock, New York, and so I am still enjoying coffee, although not the incredible coffee that they have in Maine.

But speaking of places, this brings me to the topic I have for today. And the topic that I wanted to talk about, and I've really wanted to talk about this for some time, is where models come from. And I want to talk with you today about why that is really important. I got this idea for the podcast when I was talking to a number of clients about risks with certain AI models. And a few things kept coming up again and again, particularly now that we're in this brand new world of agentic AI. There are questions about what a company needs to understand and to know, really for its own basic safety and liability, about its models and where they came from. And then there's also a second set of procurement requirements that are now part of the White House's AI Action Plan. And so I wanted to go through both of those because they really relate to one another.

So the first thing I thought I’d do—and by the way, let me just tell you, I saw this person on the subway, the New York City subway the other day, and she actually had the word “nerd,” N-E-R-D, tattooed across her fingers. There was like an N, an E, an R, and a D on each of the digits. And I thought, “well, I’m a nerd,” but I’m not sure I’d go as far as tattooing it across my fingers because what if I one day became a non-nerd? But anyway, this episode is going to demonstrate for you all what kind of a nerd I am. So let’s start with the basics of what a company needs to know about the provenance of its tools. And that’s really true regardless of compliance obligations because we all know right now that there are a variety of tools marketed by a variety of vendors, some of which are well-known and almost becoming kind of brand names. Then also every single day there’s a new tool. And some of these tools are going to become well-known even though they’re brand new today. Some of these tools that are just right now, new unknowns are going to become the best known tools in a few years, because we don’t know who the winner of these AI races for any particular domain is going to be just yet. So we’ve got these tools marketed by these vendors, and there are certain questions that everybody needs to ask.

So first of all, the tools can be made in a variety of ways. And one major way is what I have referred to in prior episodes as my Lego block example. And, you know, you've got this sort of base model LLM and then you have a fine-tuned model on top of that, sort of like Lego blocks, one sitting on top of the other. And that base model can be either a closed source, proprietary model or an open source model. And an open source model can actually originate with one company that released the open source code, but that base open source model can go through a variety of hands before it actually gets to the vendor, who may then build on top of it.

So you've really got to understand a fair amount, not just about the capabilities of the model or models that you've got, but also the risk profile of that base model and then of the fine-tuned model, to understand what you're looking at as a totality. And so you want to understand, for instance, when you get a model or license a model from a vendor: what is the base model? Where has it been and where is it going to, so to speak? What kind of testing has it been through? And what can you expect of the model in terms of updates? Are they going to be pushed through to you? Are you going to have an option to decide whether or not you want to take an update, given that it could change the environment in fundamental ways?