
Podcasts
Paul, Weiss Waking Up With AI
Recent Discussions of Agentic Security
This week on “Paul, Weiss Waking Up With AI,” Katherine Forrest and Anna Gressel dive into Google’s latest paper on AI agent security, examining the unique risks posed by autonomous AI systems and the layered defenses needed to keep them safe.
Episode Speakers
Episode Transcript
Katherine Forrest: Hey, good morning, everyone, and welcome to another episode of “Paul, Weiss Waking Up with AI.” I'm Katherine Forrest.
Anna Gressel: And I'm Anna Gressel.
Katherine Forrest: And before we even start, Anna, as usual, we've always got a little something to talk about. I just wanted to tell you, I'm in Maine right now. And so I'm sitting here with not one, but two choices of coffee. I have one that my daughter just brought me from Scratch Bakery in South Portland, which is an extraordinary bakery. And so I've got a hot latte here. And then I have my moose here, which is my regular coffee. So I am fully caffeinated and ready to go.
Anna Gressel: I have one coffee in what someone recently spotted—and I was very impressed—as a mug from the Botanical Garden.
Katherine Forrest: The Brooklyn Botanical Gardens?
Anna Gressel: No, this is a New York Botanical Garden.
Katherine Forrest: I didn't know there was a New York Botanical Garden.
Anna Gressel: In the Bronx?
Katherine Forrest: Oh, I was thinking of the Brooklyn—I guess the Bronx Botanical Garden. I've only ever done the Brooklyn Botanical Garden.
Anna Gressel: Mm-hmm.
Katherine Forrest: Well, there you go.
Anna Gressel: Oh my gosh.
Katherine Forrest: There's a whole world to be explored.
Anna Gressel: We're going to take a little retreat there.
Katherine Forrest: Right, right.
Anna Gressel: An AI retreat to the Botanical Garden together. It's like one of my favorite places in New York City.
Katherine Forrest: That's great. Well, glad that we're hooking up in the same time zone, though I realize that our producer, poor Juliana, is off in—she's in California, she's in LA, and so this is at an ungodly hour for her, but she's so gracious about it.
All right, and today we're going to take a deep dive into a topic that's really becoming central to the future of technology, and that's the security of AI agents. And Google has done a very recent paper that's really well worth everybody taking a look at. It's called “An Introduction to Google's Approach to AI Agent Security.” And we're going to be unpacking that paper and exploring some of the unique risks that these agentic systems pose and the layered defenses, or the sort of multiple lines of defense, that Google is advocating for. So, Anna, before we get to all of that and get into the details, let's maybe set the stage a little bit and remind our listeners a bit about AI agents and why security is such a hot topic now.
Anna Gressel: Yeah, definitely. AI agents are one of the most important topics right now. And we’re—we've said this on prior episodes—we're doing roundtables and a lot of CLEs on them for clients. And really it's because there's some real there there, there's stuff for legal departments to be thinking about as they have their procurement departments begin to bring in technologies into the company called AI agents, because they're different. They are meaningfully different from generative AI. And this leap really focuses on the components of agentic technologies that can do really different things than we had with traditional large language models.
And that included things like perceiving their environment, making decisions and actually taking autonomous actions to achieve goals provided by the user. So I love this example. This is from one of our associates, Rana, but she was like, “all right, imagine a digital assistant that doesn't just tell you the weather, but it books your flights and it manages your calendar, and it can even control the smart devices in your home.” And so you can imagine we're moving towards a world of these very integrated, agentic-based systems that can do all kinds of things for you. But one of the points that we've been making repeatedly in the roundtable is that we're just kind of uncovering the risks of agents as they're being developed. And one of those risk dimensions is around security, because these kinds of agents introduce a whole new set of security challenges that we haven't really had to deal with before, and really smart people are thinking about. And so we're going to dive into that today.
Katherine Forrest: Right, and the more capable and independent these agents become, the greater the risks can be if something actually goes wrong. And that's one of the things that the Google paper highlights. And they look at, really, two key risks: one, when you've got rogue actions—and that's where the agent is doing something that's really unintended or harmful or both—and two, sensitive data disclosure, where private information can be leaked or in some way misused in a manner that it's not supposed to be used by either regulation or firm policy. And these aren't theoretical concerns.
The unpredictability of these AI models and their ability to interact with external systems—the outside world—and the complexity of their decision making, all of this makes the kinds of traditional security approaches that we're used to challenging. And depending upon what's happening, what the risk is, and what the security system that is imposed is, it can actually make it insufficient. So these AI agents require us to think about things in a new way.
Anna Gressel: Yeah, and it's not just that there are vulnerabilities in the agents. It's that agents are actually out there navigating virtual and real environments. And so they operate and interact with all different kinds of data from the real world that might be unpredictable or untrustworthy. So, for example, an agent might process an email or a document or a web page that contains hidden instructions designed to trick it, and that's what's known as a prompt injection attack. And it's worth noting, prompt injection attacks are not new. That's not like an agent-specific risk, but this is a real risk when agents or robots or other kinds of AI are in environments where malicious actors might be able to really put data designed to interfere with the agent or the robot. And this is research going back to Ian Goodfellow actually in 2017, I think, where he showed and other people showed that prompt injection attacks could happen even by putting malicious stickers on stop signs that would stop autonomous vehicles from recognizing that there even was a stop sign. So you can see this is a big issue. And then really, part of the work around this is making sure that the agent or the robot can distinguish between trusted commands and trusted data, and then, on the other hand, potentially malicious data. Otherwise, it can end up taking actions that are harmful or which violate policy.
Katherine Forrest: You know, I could tell that you were having a hard time with prompt injection attacks before you've had sufficient amounts of coffee.
Anna Gressel: You have had two coffees.
Katherine Forrest: I have my choice between the two coffees. But I want to do one more little sort of side on the prompt injection attack because that actually could be a real issue here, which is, it's a phrase that gets used a lot, as you were just describing. And one of the ways I like to think about it is: we all know what a prompt is. A prompt is, you know, entering something, for instance, into the task bar for, say, a large language model chatbot that you're using, whatever type it is. Let's just call it one of the GPTs. And so you say, you know, “can you write an email for me that does X, Y or Z?” And that's the prompt. And so the injection attack piece is where you're using the command—which is what that is—you're asking it to do something. You're actually putting in something that's asking it or commanding it to do something that otherwise it would not normally do. And you can do that in a test environment where you can have a prompt injection attack trying to see whether or not the model will do something that it's not otherwise supposed to be doing. Or it could be done in a serious and somewhat malicious way. And so that's how I think of them. You agree with all that?
Anna Gressel: I agree with all of that.
Katherine Forrest: All right.
Anna Gressel: So I think, Katherine, it would be great to dive in a little bit to why agents are at the forefront of this security debate right now.
I mean, we've talked a little bit about the fact that there are high consequences, that they interact with the world. But let's talk about why agents are different than traditional software and why that might change the security paradigm.
Katherine Forrest: Right. Well, traditional software security can sometimes rely on what are called deterministic controls. And those are rules and restrictions that are actually written into the code and they're predictable and they're testable. And AI agents don't operate in what is necessarily a predictable environment. That's the whole point, is that these agents can take an issue, a problem, an instruction that they've been given that can have multiple parts to it, and they can break it down and then they can act autonomously. So the agents are acting in a much more dynamic environment, and they can be tricked sometimes by these prompt injection attacks or they might simply misinterpret an ambiguous user command in some way and then start to do something autonomously that actually requires a new and non-anticipated—or not anticipated—way of securing what that agent is up to. So that's why this paper by Google is pushing for what's called a hybrid defense-in-depth approach.
Anna Gressel: Yeah, I love that. I actually think this paper is a really good read for folks who are kind of working at the intersection of AI and security, or even people who are just interested and curious or thinking about what kinds of questions they may want to ask to their vendor risk questionnaire on agents. So it's like a really good paper to dive into.
But I want to break down this defense-in-depth approach because it has two important prongs. One is really kind of building on the traditional approach to security, and then one is taking a more adaptive approach to security in the agentic space. So what does that mean? The paper basically describes a strategy that combines two layers of defense. The first is traditional deterministic controls, and those are kind of like policy engines that sit outside of the AI model and enforce hard limits on what the agent can or can't do. This is a lot like MLOps and the kinds of supervision we had in traditional machine learning systems. So, for example, if an agent tries to make a purchase over a certain dollar amount, the policy engine can block it or require explicit user confirmation. Those controls are requirable and auditable, but like many different AI system controls, they really can't handle every nuance or context.
Katherine Forrest: Right, exactly, and that's why the second layer here is reasoning-based defenses. And Google leverages the AI models themselves to detect and to respond to potential risks. And this includes adversarial training, which is teaching the model to recognize and ignore malicious intentions—sort of an adversary coming at it—and also using something called specialized guard models to flag suspicious behavior. And these are reasoning-based defenses, and they're more adaptable and can handle evolving threats because they're attempting to reason out what's happening, but they're not foolproof. And that's why the paper insists that you've got to actually have multiple layers, multiple lines of defense, if you will.
Anna Gressel: Yeah, and I also like that the paper kind of sets out some core principles for agent security, which is a great starting point for this discussion around, like, how do you implement agents in practice in a responsible way? So first, the paper says agents must have well-defined human controllers. And that means every action that agent takes should be attributable to a specific user, and critical actions should require explicit human approval. They're basically saying, you know, a human should stay in the loop somehow on the most important agent actions, and there's a lot of design and thinking that needs to go into that. But it's important to underscore accountability and to prevent agents from acting autonomously in critical and irreversible situations without oversight. So that's kind of point one.
Katherine Forrest: Right, and the second point is that agent powers can—and they suggest must—be limited, and that agents should only have the permissions or the authorizations that they need for their intended purpose, and that those permissions should be dynamically adjusted based on context, meaning adjusted as they go, based on what the agent is actually doing. So, for example, an agent that's designed to help with research shouldn't have the ability to modify financial accounts. And this principle extends to the traditional concept of what's called least privilege, which is a software concept of permissioning, but adapts it for the dynamic and sometimes unpredictable nature of these AI agents.
Anna Gressel: Yeah. And third, I think is a really important point, which is agent actions and planning should be observable. So the Google paper suggests that robust logging and transparency are essential for not only having trust in these tools, but also undertaking appropriate debugging or incident response. So if something goes wrong, you have to be able to kind of deconstruct it and reconstruct it and understand its decision-making process in order to respond appropriately to the incident, to contain it and to be able to, for example, interface with regulators. So this is a big, you know, huge field of work in incident response. And that's their observation.
And the paper really emphasizes the need for some sort of secure, centralized logging system and for appropriate user interfaces to make the agent's actions and reasoning transparent to users. And we know this is a big area of work right now for many companies to think about the appropriate auditing mechanisms for agents.
Katherine Forrest: Right, and let's talk a little bit more about that technical side. Google's in-depth approach, the sort of multiple lines of defense, it starts with what they call a runtime policy enforcement step. And this is the first line of defense. And the policy engines that intercept and evaluate every action that the agent's trying to take can actually block or allow or require user confirmation based on predefined rules. For example, they might block any attempt to send an email externally if the agent has just processed data from an untrustworthy source.
Anna Gressel: Yeah, and these kinds of deterministic controls are going to sound really, really familiar to folks who are working around security generally, kind of enterprise security, but also security around AI models where we're often seeing attempts to kind of block inappropriate or incorrect or malicious prompts from even getting into the system or responses from coming back. So that's kind of where we already have fairly robust architecture generally. But Google really says these deterministic controls have their limits, even if they're AI-based. So this is where their emphasis on reasoning-based defenses really comes in. And they try to use the AI's own reasoning capabilities to spot potential attacks or potential areas for misalignment. And we've talked so much about misalignment recently. In the agentic space, it's a huge, huge area of concern.
And so, for example, adversarial training that is done might expose the model—and here we mean the agent—to a wide range of attack scenarios during development, and that would help teach it to recognize and potentially ignore malicious instructions. There may also be specialized guard models that act as classifiers, and again, this is something we already see in the AI—kind of traditional gen AI—space, but we'd want those guard models to appropriately scan for suspicious patterns and inputs and outputs that are kind of specific to agents and might indicate that they're not actually doing what we expect or intend them to.
Katherine Forrest: Right, and another really important aspect is agent memory because most agents maintain some form of memory to understand context and to hold on to context across interactions. And what we mean by context is if you're involved in, say, a chat conversation where you ask your chatbot to do X, Y or Z, but then you continue and it does one part of it and you then ask it to do something else. It can keep all of that context in its memory and use it to inform its actions, but it also is able to maintain the context of what it's done. So it maintains this context across interactions and remembers the user's preferences, but the memory can also become a vector or a line for persistent attacks, for malicious attacks. So if you've got malicious data that's stored in memory, it could influence the agent's behavior and future interactions. And so where there are multiple users, the paper stresses the importance of the agent knowing which user is giving instructions, applying the right permissions for that particular user and keeping each user's memory isolated from that of other users. And I think that is really a fascinating concept.
Anna Gressel: Absolutely. And another one that actually just really jumped out at me because—I mean, you think a lot about the substance of an output, but we don't always talk about how the output is rendered by the GenAI model. And basically what Google says is if the application that displays the agent's output doesn't properly sanitize it or manage its content, it could be vulnerable to attacks like cross-site scripting or data exfiltration. And data exfiltration, for example, is when an attacker can go in and actually use the interface of a generative AI model to obtain the data that was used to train it or run it. And so it's about kind of taking the data out in a way that is inappropriate. So what is really, really interesting here is that the paper recommends processes for actually sanitizing and validating that output before it's even displayed to users, so as the rendering takes place. And this might also just have a really important operational overlay, because if you think about an agent, they're not just displaying content to users—they may actually just be actioning it in the real world. And so if that content is wrong, or kind of maliciously interfered with, it might get immediately actioned. So this is really an important point here to make sure that the agent's actions are what you intend them to be and they're not being interfered with at this kind of output rendering level.
Katherine Forrest: Right, and that's why it's not just about technology. The paper reiterates the real importance of having a human in the loop, as you had said earlier, and emphasizes the need for really continuous testing, vigilance. And you can do that in a variety of ways, but they want to make sure that you've engaged external security researchers and have done that. There's a variety of types of exercises that you can put things through, but that you have not left the humans to the side.
Anna Gressel: Yeah, you know, just taking a step back, it's great to see this kind of work from Google. AI, machine learning, generative AI security has always been a tricky area. There's always been good research on it, but really turning that into kind of a robust set of defenses has traditionally been hard because the technology is moving so fast and because it can be so, you know, interactive and out there in the world. But what Google is doing here is really trying to kind of come up with a playbook of different approaches that companies can think of and consider—and also researchers can consider—as we embark on this whole new journey towards agentic AI. And so I would expect to see a lot more in this area as we go forward. And expect to see security departments asking interesting questions about whether the agents they're procuring or deploying are really sufficiently equipped with the kind of adaptive model-based defenses that Google is arguing for here. So it's kind of the beginning of a new discussion, not to underrate any other amazing research that's going on, but I love to see this kind of work out there in such an accessible manner.
Katherine Forrest: All right, and so for those of you who may have missed the name of this paper that we've been talking about the entire time, let's give it to you again so you can look it up. You can get it easily off of the internet, and it's called “An Introduction to Google's Approach to AI Agent Security.” And that's all we've got time for today, folks. I'm Katherine Forrest.
Anna Gressel: And I'm Anna Gressel. Be sure to like and share the podcast. And if you have questions or you want to suggest a topic, we are all ears. Drop us a line. See you next time.