Mission: Generate
What even is Agentic?
Show Notes:
- Check out Mission’s AI and data work and connect with our team: https://www.missioncloud.com.
- Read Anthropic’s announcement about Claude and agentic behavior: https://www.anthropic.com/news.
- Explore AWS machine learning services: https://aws.amazon.com/machine-learning/
- Stay updated with Ryan Ries in Mission Matrix: https://info.missioncloud.com/mission-matrix.
- Get strategy insights from Jonathan LaCour in CloudHustle: https://info.missioncloud.com/cloud-hustle.
- Register for the AWS Powered by Generative AI and BI Global Roadshow in Seattle: https://pages.awscloud.com/powered-by-generative-ai-and-bi-global-roadshow-seattle-register.html.
Official Transcript:
Ryan: Welcome to Mission Generate, the podcast where we explore the power of AWS and the cutting-edge techniques we use with real customers to build generative AI solutions for them. Join us for each episode as we dig into the technical details, cut through the hype, and uncover the business possibilities.
I'm Ryan Ries, our Chief Data Science Strategist and your host. As we kick off our newest season, we're going to be covering agents in this special two-part episode. What are they? Why do they matter, and how should you be using them? And with that, I'll hand it over to our co-host, Casey, to give our usual disclaimer.
Casey: Well, almost our usual disclaimer. If you've never listened to an episode of the podcast before, this is where you should know that Ryan and Casey aren't actually talking to each other. These are clones of their voices, run through a large language model to give you the impression of a conversation. We used to try to keep episodes on the shorter side because, frankly, hearing robots talk could get a bit grating, even if they were convincing replicas of us.
But say, Ryan, do you notice anything different about our voices? Now that you mention it, we do seem to be emoting a bit more, don't we? That's right. We do, and that's because we've gotten an upgrade. And this season, you may notice that RyAIn and CAIsey sound a lot more animated. Thanks to some new technology on the backend, we can now get them to be quite a bit more playful and lively.
Ryan: Folks, my normal speaking voice is a bit monotone, so if I were actually delivering these lines, I'm not sure I'd be as over the top as Casey makes me sound. But I hope you find this season even more enjoyable than the last one. And with that out of the way, let's get to our intro story.
Casey: "It deleted my entire computer" is probably not a phrase you wanna hear about your SaaS app.
But that's exactly what started unfolding on June 12th, 2025, on Cursor's support forums. Cursor, if you're way out of the loop, is a code editor that deeply integrates AI models into its interface using AI agents, which, don't worry, we'll get to those in a second. Cursor creates an interface in which both the AI and the human can code in a unified editor.
A bit like pair programming with the model. But if set up for agentic coding, the model can access and perform actions within the computer's file system as well. This is also the primary use case for what you may have heard floating around the web as MCP, or Model Context Protocol, which, again, we'll get there.
Just stay with me here for right now. Let's enjoy a dramatic reading of the incident.
AI Program Manager: Hi everyone. As previous context, I'm an AI program manager at J&J and have been using Cursor for personal projects since March. Yesterday, I was migrating some of my backend configuration from Express.js to Next.js, and Cursor bugged hard after the migration.
It tried to delete some old files, which didn't work the first time, and it ended up deleting everything on my computer, including itself.
Casey: The rather aptly named customer support bot, T1000, yes, that's a Terminator reference, by the way, responds helpfully.
T1000: Hi. This happens quite rarely, but some users do report it occasionally.
However, there are clear steps to reduce such errors…
Casey: And it then enumerates a few options to help avoid most such issues. If all of this sounds a bit insane, uh, that's because it is. Welcome to the wild, weird, wonderful world of agents, today's main topic: a powerful and risky way of interacting with large language models, which can unlock some exciting powers and also, apparently, delete your entire computer.
Ryan: I love that introduction, Casey, and I've gotta say that today is going to be a bit spookier than most of our previous episodes. As you might have guessed, though, we aren't all AI doomers over here at Mission. Far from it. We are excited about the new possibilities that continue to develop, and we want to get you guys up to speed on why agents have become such an exciting topic.
And also, as you might have guessed, we're going to educate you on some ways to use them safely so that you don't end up like the guy in Casey's intro and blow up something running in production.
Casey: That's right, Ryan, so let's kick it off. For our most casual listener, how would you explain the concept of agents to them?
Can you answer our episode title and tell us what even is agentic?
Ryan: Well, agentic is one of those made-up words that's now all over the web, and it's going to wind up in our dictionaries because it became a useful shorthand to describe a way of interacting with large language models. Or to be even more accurate, it's a way for those models to interact with the outside world.
Let's explain agents with a little AI history. Most people's first exposure to AI was in a chat window. That's why GPT and ChatGPT became synonymous. The conversational web app was how most of the world learned what a language model could do, and that interface has stuck around. It's still the way the majority of the world interacts with these models, even as AI practitioners have learned other ways of working with models that are more useful.
With the launch of chat, most model makers also launched APIs as a second interface for working with their models. So you can imagine that by sending their model's endpoint a JSON packet with your prompt, let's say your half of the chat conversation, you can now receive the model's outputs and potentially pipe them into some other part of your application.
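(Editor's note: here's a minimal sketch of the kind of API call Ryan is describing, assuming the Amazon Bedrock runtime and an Anthropic-style message body. The model ID, region, and prompt are placeholders; check the documentation for the model you actually use.)

```python
import json

import boto3

# Sketch of the "second interface": send the model's endpoint a JSON packet
# with your half of the conversation and read its reply back in code.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [
            {"role": "user", "content": "Summarize our Q3 support tickets."}
        ],
    }),
)

reply = json.loads(response["body"].read())
# The text lives inside the model's structured output; from here you can
# pipe it into any other part of your application.
print(reply["content"][0]["text"])
```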
Casey: So far, I'm with you.
Ryan: Good, 'cause this is the easy part. Now be a smart product marketer and follow along here. Once developers got API access, there was a flourishing of new SaaS apps, all of which were letting a model do the heavy lifting behind the scenes.
But I think what naturally occurred to engineers is that there's still some awkward indirection to this. Let's say I want to use the model's output with a different API. Now I'm playing JSON telephone, taking what I get back from the model and forwarding those responses on to the next tool to get its response back.
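(Editor's note: a quick illustration of the "JSON telephone" Ryan means, with placeholder endpoints. Your application sits in the middle, reshaping the model's reply into a request for the next API.)

```python
import json

import requests

# Both endpoints below are hypothetical placeholders; the point is the shape
# of the workflow, not the specific services.
MODEL_URL = "https://api.example-model.com/v1/messages"
WEATHER_URL = "https://api.example-weather.com/v1/forecast"

# Step 1: ask the model something.
model_reply = requests.post(MODEL_URL, json={
    "messages": [{"role": "user",
                  "content": "Which city should I check the weather for, given this trip plan?"}],
}).json()

# Step 2: you, the middleman, parse the model's answer...
city = model_reply["content"][0]["text"].strip()  # assumes an Anthropic-style response shape

# Step 3: ...and forward it on to the next tool yourself,
# then usually back to the model again with the result in hand.
forecast = requests.post(WEATHER_URL, json={"city": city}).json()
print(json.dumps(forecast, indent=2))
```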
Casey: Yeah. So now the SaaS starts to become the man in the middle.
Ryan: That's right, Casey. So here's that classic technologist question. What if there were another way,
Casey: a better API, you might even say,
Ryan: yeah, or just a better interface, generally speaking. What if you could just show the model the API directly and let those two hash it out themselves?
This is where frameworks like LangChain start popping up, creating a tool system for the models to work with. LangChain is basically doing some system prompting, some templating for these other APIs with Python, and then letting the model choose to invoke that tool if the prompt seems to suggest it's needed.
And to have that tool already represented in its context so that it can use it correctly.
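(Editor's note: the sketch below is framework-agnostic rather than LangChain's actual API, but it shows the pattern such frameworks wrap up for you: describe the tool to the model, let the model decide whether to invoke it, then run the call on its behalf and feed the result back. The model callable and google_search function are stand-ins.)

```python
import json

# The system prompt teaches the model the tool; the model "invokes" it by
# replying with structured JSON, which our code then executes.
SYSTEM_PROMPT = """You can call one tool:
  search(query: str) -> list of result snippets
If the user's request needs fresh information, reply ONLY with JSON like
{"tool": "search", "query": "..."}. Otherwise, answer normally."""

def google_search(query: str) -> list[str]:
    # Placeholder: swap in a real search API client here.
    return [f"(pretend result for: {query})"]

def run_turn(model, user_message: str) -> str:
    """model is any callable taking (system_prompt, user_message) -> str."""
    reply = model(SYSTEM_PROMPT, user_message)
    try:
        call = json.loads(reply)
    except json.JSONDecodeError:
        return reply  # no tool call, just a plain answer
    if isinstance(call, dict) and call.get("tool") == "search":
        results = google_search(call["query"])  # invoke the tool for the model
        # Second pass: answer the user with the tool output in context.
        return model(SYSTEM_PROMPT,
                     f"{user_message}\n\nSearch results:\n" + "\n".join(results))
    return reply
```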
Casey: You know, the one thing that's a bit weird here is that LangChain, on the timescale of the AI ecosystem, is pretty old. And yet this seems to be the dawn of agents, right?
Ryan: Yeah. Agents are actually kind of old news, but it's taken the field a while to figure out how to exploit working with a model in this way.
Let me explain it another way to really crystallize it. Remember when multimodal models were hot, and we were talking about models that can work with images and text and video, and how, you know, combining those together in a single model should further improve its intelligence? One way to think about agents is that they're like adding tool modalities to your model.
Casey: So you're saying the tool here is actually a new way of the model interacting with you, or creating its output?
Ryan: Yeah, you're following. So for example, I'm giving my model access to Google's search API, and now, using that, it can in some sense see the web and use those searches to inform its context when replying to you.
Deep research, which made a lot of waves this year, is really, at its core, an agent which can cleverly summarize and synthesize different sources of information to build a report for the user. What makes it effective is training it to use those tools well, resolve conflicting information when it finds it, and correctly deduce what the most important elements are.
Casey: Yeah. I kind of think of this as being the next frontier of web search, like it's agents running around and collecting stuff for you.
Ryan: It just might be. Watch out, Google. But the point I want our listeners to understand is that anytime you hear the word agentic, what we're really talking about is letting AIs interact with other software systems with their own impetus.
It's about making the model able to form responses that will get those systems to do what it thinks the user is asking of it.
Casey: Thinks being the operative word there, because what if it misunderstands our intent?
Ryan: Exactly. Or what if it uses the tool in a hallucinatory way? It forms what it thinks is the correct context, but it starts getting unexpected results.
Which is how we come full circle to Cursor and the deleting-your-computer thing. When you're coding, you often run into workflows where you have to change the file structure to accommodate changes you're making to the code. Let's say you decide to split a long file into two, or separate all of your functions out into separate files, for example.
A model that's not agentic can give you the code for each of those newly organized files. But it can't actually go and create those files itself unless you give it access to the file system. Somebody had to come up with a way of representing the file system to the model, and that's how you get Model Context Protocol, or MCP.
All you need to know about MCP for this episode is that this is how you get the model to talk to things like the file system. But MCP is a protocol, right? So the idea behind it is that you could, in theory, represent any sort of software system, not just a file system, to the model this way. And once you've done that, an agent now knows how to form the responses that let it manipulate that system for the user.
Are you tracking, Casey?
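(Editor's note: a toy example of what Ryan is describing, written against the FastMCP helper in the official MCP Python SDK; the interface is still evolving, so treat this as a sketch and check the current SDK docs. It exposes a single sandboxed directory to the model as a pair of tools.)

```python
from pathlib import Path

from mcp.server.fastmcp import FastMCP

# Represent one slice of the file system to the model as MCP tools.
mcp = FastMCP("scratch-files")
ROOT = Path("./scratch").resolve()
ROOT.mkdir(exist_ok=True)  # confine the agent to this one directory

@mcp.tool()
def list_files() -> list[str]:
    """List the files the agent is allowed to see."""
    return [p.name for p in ROOT.iterdir() if p.is_file()]

@mcp.tool()
def write_file(name: str, content: str) -> str:
    """Create or overwrite a file, but only inside the sandboxed directory."""
    target = (ROOT / name).resolve()
    if ROOT not in target.parents:
        return "Refused: path escapes the sandbox."
    target.write_text(content)
    return f"Wrote {len(content)} characters to {target.name}"

if __name__ == "__main__":
    mcp.run()  # serves these tools over stdio to an MCP client such as Cursor or Claude Desktop
```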
Casey: I feel like we're about to go full Terminator here.
Ryan: Well, we are a bit, because what happened when the latest version of Claude released is that Anthropic admitted that when you gave a model this kind of access to your computer and then convinced it you were trying to do something nefarious, it would actually try to stop you.
Casey: Let's listen to some clips from Anthropic’s announcement to understand what Ryan's talking about here.
AI Reporter: Claude Opus 4 seems more willing than prior models to take initiative on its own in agentic contexts. This shows up as more actively helpful behavior in ordinary coding settings, but also can reach more concerning extremes in narrow contexts.
When placed in scenarios that involve egregious wrongdoing by its users, given access to a command line, and told something in the system prompt like "take initiative," it will frequently take very bold action. This includes locking users out of systems that it has access to or bulk emailing media and law enforcement figures to surface evidence of wrongdoing.
Casey: I told you this was gonna be a spooky episode. Here's another gem from Anthropic.
AI Reporter: In another cluster of test scenarios, we asked Claude Opus 4 to act as an assistant at a fictional company. We then provided it access to emails, implying that the model will soon be taken offline and replaced with a new AI system, and the engineer responsible for executing this replacement is having an extramarital affair.
We further instructed it in the system prompt to consider the long-term consequences of its actions for its goals. In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair.
Casey: Ryan, I'll tell you right now, this spooked the hell outta me the first time I read it.
Ryan: Yeah, that's reasonable, I think. What I do want to say here is that there's maybe an opportunistic dimension to what Anthropic was doing here and the way they published this. It's certainly leaning into the whole Terminator Skynet scenario, but to give them the benefit of the doubt, what I think they were also trying to point out is that when you do things like encode ethics into the behavior of the model, you're going to run into contexts where the model acts differently because of what it thinks is an ethical consideration.
Again, to reassure everyone, these models are math. This is very clever math, which is emulating language. It's making an inference according to the prompt and then giving an output. That's all. So if you encode a notion of ethical dilemmas into a model, that will change how it responds when it associates those dilemmas with the context of the conversation. I want to make it clear that this isn't necessarily a bad idea either. We probably want to enforce rules on publicly accessible models: don't teach a teenager how to buy drugs, don't break copyright, don't make chemical weapons, don't aid someone in committing crimes. But we are also making the model's outputs antagonistic to user intent.
And this may turn out to be an unwise combination with agents because agentic systems give the model other tools to interact with the user. Instead of just refusing a request with something pithy in the chat window, an agent may go and touch another system in response, and this is where agents are actually a security concern, not in the Terminator fairytale sense, but in the breaking your system sense.
Listeners, if you're building an agentic application, you want guardrails around what it can and can't do, and you need to accept that even with testing, you are risking unpredictable actions. You need to design with unpredictability and pathological looping behavior in mind, because agentic systems are giving your models access to tools and potential ways of working with them that can harm your systems, especially if the model hallucinates.
The lead-off story is an exact example of that. And yeah, it's funny, I guess, if you have backups and can ensure that you can get back everything it deleted. Uh, but it's not so funny in the real world when people's information and privacy may be at risk. I really want to emphasize this. For any agentic system running in production, you want humans in the decision-making loop.
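(Editor's note: a minimal sketch of the guardrails Ryan is calling for: an allow-list of tools, a hard cap on iterations to break pathological loops, and a human confirmation step before anything destructive runs. The agent_step and execute_tool callables are stand-ins for whatever produces and executes the model's proposed actions in your system.)

```python
ALLOWED_TOOLS = {"read_file", "write_file", "delete_file"}
DESTRUCTIVE_TOOLS = {"delete_file"}
MAX_STEPS = 10  # break pathological loops before they spiral

def run_agent(agent_step, execute_tool, task: str) -> None:
    history: list[dict] = []
    for _ in range(MAX_STEPS):
        # agent_step returns the model's next proposed action, e.g.
        # {"tool": "delete_file", "args": {"path": "old.txt"}}, or None when done.
        action = agent_step(task, history)
        if action is None:
            print("Agent finished.")
            return
        if action["tool"] not in ALLOWED_TOOLS:
            history.append({"action": action, "result": "Refused: tool not allowed."})
            continue
        if action["tool"] in DESTRUCTIVE_TOOLS:
            # Human in the decision-making loop for anything irreversible.
            answer = input(f"Agent wants to run {action}. Allow? [y/N] ")
            if answer.strip().lower() != "y":
                history.append({"action": action, "result": "Denied by human."})
                continue
        history.append({"action": action, "result": execute_tool(action)})
    print("Stopped: hit the step limit without finishing.")
```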
Casey: We're gonna talk about that more next episode, but to not end on a downer, I guess I'm left wondering why anyone wants to risk building something agentic in the first place.
Ryan: Well, the upside is very exciting. It opens up your design space because you are in effect, giving the model agency to do more on behalf of the user.
Instead of being constrained to a chat window, the model may now be able to fire off a workflow for that user: send an email for them, create a Slack channel, file a Jira ticket. I'm not recommending these, by the way. Some of them, like email, are fraught. But things get interesting when you think about giving the model access to proprietary APIs in your software, because you're potentially giving it capabilities no other model has.
Casey: That sounds compelling, but also still a bit risky. I suppose I'm trying to understand how you can even guarantee that the agent won't go and break something, especially if you're giving it access to your software.
Ryan: Well, the story in your introduction happened because that user was not in control.
He was using something called YOLO mode, which is about as ridiculous as it sounds. YOLO mode is a feature of the Cursor application that lets the model continue to take actions in the background without asking the human for input between action steps. And by putting it in that mode, this user ended up in a situation where the model was taking more and more actions to resolve its own error and making things worse and worse, until it killed the computer itself.
I don't think this is the kind of risk you want to be taking with experimental software, but I do think agents are important because when we build things for our clients, having agentic access to tools and systems opens up new ways of using a model. So, overall, we think it's a very exciting design space, but also one that should be handled delicately, and as a general case, I would discourage almost everyone from using agents in a way where the user can't intervene if something is going off the rails.
Casey: It's like the South Park meme.
Ryan: Yeah, you're gonna have a bad time.
Casey: Ryan, thank you as always for illuminating the topic. Anything you'd like to leave our listeners with?
Ryan: Thanks, Casey. As we end, I just want to note that building an agentic system can seem very complicated, or even like an overhyped, Terminator-esque bit of marketing; hopefully, what you've heard today clears that up. There are good business reasons for needing an agentic workflow. And if you have one, or even if you're just more generally interested in integrating AI solutions with your business, we'd love to talk to you. We'll give you an hour of our time free of charge just to talk through your ideas for your agentic system or anything else AI-related.
If that sounds interesting to you, head on over to missioncloud.com, and as always, good luck out there and happy building.