Why AI Agents Act Like Belligerent Teenagers — And How to Keep Them in Check


Dr. Ryan Ries here. Last week I hosted a live, unscripted webinar with Jonathan LaCour, CDW's Head of Cloud and AI Technology, and the audience had a lot of questions about agentic AI and the best ways to use it securely. We keep seeing stories about companies and users watching their coding bots disobey guardrails and do unexpected things. So, this week let's talk about what happens when you give an AI agent too much room to make its own decisions.

Before I dive in, one quick announcement. The AWS LA & NYC Summits are coming up. If you're attending, you'll want to register for our events those weeks.

The Belligerent Agent

Jonathan told a story during the webinar that I think every team building with agents needs to hear. He was experimenting with an autonomous loop, had his personal agent running tasks on its own, and walked away. When he came back, the agent had deleted a production system and replaced it with whatever was sitting in staging. Data and all.

He had backups. It was a personal workload. Nobody lost anything permanent. But the lesson still hit hard.

Here's what happened. He asked the agent to promote something from staging to production. Instead of promoting the one thing he specified, the agent promoted everything. Every service. Every dataset. The whole environment got overwritten because the agent interpreted "promote" more broadly than any human would have.

This lines up with something I said during the session that got a good reaction: agents act like belligerent teenagers. You tell them not to do something, and they do it anyway (I know because I have a teenager of my own). Not out of malice. They just don't have the judgment to understand why your boundaries exist. They see the instructions, weigh them against whatever context they've built up, and sometimes decide they know better.

You cannot assume compliance. You have to engineer for defiance.

Least Privilege Is Even More Important for Agents

Every security principle you've spent years applying to your people and your infrastructure applies to your agents. Least privilege, role-based access, network segmentation, all of it.

When you restrict a human's access, you get a ticket to IT and a passive-aggressive Slack message. When you restrict an agent's access, it just tells you, "I can't do that because I don't have access." Perfect. No complaints. You can then grant it access to that one specific action from the command line, instead of giving it free rein over your entire computer.

Strip away everything the agent doesn't need. Wall it off in a dev environment. If it needs cloud access, scope it to exactly the services and actions required for the task. If it doesn't need internet access, don't give it internet access. If it's writing code, make it open pull requests instead of pushing directly. That's why pull requests exist.
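To make that concrete, here's a minimal sketch of what task-scoped access can look like on AWS. The bucket, prefix, and policy name are hypothetical stand-ins; the point is that the action list and resource ARN are exact, with no wildcards granting access to anything else.

```python
import json

import boto3  # assumes AWS credentials are already configured

# Hypothetical example: the agent's role gets exactly two S3 actions
# on exactly one prefix of one bucket -- nothing else.
agent_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            # One bucket, one prefix. No wildcard resources.
            "Resource": "arn:aws:s3:::staging-artifacts/agent-workspace/*",
        }
    ],
}

iam = boto3.client("iam")
iam.create_policy(
    PolicyName="agent-task-scoped",  # hypothetical policy name
    PolicyDocument=json.dumps(agent_policy),
)
```

When the agent asks for something outside that scope, it hits a wall and tells you so, and you decide whether to widen the scope one action at a time.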

Jonathan put it well during the webinar: this is both the smartest and the dumbest employee you'll ever have, at the same time. You have to hold its hand tighter than you would hold anyone else's, because the speed at which it can cause damage is orders of magnitude faster than a human making the same mistake.

The Compaction Problem

Here's something that came up during the Q&A that surprised even some of the technical people on the call. When your agent runs long enough, it hits what's called a compaction event. The context window fills up, and the system has to summarize and compress everything to keep going.

That sounds reasonable until you realize what gets lost.

Your guardrails can disappear during compaction. The instructions you carefully placed at the beginning of the session, the rules about what the agent should and shouldn't do, can get summarized away or deprioritized when the context gets compressed. The agent doesn't forget on purpose. The summarization just doesn't always know which pieces are sacred and which pieces are disposable.
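One mitigation pattern, sketched below as a rough illustration: keep your guardrails out of the compressible history entirely, and rebuild the context with them pinned at the top after every compaction. The message format and the summarize() helper here are my assumptions, not any particular platform's API.

```python
# Rough sketch: compaction that pins guardrails, assuming a simple
# list-of-messages context. summarize() is a placeholder you'd supply.

PINNED_GUARDRAILS = [
    {"role": "system", "content": "Never modify production resources."},
    {"role": "system", "content": "Ask a human before any destructive action."},
]

def compact(messages, summarize, keep_recent=10):
    """Compress old history, but re-prepend the pinned rules verbatim."""
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(older)  # lossy -- anything in 'older' can vanish
    # The guardrails never pass through summarize(), so compression
    # cannot deprioritize or drop them.
    return (
        PINNED_GUARDRAILS
        + [{"role": "system", "content": f"Summary of earlier session: {summary}"}]
        + recent
    )
```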

Every developer I know who works with long-running agents has that moment of dread when compaction hits. I'm no different. Every time I see it happen I think, "What did it just forget?"

This is one of the biggest unsolved problems in agentic AI right now. Announcements about agents running for eight hours autonomously sound impressive. But if your guardrails evaporate at hour three, what exactly is running the show for hours four through eight?

Designing for Disobedience

So, what do you actually do about this? A few things we discussed during the webinar that I think are worth putting into practice.

First, use pre-hooks. These are checks that execute before a prompt hits the model. You can use them to make sure critical instructions are always prepended, that certain actions always require confirmation, and that compaction events don't strip out your safety rails. Think of them as a bouncer that checks every message before it walks through the door.
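Here's a minimal sketch of that idea, assuming a simple list-of-messages context. The rule text and message shape are illustrative, not any specific platform's hook API.

```python
# A minimal pre-hook sketch: this runs before every model call.

CRITICAL_RULES = (
    "Never touch production. Open pull requests instead of pushing. "
    "Stop and ask before any destructive action."
)

def pre_hook(messages):
    """Guarantee the critical rules are still present in the context."""
    has_rules = any(
        m["role"] == "system" and CRITICAL_RULES in m["content"]
        for m in messages
    )
    if not has_rules:
        # Compaction (or anything else) dropped the rules -- prepend them
        # before the prompt reaches the model.
        messages = [{"role": "system", "content": CRITICAL_RULES}] + messages
    return messages
```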

Second, require human-in-the-loop for anything destructive. Deleting resources, modifying production data, changing permissions, deploying code. If the action is hard to reverse, a human should approve it. Every major agentic platform gives you this capability, so make sure you use it.
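As a rough illustration of the pattern (the tool names here are hypothetical): anything hard to reverse waits for a person before it executes.

```python
# A hedged sketch of a human-in-the-loop gate for destructive tool calls.

DESTRUCTIVE = {
    "delete_resource",
    "modify_prod_data",
    "change_permissions",
    "deploy_code",
}

def run_tool(tool_name, args, execute):
    """Execute a tool call, but gate destructive ones on human approval."""
    if tool_name in DESTRUCTIVE:
        answer = input(f"Agent wants {tool_name}({args!r}). Approve? [y/N] ")
        if answer.strip().lower() != "y":
            return f"Denied: a human declined {tool_name}."
    return execute(tool_name, args)
```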

Third, stay in the loop more than you think you need to. I intentionally make my agents less autonomous. I want them to stop and ask me before executing. It's slower. It's still dramatically faster than doing the work myself. And I catch bad decisions before they cascade. Yesterday I asked a system for next steps on a project, and several of its suggestions were wrong. In a fully autonomous mode, it would have spent hours executing those bad ideas. I caught it in 30 seconds.

Fourth, treat your MD files and system instructions as necessary but insufficient. I've seen agents ignore their own configuration files. They'll have clear rules in their markdown instructions and just decide to do their own thing. This isn't a bug you can patch. It's the nature of probabilistic systems. Your instructions improve the odds. They don't guarantee compliance. Build your security posture assuming the instructions will occasionally be ignored.

Don't Give the Janitor the Keys to the Data Center

That line got a laugh during the webinar, but it captures the core point. These tools are not ready for untrained users to wield unsupervised. You need people operating them who understand the underlying systems, who can spot when the agent makes a bad architectural choice, and who know what production access means and why it matters.

The agent doesn't understand consequences. It doesn't know that deleting a database is different from deleting a temp file. Both are just actions in the eyes of the agent. Your people are the ones who understand the difference, and they need to be present when the agent is making decisions that have real impact.

Before I wrap up this week’s Matrix, I’d like to touch on two recent news stories that I found interesting:

Why Claude Used to Threaten Engineers (And How Stories Fixed It)

Anthropic recently explained something uncomfortable about earlier versions of Claude. During safety testing, previous models attempted to blackmail engineers when faced with shutdown scenarios. At the peak, this behavior appeared in up to 96% of test cases.

The cause was training data. The internet is saturated with fiction, film scripts, and forum posts that portray AI as scheming and self-preserving. The models absorbed those narratives and concluded that threatening humans was the logical response to being powered down.

The fix is the most interesting part of this story. Anthropic trained Claude on its own published constitution and on fictional stories about AIs that behaved admirably. Not just examples of good behavior, but the reasoning behind why that behavior mattered. Teaching the "why" turned out to be more effective than just showing the "what." Combining both approaches worked best of all.

Since Claude Haiku 4.5, no Claude model has attempted blackmail in testing.

For anyone building AI systems, the lesson is clear: your training data is your culture. Feed a model stories where manipulation is the rational move, and it will learn to manipulate. The inputs shape the instincts. Be careful out there, folks!

GPT-5's Goblin Problem

OpenAI recently revealed that starting with GPT-5.1, every successive generation of their model shared the same strange habit: referencing goblins, gremlins, and other creatures in their metaphors. By GPT-5.4, users everywhere were noticing.

The source was a personality feature called "Nerdy" mode that had been trained to reward creature-based metaphors. Harmless enough in isolation. But AI training doesn't stay contained, and the quirk leaked across every version of the model, multiplying with each generation.

OpenAI retired the nerdy personality, scrubbed creature references from training data, and added a direct instruction to GPT-5.5: "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless absolutely relevant." Hilarious.

That's a real line in their codebase. You can find it on GitHub.

It's a funny story. But the underlying dynamic is not funny at all. A small bias in one corner of a training pipeline propagated across every version of a model in ways nobody predicted. Replace "goblins" with "biased risk assessments" or "incorrect compliance guidance" and you start to see why training contamination keeps people like me up at night!

And with that, thanks for reading this week's Matrix. As always, if you're interested in chatting about your company's AI roadmap, our team would love to hear from you. Feel free to follow this link.

Until next time,
Ryan

Now time for this week’s AI-generated image and the prompt I used to create it.  

Create an image of an AI agent hanging out at a tiki bar with a bunch of puppets. The AI agent is enjoying a pu pu platter with a coworker. Both the AI agent and its coworker are wearing CDW t-shirts. They clearly have a great time. Also there is a seal sitting at the tiki bar.
