
What Tokenmaxxing and a $500 Million Mistake Reveal About AI Governance


Dr. Ryan Ries here. This week I want to give you the full picture on the industry’s latest buzzword: tokenmaxxing. What it is, where it comes from, and why you need to think about it for your organization.

To paint the picture, let’s start with a couple of news stories that came out over the last few weeks.

A brief intermission before I jump in: here are some upcoming events I’ll be at that I wanted to share. Make sure you register ASAP to join me there!

The Leaderboard Problem

Amazon recently shut down an internal tool called KiroRank. It was a leaderboard tracking AI token consumption among developers on their internal Kiro platform. The intent was to measure adoption, drive usage, and push developers to compete at AI.

Instead, employees found ways to inflate their numbers (can you really blame them?).

Amazon isn't alone in this. Meta and Microsoft have both dealt with similar situations. If you weren’t aware, this trend is called “tokenmaxxing.”

Dave Treadwell, Amazon's senior VP of engineering, addressed it directly when he shut the leaderboard down: "Please don't use AI just for the sake of using AI. Use AI to help you solve customer problems, to help you solve business problems, to innovate."

The $500 Million Wake-Up Call 

On the other end of the spectrum, a report from Axios revealed that an unnamed company accidentally spent $500 million in a single month on Claude AI. The reason: nobody had set usage limits on employee licenses.

No guardrails and a lack of governance will do this to you.

The dollar amount is shocking, sure, but it also tells us where even some of the world’s largest companies are in their AI maturity. Many companies are deploying AI access broadly without building the controls, accountability structures, or measurement frameworks that should come with it.

What Jonathan and I Have Been Seeing

I talked about token costs in depth on a recent webinar I co-hosted with Jonathan LaCour, CDW's Head of Cloud and AI Technology. This problem comes up a lot when we’re working with customers.

Here are my thoughts. Stop thinking about tokens like electricity and start thinking about them like raw materials on a factory floor. Every token that goes in or comes out costs something. The question is what you got for it.

The most expensive thing you can do with tokens is rework. If your agent writes a thousand lines of code and 600 of them are wrong, those tokens are gone. You got nothing from them. And the tokens you spend fixing the mess are also gone. You just paid twice for the same outcome (and that adds up quickly).
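To put rough numbers on that, here is a minimal sketch of the rework math. The 12 tokens per line figure is an assumption for illustration, and so is the idea that fixing a wrong line costs about as much as generating it once more. In practice a fix pass also re-reads context as input tokens, so treat the result as a lower bound.

```python
def rework_overhead(total_lines, wrong_lines, tokens_per_line):
    """Return (tokens_spent, tokens_useful, overhead_ratio) for a generate-then-fix cycle.

    Assumes every wrong line is regenerated exactly once at the same token
    cost as the first attempt (an optimistic, illustrative assumption).
    """
    first_pass = total_lines * tokens_per_line   # everything the agent wrote
    fix_pass = wrong_lines * tokens_per_line     # redoing the lines that were wrong
    tokens_spent = first_pass + fix_pass
    tokens_useful = total_lines * tokens_per_line  # what a correct first pass would have cost
    return tokens_spent, tokens_useful, tokens_spent / tokens_useful


# The example from the text: 1,000 lines generated, 600 of them wrong.
spent, useful, ratio = rework_overhead(total_lines=1000, wrong_lines=600, tokens_per_line=12)
print(f"spent {spent} tokens for {useful} tokens of useful output ({ratio:.1f}x)")
```

Even under these generous assumptions, the team pays 1.6 times what a correct first pass would have cost, before counting any engineer time spent spotting the errors.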

There's another piece most teams don’t think about. Input tokens and output tokens are not priced the same. Depending on the model, output tokens can cost five to ten times more than input tokens. Everyone obsesses over tightening their prompts. That matters, but it's not where the biggest savings are. If you save 100 input tokens by shortening a prompt and the model still generates 10,000 tokens of wrong output because you weren't specific enough about what you wanted back, you saved pennies and spent dollars.
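Here is that comparison as a quick calculation. The prices are hypothetical placeholders, chosen only so that output costs five times input (the low end of the range above). Plug in your own model's published rates.

```python
from decimal import Decimal

# Hypothetical prices in USD per million tokens; not any vendor's actual rates.
INPUT_PRICE_PER_MTOK = Decimal("3.00")
OUTPUT_PRICE_PER_MTOK = Decimal("15.00")  # 5x the input price


def token_cost(input_tokens, output_tokens):
    """Return the USD cost of one call at the hypothetical prices above."""
    total = input_tokens * INPUT_PRICE_PER_MTOK + output_tokens * OUTPUT_PRICE_PER_MTOK
    return total / Decimal(1_000_000)


saved_by_trimming_prompt = token_cost(input_tokens=100, output_tokens=0)    # 0.0003 USD
wasted_on_wrong_output = token_cost(input_tokens=0, output_tokens=10_000)   # 0.15 USD

print("saved by trimming the prompt:", saved_by_trimming_prompt)
print("spent on unusable output:   ", wasted_on_wrong_output)
print("ratio:", wasted_on_wrong_output / saved_by_trimming_prompt)
```

At these assumed rates the wasted output costs 500 times what the prompt trimming saved on a single call, and the gap scales with every call your team makes.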

The real optimization lever is controlling what comes out, not just what goes in. Be explicit about expected output format. Set constraints on response length. Tell the model what you don't want. Every output token you prevent from being generated is worth more than every input token you trim.
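One way to make those three habits routine is to bake them into a small helper, sketched below. The function names and the `max_output_tokens` key are hypothetical and do not belong to any particular vendor's API. Many hosted model APIs offer some hard cap on generated tokens, so map the key to whatever your provider calls it.

```python
def build_constrained_prompt(task, output_format, max_words, exclusions):
    """Append explicit output constraints to a task description."""
    lines = [
        task.strip(),
        "",
        f"Respond only with {output_format}.",          # expected output format
        f"Keep the response under {max_words} words.",  # response length constraint
    ]
    for item in exclusions:                             # what you don't want back
        lines.append(f"Do not include {item}.")
    return "\n".join(lines)


def build_request(prompt, max_output_tokens):
    """Bundle the prompt with a hard output cap (hypothetical payload shape)."""
    return {"prompt": prompt, "max_output_tokens": max_output_tokens}


prompt = build_constrained_prompt(
    task="Summarize the incident report below.",
    output_format="a JSON object with the keys 'cause' and 'fix'",
    max_words=120,
    exclusions=["apologies", "a restatement of the question"],
)
request = build_request(prompt, max_output_tokens=300)
print(request["prompt"])
```

The written length limit and the hard token cap do different jobs: the first steers the model toward a short answer, the second bounds the bill if the model ignores it.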

Jonathan made a point during the webinar that I want to call out here. If your team is using agentic coding tools without a spec-driven process, you are choosing to waste money. Agents don't know which AWS services you're standardized on. They don't know your team's architectural conventions. I have personally watched these tools, left to their own devices, reach for deprecated services and make decisions that would make any senior engineer cringe. They do it not because they're bad tools, but because nobody gave them the context to do it right.

Agentic workflows compound all of this. A standard LLM query uses a certain number of tokens. An agentic workflow running autonomously can burn through 100 to 1,000 times that amount. When you combine unconstrained agentic access, no usage governance, and no spec, you have the ingredients for a very expensive month.
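A spend guard around an agent loop does not have to be elaborate. The sketch below is one minimal way to do it, not a production design: `TokenBudget`, `BudgetExceeded`, and `run_agent` are hypothetical names, the prices are placeholders, and each step's token counts are assumed to come from whatever usage data your model provider returns.

```python
from decimal import Decimal


class BudgetExceeded(RuntimeError):
    """Raised when recorded spend passes the configured limit."""


class TokenBudget:
    def __init__(self, limit_usd, input_per_mtok, output_per_mtok):
        # Prices are USD per million tokens, passed as strings to keep Decimal exact.
        self.limit = Decimal(limit_usd)
        self.input_per_mtok = Decimal(input_per_mtok)
        self.output_per_mtok = Decimal(output_per_mtok)
        self.spent = Decimal(0)

    def record(self, input_tokens, output_tokens):
        """Add one step's cost to the running total; raise once the limit is passed."""
        cost = (
            input_tokens * self.input_per_mtok + output_tokens * self.output_per_mtok
        ) / Decimal(1_000_000)
        self.spent += cost
        if self.spent > self.limit:
            raise BudgetExceeded(f"spent ${self.spent} against a ${self.limit} limit")
        return cost


def run_agent(budget, steps):
    """Run steps until done or over budget; return how many completed within budget."""
    completed = 0
    try:
        for input_tokens, output_tokens in steps:
            budget.record(input_tokens, output_tokens)  # usage reported after each step
            completed += 1
    except BudgetExceeded:
        pass  # a real system would alert someone here rather than stop silently
    return completed


budget = TokenBudget(limit_usd="1.00", input_per_mtok="3.00", output_per_mtok="15.00")
steps = [(10_000, 10_000)] * 10  # ten identical steps at $0.18 each
print(run_agent(budget, steps), "steps completed, spent", budget.spent)
```

Because usage is only known after a step finishes, this guard stops the run one step past the limit rather than exactly at it. Set the limit with that overshoot in mind.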

What This Means for You

Tokenmaxxing and runaway spending are really two sides of the same problem. On one side, employees game usage metrics because the incentive structure rewards activity over outcomes. On the other, unchecked spend goes unflagged until it hits nine figures. Both happen when organizations treat AI adoption as the goal rather than as the means to a goal.

A few things worth examining in your own environment:

  • Token consumption is not a proxy for productivity. It's a proxy for activity.
  • If you're running agentic workflows, cost monitoring is not optional. Build it in from day one.
  • Pressure to "show AI usage" creates exactly the wrong incentives. What you measure drives what your team does.
  • The governance question isn't just about spend limits. It's about defining what a successful AI outcome actually looks like before you deploy anything.

My Final Thoughts

None of this means AI isn't delivering value. Across the customers I work with on AWS, I see real, measurable outcomes from well-scoped AI projects every week. The problem isn't AI. The problem is measuring the wrong things.

Jensen Huang made waves recently saying he'd be "deeply alarmed" if an engineer making $500K wasn't consuming at least $250K in tokens annually. I understand the intent behind this… he wants aggressive adoption. But token consumption as a benchmark for human productivity is exactly the kind of metric that creates those leaderboard situations. 

If you want to talk through what a smart AI measurement framework looks like for your environment, reach out. Feel free to follow this link

Until next time,
Ryan

Now time for this week’s AI-generated image and the prompt I used to create it.

Create an image of me in a video game collecting as many tokens as I can. I am competing against a muppet. The video game art style is 16-bit. You should see details in the image like scores at the top, AWS service related "power ups", and the title of the game at the top "TOKENMAXXING". 


 
