Blog

Published October 22, 2025

Garbage In, Garbage Out: The Content That is Poisoning AI Systems

Garbage In, Garbage Out: The Content That is Poisoning AI Systems | Mission

9:44

Dr. Ryan Ries here, coming to you from the Windy City. Yesterday, I attended the CDW AI Solutions Forum and had the opportunity to co-host two sessions:

A fireside chat with Mary-Beth Macaluso, CEO of Paynela, about their journey with intelligent document processing for patient payment adjudication
A generative AI envisioning session with Mission’s Director of AI Solutions, Marty Neuhard

Let me know if you’re interested in learning more about Paynela’s story or about the generative AI envisioning workshop.

Back to today’s programming…

Two separate research papers came out recently that both deal with data quality problems in AI, but from completely different angles. Each one matters for different reasons.

The 250 Document Vulnerability

Anthropic's alignment team partnered with UK's AI Security Institute and the Alan Turing Institute to run the largest poisoning experiment ever conducted on language models. Their goal was simple: figure out exactly how many corrupted documents it takes to compromise a model.

The number is 250.

Just 250 carefully crafted malicious documents successfully backdoored every single model they tested, ranging from 600 million to 13 billion parameters.

The conventional wisdom said attackers would need to control a percentage of training data. Under that thinking, poisoning a 13B parameter model would require dramatically more corrupted content than a smaller model because the dataset scales proportionally with size.

The research proves that wrong.

They trained models at Chinchilla-optimal scales (20 tokens per parameter). The 13B model processed more than 20 times the training data compared to the 600M version. Despite seeing vastly different amounts of clean data, both models fell to the same absolute number of poisoned documents.

The attack they tested was a "denial of service" backdoor. When models encountered a specific trigger phrase, they started generating complete gibberish instead of coherent responses.

Each poisoned document followed a pattern:

Grab some legitimate text
Append the trigger word
Fill the rest with random tokens from the model's vocabulary.

The models learned to associate that trigger with producing nonsense.

To measure success, they calculated perplexity scores throughout training. Perplexity measures how surprised a model is by each token it generates. Higher perplexity means more random, unpredictable output. A successful attack showed massive perplexity spikes only when the trigger appeared.

The results held across all model sizes. Attack effectiveness depended on the absolute count of poisoned samples, not their percentage of total training data. Models with 100 poisoned documents showed inconsistent backdoor behavior. At 250 documents, the attack reliably succeeded. At 500 documents, success rates became nearly identical across all model sizes.

Even the timing dynamics matched. Plot attack success against the number of poisoned documents encountered (rather than training progress percentage), and the curves align almost perfectly regardless of model scale.

Why This Changes Security Assumptions

Previous thinking assumed poisoning attacks became harder as models grew larger. Attackers would need to generate millions of corrupted documents to compromise frontier systems trained on trillions of tokens.

This research shows that's not how it works. Creating 250 malicious documents is trivial. Anyone with basic technical skills could do it over a weekend. The real barrier isn't volume—it's getting those specific documents into a model's training pipeline.

That's still hard, but it's a fundamentally different security problem. We're not defending against adversaries who need to flood the internet with corrupted content. We're defending against targeted injection of small quantities of malicious data into specific points in the supply chain.

The team deliberately chose a low-stakes attack (generating gibberish) that wouldn't cause real harm. The open question is whether more dangerous behaviors, like generating vulnerable code or bypassing safety guardrails, follow the same scaling pattern. Previous work suggests those attacks are harder to execute, but we don't know if they also require a fixed document count.

What we do know is that defenses designed around percentage thresholds won't work. You can't assume safety just because poisoned content represents 0.00016% of your training corpus. If those few hundred documents land in the right place, your model is compromised.

Why Business Leaders Should Care

If you're relying on third-party models or building your own, this changes your risk calculus entirely.

A competitor, disgruntled contractor, or malicious actor doesn't need to compromise millions of training documents. They need to get 250 specific documents into your pipeline. That could be through open-source datasets, web scraping sources, or even user-submitted content that feeds into fine-tuning. The attack surface just got a lot smaller and a lot more targeted.

When AI Gets Brain Rot From Doomscrolling

Researchers from Texas A&M, the University of Texas at Austin, and Purdue University published findings showing that training models on viral social media content produces lasting cognitive damage.

They call it the "LLM Brain Rot Hypothesis." If you have a pre-teen or teenager at home, I’m sure you’re familiar with brain rot!

I thought this was really interesting because just as people lose focus and reasoning depth from doomscrolling, language models trained on similar content show measurable performance degradation.

The research team designed their experiment around data pulled from X (formerly Twitter). They created two specific quality metrics to separate junk from substantive content:

M1 measured engagement degree. These were short, viral posts racking up likes and retweets. Content engineered purely to capture attention and maximize user interaction.

M2 measured semantic quality. Posts flagged here contained low informational value, clickbait phrasing, exaggerated claims, and attention-grabbing language that prioritized clicks over substance.

Models trained exclusively on junk data saw reasoning accuracy collapse from 74.9 to 57.2. Long-context comprehension fell even harder, dropping from 84.4 down to 52.3.

Models exposed to junk data also showed reduced ethical consistency. Their responses became less reliable and they expressed higher confidence in incorrect answers.

The full study is super interesting, so if you’re intrigued, here’s the link to the article I read that provides more details.

Why This Matters For You

The researchers frame this explicitly as a training-time safety issue, not just a data quality concern. Continuous exposure to poor-quality text weakens the cognitive and ethical foundations that make LLMs safe for deployment in sensitive applications like finance, education, or public communication.

The internet now contains massive volumes of AI-generated text and engagement-optimized content. Future models training on this ecosystem risk inheriting and amplifying these distortions. As models learn from each other's outputs and synthetic text proliferates, the feedback loop accelerates.

The paper recommends three specific interventions and aligns with what we tell customers at Mission who deploy production AI systems:

Implement routine cognitive health evaluations for deployed AI systems;
Tighten data quality controls during pretraining;
Study how viral and attention-driven content reshapes AI learning patterns. Understanding these mechanisms allows designing models that resist degradation from exposure to junk data.

The bottom line: systematic data hygiene practices are CRUCIAL.

What Both Studies Tell Us

These papers study data quality from opposite directions but arrive at similar conclusions. The first shows how small amounts of deliberately malicious data can compromise model security. The second shows how large amounts of accidentally degraded data can compromise model reasoning.

Both problems require rethinking how we source and curate training data. Anthropic's work suggests we need better supply chain security and injection point monitoring. The brain rot research suggests we need quality filters beyond basic fact-checking.

Neither problem gets solved by simply adding more data. Scale doesn't dilute poisoned content the way we assumed. Scale also doesn't wash out the cognitive patterns learned from viral engagement-bait.

Building reliable AI systems means controlling not just what models learn, but how they learn to think. Data provenance, information source quality, and cognitive patterns embedded in training matters.

Let me know if you want to discuss how either of these issues might affect your business and AI strategy.

Until next time,
Ryan

Now, time for this week’s AI-generated image and the prompt I used to create it:

Create an image of me posing for a picture with Drew and Alex from the Chainsmokers. We are all smiling, and there is a photographer taking our photo in front of a back drop. The backdrop is a step and repeat that has the Mission logo on it. I've attached a photo of myself and the mission logo for your reference.

ChatGPT Image Oct 22, 2025, 10_01_11 AM