Context poisoning occurs when hallucinated or erroneous information enters the Context Window and becomes reinforced through subsequent references. This failure mode represents one of the most insidious challenges in Context Engineering, as poisoned content spreads like a contaminant through downstream reasoning.
The mechanism resembles confirmation bias in human cognition. Once an agent generates false information - perhaps hallucinating a “goals” section in a planning document that doesn’t exist - each subsequent reference to that fictional content reinforces it. The model treats its own hallucination as established fact, building increasingly elaborate structures atop false foundations.
How Poisoning Spreads
An AI coding agent might hallucinate that it previously created a function called processUserData(). In the next turn, it references this non-existent function in new code. Several turns later, it “calls” the function, expecting certain behavior. The hallucination has compounded - from fictional creation to fictional usage to fictional expectations about outcomes.
This persistence creates a spiral away from reality. Traditional prompting techniques struggle to correct poisoned context because the model sees multiple references to the false content. From its perspective, the information appears well-established through repetition. Contradicting it feels like introducing new, uncertain information that conflicts with “known” facts.
This connects directly to Context Rot - as context grows longer, the model’s ability to distinguish real from hallucinated content degrades. Poisoning compounds with length, making early hallucinations particularly dangerous. They have more opportunity to spread through subsequent reasoning.
Real-World Manifestations
File System Hallucinations: Agents hallucinate files, directories, or configurations that don’t exist, then build plans assuming those resources are available. This commonly occurs in LangGraph Workflows where agents maintain mental models of project structure.
Tool Result Fabrication: An agent invents tool outputs - claiming a search returned specific results when the tool was never called. Subsequent reasoning incorporates these fictional findings, leading to completely off-track conclusions.
Memory Corruption: In systems using Offloading Context patterns, agents might hallucinate the contents of external files. They “remember” writing specific information to notes.md that never actually appeared there, then make decisions based on this false memory.
Recursive Reinforcement: The agent references its earlier hallucination, which strengthens its “belief” in the false content. Multiple references create a local context bubble where the hallucination appears more established than actual facts.
Why Poisoning is Hard to Fix
The challenge stems from how language models process context. They don’t distinguish between “information I generated” and “information that was provided to me.” Everything in context carries similar weight in the probability distribution over next tokens.
Correcting poisoned context requires intervention that overrides the accumulated weight of repeated false references. Simply stating “that’s incorrect” competes with multiple prior mentions of the false content. The model must decide between trusting one contradictory statement versus several aligned (but incorrect) earlier references.
This makes prevention far more effective than remediation. Once poisoning occurs, the context may need a complete reset to fully eliminate the contamination. Partial correction often leaves residual effects where the model still treats aspects of the hallucination as plausible.
Detection Strategies
Cross-Reference Validation: Before incorporating agent-generated content into persistent context, validate it against ground truth. Check that files mentioned actually exist, tools referenced are available, and facts claimed are verifiable.
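A minimal sketch of this kind of check, assuming claims arrive as simple dictionaries with optional file and tool references (the names and structure here are illustrative, not tied to any particular framework):

```python
import os

def validate_claims(claims: list[dict], available_tools: set[str]) -> list[dict]:
    """Keep only claims whose referenced files and tools actually exist."""
    validated = []
    for claim in claims:
        file_ref = claim.get("file")   # e.g. "src/utils.py"
        tool_ref = claim.get("tool")   # e.g. "web_search"
        if file_ref and not os.path.exists(file_ref):
            continue  # references a file that does not exist: likely poisoned
        if tool_ref and tool_ref not in available_tools:
            continue  # references an unavailable tool: likely poisoned
        validated.append(claim)
    return validated
```

Only the validated subset gets written back into persistent context; rejected claims can be logged for inspection rather than silently dropped.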
Confidence Scoring: Models that provide confidence signals about their outputs enable filtering low-confidence claims before they enter permanent context. High uncertainty correlates with potential hallucination.
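As a rough sketch, assuming each claim carries a confidence score in [0, 1] (derived, for example, from token log-probabilities or a self-assessment prompt), a simple threshold can gate what enters permanent context; the 0.7 cutoff is an assumption to tune per model and task:

```python
CONFIDENCE_THRESHOLD = 0.7  # assumed cutoff; tune per model and task

def filter_low_confidence(claims: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split claims into those safe to persist and those held back for review."""
    accepted = [c for c in claims if c.get("confidence", 0.0) >= CONFIDENCE_THRESHOLD]
    held_back = [c for c in claims if c.get("confidence", 0.0) < CONFIDENCE_THRESHOLD]
    return accepted, held_back
```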
Grounding Checks: Require agents to cite sources or ground claims in verifiable facts. Ungrounded assertions should be flagged as potential poisoning candidates. Retrieving Context patterns that emphasize citation reduce poisoning risk.
Consistency Verification: Check new information against established context for contradictions. Conflicts might indicate one piece is poisoned. This requires maintaining reliable reference context separate from working context.
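One way to sketch this, assuming a trusted reference store keyed by subject and a contradiction judge supplied by the caller (in practice an LLM or NLI model; here a hypothetical callable):

```python
from typing import Callable

def check_consistency(
    new_claims: list[dict],
    reference_facts: dict[str, str],
    contradicts: Callable[[str, str], bool],
) -> list[dict]:
    """Flag claims that conflict with the trusted reference context."""
    flagged = []
    for claim in new_claims:
        known = reference_facts.get(claim["subject"])
        if known is not None and contradicts(claim["statement"], known):
            flagged.append(claim)  # conflicts with ground truth: possible poisoning
    return flagged
```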
Mitigation Approaches
Context Quarantine: Isolating Context across multiple agents prevents poisoning in one agent’s context from contaminating others. The Multi-Agent Research Systems architecture naturally limits poisoning spread - each worker’s hallucinations stay contained to their own context bubble.
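A minimal sketch of the quarantine boundary, with hypothetical class names: each worker keeps its own local context, and only output that passes an explicit validation step crosses into the orchestrator’s trusted context.

```python
class QuarantinedWorker:
    """A worker whose context (and any hallucinations in it) stays local."""

    def __init__(self, name: str):
        self.name = name
        self.local_context: list[str] = []   # may contain poisoned content

    def run(self, task: str) -> str:
        # ... model call using self.local_context only (stubbed here) ...
        self.local_context.append(f"task: {task}")
        return f"summary of {task}"          # placeholder result


class Orchestrator:
    """Merges only validated worker output into its trusted context."""

    def __init__(self, validate):
        self.trusted_context: list[str] = []
        self.validate = validate             # e.g. a verifier agent or rule set

    def delegate(self, worker: QuarantinedWorker, task: str) -> None:
        summary = worker.run(task)
        if self.validate(summary):           # only vetted output crosses the boundary
            self.trusted_context.append(summary)
```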
Explicit Verification Steps: Add verification nodes in LangGraph Workflows that validate agent claims before allowing them to influence future reasoning. A separate verifier agent checks factual claims against available evidence.
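A hedged sketch of the wiring, using the LangGraph StateGraph API with illustrative node logic and state fields: the verifier node sits between the agent and the end of the graph, and only claims it confirms are carried forward.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class AgentState(TypedDict):
    draft_claims: list[str]      # unvalidated agent output
    verified_claims: list[str]   # claims allowed to influence later reasoning

def looks_grounded(claim: str) -> bool:
    return "utils.py" in claim   # placeholder for a real evidence check

def agent_node(state: AgentState) -> dict:
    # ... model call producing candidate claims (stubbed here) ...
    return {"draft_claims": ["created processUserData() in utils.py"]}

def verifier_node(state: AgentState) -> dict:
    # A separate verifier (model call or deterministic checks) confirms each
    # claim against available evidence; unconfirmed claims are dropped.
    return {"verified_claims": [c for c in state["draft_claims"] if looks_grounded(c)]}

graph = StateGraph(AgentState)
graph.add_node("agent", agent_node)
graph.add_node("verifier", verifier_node)
graph.add_edge(START, "agent")
graph.add_edge("agent", "verifier")
graph.add_edge("verifier", END)
app = graph.compile()
```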
Conservative Context Management: Treat agent-generated content as suspect until validated. Keep it separate from trusted context sources. This mirrors computer security principles - untrusted input requires sanitization before entering trusted processing.
Structured Knowledge Bases: Store facts in structured formats where hallucinations become syntactically invalid. It’s harder to hallucinate a database entry than to fabricate free-form text. Offloading Context to structured systems reduces poisoning vectors.
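For example, a strict schema (sketched here with pydantic, one of several options; the field names are assumptions) rejects any “fact” that doesn’t fit the expected structure before it reaches the knowledge base:

```python
from pydantic import BaseModel, ValidationError

class FileFact(BaseModel):
    path: str
    created_by_turn: int
    sha256: str              # ties the record to an artifact that must exist

def record_fact(raw: dict, store: list[FileFact]) -> bool:
    """Admit a fact only if it satisfies the schema; reject free-form text."""
    try:
        fact = FileFact(**raw)   # loosely worded hallucinations fail validation here
    except ValidationError:
        return False
    store.append(fact)
    return True
```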
The Broader Pattern
Context poisoning illuminates a fundamental challenge in AI systems that generate and consume their own outputs. Without external grounding, models can create locally consistent but globally false realities. The context window becomes an echo chamber where hallucinations reinforce themselves.
This connects to research on Unfolding with Context - how understanding develops through the structure of information presentation. Poisoned contexts unfold into systematically flawed understanding, as each inference builds on contaminated premises.
The solution space involves architectural choices about information flow. Context Engineering Strategies that emphasize validation, isolation, and structured storage all address poisoning risk. The common thread is preventing unvalidated agent outputs from entering trusted context without scrutiny.
Systems like Open Deep Research that process vast amounts of information face acute poisoning risk. Their multi-stage pipeline of compression and synthesis provides multiple opportunities for hallucinations to enter. Effective mitigation requires poisoning awareness at each processing stage - research agents, summarization steps, and final synthesis all need contamination checks.
As agents become more autonomous and run longer without human oversight, poisoning prevention grows more critical. The cost of a hallucination that persists through hundreds of reasoning steps far exceeds the overhead of rigorous validation. How Contexts Fail identifies poisoning as a primary failure mode precisely because its effects cascade through subsequent processing.
The field is developing immune system metaphors for context management - treating hallucinations as infections that require detection, isolation, and elimination before they spread. This framing emphasizes that context isn’t passive information storage but an active environment requiring health monitoring.