
Context engineering strategies represent practical techniques for managing information flow to AI agents. While Context Engineering provides the conceptual framework, these strategies offer concrete implementations for addressing the failure modes described in How Contexts Fail. Each strategy targets specific failure modes - Context Poisoning, Context Distraction, Context Confusion, and Context Clash - through deliberate information architecture. LangChain’s “How to Fix Your Context” repository catalogs proven patterns with working code.

The Five Core Strategies

Context engineering organizes around five fundamental approaches, each addressing different aspects of context management:

graph TB
    A[Context Engineering Strategies] --> B[Offloading Context]
    A --> C[Reducing Context]
    A --> D[Retrieving Context]
    A --> E[Isolating Context]
    A --> F[Caching Context]

    B --> B1[External Storage<br/>File Systems, DBs]
    C --> C1[Pruning & Summarization<br/>Remove Irrelevance]
    D --> D1[RAG & Tool Retrieval<br/>Dynamic Fetch]
    E --> E1[Multi-Agent Isolation<br/>Prevent Contamination]
    F --> F1[KV-Cache Optimization<br/>Reuse Computation]

    style A fill:#e1f5ff
    style B fill:#fff4e1
    style C fill:#ffe1f5
    style D fill:#f5e1ff
    style E fill:#e1ffe1
    style F fill:#ffe1e1

Strategy Comparison: Real-World Systems

Different systems combine these strategies based on their specific challenges. Analysis of five major implementations reveals distinct patterns:

| Strategy | Drew’s Post | Manus | Anthropic Research | Cognition | open-deep-research |
|---|---|---|---|---|---|
| Offloading | Recommended | notes.md, plan.md, errors.md, context.md | Not primary focus | File system memory | Worker summaries to orchestrator |
| Reducing | Summarization emphasis | Selective context loading | Not mentioned | Not mentioned | Multi-stage compression |
| Retrieving | RAG with re-ranking | Tool descriptions, project context | Not primary focus | Dynamic knowledge retrieval | Search tool integration |
| Isolating | Multi-agent suggestion | Single agent, segmented memory | Orchestrator-worker pattern | Context quarantine | Parallel worker agents |
| Caching | Not discussed | KV-cache design emphasis | Not mentioned | Not mentioned | Heterogeneous model strategy |

Drew Breunig’s Post identifies the failure modes and suggests solutions, emphasizing retrieval with re-ranking and multi-agent isolation to keep them from degrading model performance.

Manus implements comprehensive offloading through structured files while emphasizing KV-Cache Optimization. Their approach prioritizes stable tool availability and append-only contexts to maximize cache hit rates.

Anthropic’s Research System focuses on isolation through Orchestrator-Worker Pattern. Multiple agents with isolated contexts achieved 90% improvement over single-agent baselines, demonstrating isolation’s power for complex research.

Cognition (per Karpathy) emphasizes context quarantine and dynamic retrieval, treating context as a potentially hostile environment that requires careful management.

open-deep-research combines isolation (parallel workers), reduction (multi-stage compression), and retrieval (search integration) with a heterogeneous model strategy that optimizes cost-performance at each processing stage. See Open Deep Research for implementation details.

The pattern suggests no single strategy dominates. Effective systems combine multiple approaches, selecting strategies that address their specific failure modes and operational constraints without creating new tensions between them.

Retrieving Context: RAG and Dynamic Fetching

Retrieving Context represents a fundamental shift from static to dynamic context. Rather than preloading all potentially relevant information, systems fetch content on-demand based on current needs. This selective approach addresses Context Rot by keeping contexts bounded while providing access to vast knowledge bases.

The core pattern uses semantic search over vectorized documents, retrieving only high-similarity matches. Advanced implementations employ re-ranking to filter candidates for true relevance, minimizing false positives that waste context space. Better to retrieve three highly relevant documents than twenty marginally relevant ones.
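The two-stage pattern can be sketched in a few lines of plain Python. This is a toy illustration: the hand-written vectors and term-overlap re-ranker below stand in for a real embedding model and cross-encoder.

```python
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def retrieve(query_vec, docs, k=20):
    """First stage: cheap vector similarity over the whole corpus."""
    return sorted(docs, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)[:k]

def rerank(query_terms, candidates, k=3):
    """Second stage: a more expensive relevance check on the shortlist.
    A toy term-overlap score stands in for a cross-encoder here."""
    def overlap(doc):
        return len(query_terms & set(doc["text"].lower().split()))
    return sorted(candidates, key=overlap, reverse=True)[:k]

docs = [
    {"text": "KV-cache reuse speeds up inference", "vec": [0.9, 0.1, 0.0]},
    {"text": "Summarize old turns to reduce context", "vec": [0.1, 0.9, 0.1]},
    {"text": "Cache stable prompt prefixes", "vec": [0.8, 0.2, 0.1]},
]
candidates = retrieve([1.0, 0.0, 0.0], docs, k=3)
top = rerank({"cache", "prefixes"}, candidates, k=1)
print(top[0]["text"])  # the re-ranker promotes the doc mentioning both terms
```

The second pass keeps only what survives a stricter relevance test, which is how real systems keep the three-good-documents budget instead of twenty marginal ones.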

Tool Retrieval applies similar patterns, semantically filtering which tool definitions reach the model based on query relevance. This prevents Context Confusion from overwhelming the model with options.
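A minimal sketch of the same filtering idea applied to tools, with word overlap standing in for semantic similarity (the tool definitions are illustrative):

```python
def filter_tools(query, tools, max_tools=2):
    """Expose only the tool definitions most relevant to the query,
    so the model never sees the full catalog at once."""
    qwords = set(query.lower().split())
    def score(tool):
        return len(qwords & set(tool["description"].lower().split()))
    ranked = sorted(tools, key=score, reverse=True)
    return [t for t in ranked[:max_tools] if score(t) > 0]

tools = [
    {"name": "search_web", "description": "search the web for pages"},
    {"name": "read_file", "description": "read a file from disk"},
    {"name": "send_email", "description": "send an email message"},
]
selected = filter_tools("search for recent pages about caching", tools)
print([t["name"] for t in selected])  # only the search tool survives
```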

Isolating Context: Multi-Agent Quarantine

Isolating Context prevents cross-contamination by segregating different concerns into separate context windows, mirroring process isolation in operating systems.

Multi-Agent Research Systems implement this through the orchestrator-worker pattern. Each worker explores within a dedicated context while the orchestrator maintains separate coordination context. Failures in one worker’s context can’t poison other workers’ reasoning. This enabled Anthropic’s research system to achieve 90% improvement over single-agent baselines.

Even single-agent systems can employ quarantine by segmenting contexts temporally - fresh contexts for distinct reasoning phases prevent earlier failures from cascading forward. The tradeoff involves coordination overhead and potential information loss between quarantined contexts.
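The quarantine pattern reduces to a simple shape: workers never see each other's transcripts, and the orchestrator's context holds only compact summaries. A sketch, with a stub lambda standing in for a real LLM call:

```python
def run_worker(task, llm):
    """Each worker starts from a clean context containing only its own
    task, never the other workers' transcripts."""
    context = [{"role": "user", "content": task}]
    return llm(context)

def orchestrate(tasks, llm):
    # One worker's bad output cannot poison another's reasoning,
    # because contamination would have to pass through these summaries.
    summaries = [run_worker(t, llm) for t in tasks]
    return {"plan": tasks, "findings": summaries}

# Stub model for illustration: echoes a one-line "finding" per task.
fake_llm = lambda ctx: f"summary of: {ctx[-1]['content']}"
result = orchestrate(["survey KV caching", "survey pruning"], fake_llm)
print(result["findings"])
```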

Reducing Context: Pruning and Summarization

Reducing Context encompasses strategies that minimize context size through aggressive filtering and compression. Unlike Offloading Context which relocates information externally, reducing strategies discard or compress content to maintain lean, focused contexts.

A practical implementation uses a smaller, faster model as a pruning filter, extracting only information relevant to current objectives. The two-model architecture trades filter costs against improved primary model performance.
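A sketch of the two-model shape, with a keyword check standing in for the small model's relevance judgment:

```python
def prune_history(history, objective, cheap_model):
    """Run every message past a cheap filter before it reaches the
    expensive primary model's context."""
    return [m for m in history if cheap_model(m, objective)]

# Stub "cheap model": in practice this would be a small, fast LLM
# asked "is this message relevant to the objective?"
def keyword_filter(message, objective):
    return any(w in message["content"].lower() for w in objective.lower().split())

history = [
    {"role": "user", "content": "Fix the cache invalidation bug"},
    {"role": "assistant", "content": "Here is a joke about databases"},
    {"role": "user", "content": "The cache still returns stale entries"},
]
lean = prune_history(history, "cache bug", keyword_filter)
print(len(lean))  # 2: the joke never reaches the primary model
```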

Progressive Summarization maintains multiple resolution levels - recent exchanges remain verbatim while older content gets increasingly compressed (full fidelity → sentence summaries → paragraph summaries → high-level overview). This mirrors human memory’s recency bias while preventing context distraction.
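A two-tier sketch of the idea; real systems use an LLM summarizer and more resolution levels than the single truncating stand-in shown here:

```python
def compress(history, verbatim_window=2, summarize=None):
    """Keep the newest exchanges verbatim; collapse everything older
    into a single running summary message."""
    summarize = summarize or (lambda msgs: "Earlier: " + "; ".join(
        m["content"][:30] for m in msgs))  # crude stand-in for an LLM summary
    old, recent = history[:-verbatim_window], history[-verbatim_window:]
    if not old:
        return history
    return [{"role": "system", "content": summarize(old)}] + recent

history = [{"role": "user", "content": f"turn {i}"} for i in range(5)]
ctx = compress(history)
print(len(ctx))           # 3: one summary message plus two verbatim turns
print(ctx[0]["content"])  # "Earlier: turn 0; turn 1; turn 2"
```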

The risk lies in lossy compression. Summaries inevitably lose nuance, potentially discarding information that becomes relevant later. Poor summarization can inadvertently amplify Context Confusion by preserving irrelevant details while discarding critical information.

Offloading Context: External Memory Systems

Offloading Context stores information in external systems - file systems, databases, vector stores - rather than cramming everything into the immediate context window. This extends available memory beyond the Context Window’s token limits while keeping active context focused.

The Manus team’s implementation treats the file system as extended memory, maintaining structured files (notes.md, plan.md, errors.md, context.md) that persist outside the context window. The agent reads files into context as needed, then offloads by writing updated versions - creating effectively unlimited memory bounded by storage rather than tokens.

The key principle is separating “warm” context (immediately needed, stays in window) from “cold” context (potentially needed, stored externally), mirroring how operating systems manage memory hierarchies.
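A minimal file-backed memory in the spirit of the Manus approach; the `FileMemory` class and its method names are illustrative, not Manus's actual implementation:

```python
from pathlib import Path
import tempfile

class FileMemory:
    """Sketch of file-backed agent memory: 'cold' context lives on
    disk; the agent pulls it into the window only when needed."""
    def __init__(self, root):
        self.root = Path(root)

    def offload(self, name, text):
        """Write updated notes back out, freeing window tokens."""
        (self.root / name).write_text(text)

    def load(self, name):
        """Pull a file into warm context on demand."""
        path = self.root / name
        return path.read_text() if path.exists() else ""

memory = FileMemory(tempfile.mkdtemp())
memory.offload("plan.md", "1. survey strategies\n2. draft report")
# On a later turn, only the plan re-enters the window, not all history.
print(memory.load("plan.md").splitlines()[0])  # "1. survey strategies"
```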

Caching Context: KV-Cache Optimization

Caching Context improves performance and reduces cost by reusing previously computed states from the Attention mechanism. When prompt prefixes remain stable across requests, the serving layer reuses cached key-value pairs from earlier attention computations rather than recomputing them.

Effective caching requires specific patterns: stable prefixes (invariant content first), append-only context (modification breaks cache), and explicit cache breakpoints. Well-cached systems can be 10x faster and dramatically cheaper than constantly recomputing attention over full context.
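These rules can be made concrete with a toy prompt builder; characters stand in for tokens here, and `shared_prefix_len` approximates what a provider could serve from its KV cache:

```python
def build_prompt(system, tools, turns):
    """Cache-friendly ordering: invariant content first, append-only
    conversation last. Any edit to the prefix invalidates the cache."""
    stable = system + "\n" + "\n".join(sorted(tools))  # deterministic order
    return stable + "\n" + "\n".join(turns)

def shared_prefix_len(a, b):
    """Length of the common prefix, i.e. the cacheable portion."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

p1 = build_prompt("You are a helpful agent.", ["search", "read"], ["user: hi"])
p2 = build_prompt("You are a helpful agent.", ["read", "search"],
                  ["user: hi", "assistant: hello"])
# Sorting tools keeps the prefix byte-identical even when the tool list
# arrives in a different order, so the whole first prompt is reusable.
print(shared_prefix_len(p1, p2) == len(p1))  # True
```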

Strategic Attention Manipulation

Beyond mechanical context management, strategic techniques guide model attention:

Recency Positioning: Place critical information at context boundaries (beginning or end) where models attend most reliably. Less important content goes in the middle where “lost-in-the-middle” effects reduce attention.

Attention Anchors: Create explicit artifacts like todo.md that recite objectives near the end of context, pushing goals into the model’s recent attention span.
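A sketch of the anchor pattern: each turn, the stale objective recitation is dropped and a fresh copy is appended at the very end of context, where recency keeps it in the model's most reliable attention span (the `tag` field is an illustrative bookkeeping device):

```python
def with_anchor(context, todo):
    """Re-recite the current objectives at the end of context,
    replacing any earlier recitation rather than accumulating copies."""
    body = [m for m in context if m.get("tag") != "anchor"]  # drop stale copy
    anchor = {"role": "user", "tag": "anchor",
              "content": "Current objectives:\n" + todo}
    return body + [anchor]

ctx = [{"role": "user", "content": "long transcript..."}]
ctx = with_anchor(ctx, "- finish report\n- cite sources")
ctx = with_anchor(ctx, "- finish report (done)\n- cite sources")
print(ctx[-1]["content"].startswith("Current objectives"))  # True
print(len(ctx))  # 2: the old anchor was replaced, not accumulated
```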

Controlled Variation: Introduce variation in repeated patterns to prevent overfitting to conversation history. This addresses Context Rot effects where models mimic past behavior rather than reasoning freshly.

Error Preservation: Leaving failed attempts in context helps models learn what doesn’t work. Errors provide negative examples that guide future decisions.

Combining Strategies

Real systems typically combine multiple strategies, selecting compatible approaches that reinforce rather than conflict with each other:

A Multi-Agent Research System might employ:

  • Isolation: Each agent in quarantined context prevents cross-contamination
  • Retrieval: Agents fetch from shared knowledge base on-demand
  • Reduction: Agents compress findings for orchestrator handoff
  • Tool Filtering: Dynamic tool loading per agent specialization
  • Caching: Stable system prompts cached across all agents

An AI coding agent might use:

  • Offloading: Project context lives in file system memory
  • Pruning: Filter conversation history to relevant exchanges
  • Tool Filtering: Load only relevant API functions for current task
  • Caching: Cache stable coding guidelines and API docs
  • Attention Anchors: Maintain plan.md near context end

The art lies in selecting compatible strategies that address specific failure modes without creating new tensions. For instance, caching and offloading work well together - stable content gets cached while variable content lives externally. But aggressive pruning might conflict with caching if it constantly modifies supposedly stable prefixes. Context Engineering provides the framework for reasoning about these combinations systematically.

Implementation Frameworks

LangGraph Workflows provides infrastructure for implementing these patterns. The framework offers:

  • State management across conversation turns
  • Conditional routing between processing stages
  • Tool integration with dynamic loading
  • Persistence for context offloading
  • Streaming for progressive context assembly

This enables granular control over context flow while maintaining reasonable developer ergonomics. The graph-based structure makes context transformations explicit - you can visualize where pruning happens, where summarization occurs, and how contexts isolate or merge. Each node in the graph represents a context transformation, making the overall strategy observable and debuggable.

The framework enables composing strategies declaratively. A node might retrieve context from a vector store, pass it through a pruning filter, cache the result, then route to specialized agents with isolated contexts. The graph structure ensures these transformations happen in the correct order with proper state management between stages.
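The composition idea can be sketched framework-agnostically. The functions below are toy stand-ins, not LangGraph's actual API: each "node" is a function from state to state, and the runner applies them in order with shared state, which is the shape LangGraph formalizes with its graph, nodes, and edges:

```python
# Toy pipeline: retrieve -> prune -> answer, each node a visible,
# debuggable context transformation over a shared state dict.
def retrieve(state):
    state["docs"] = ["doc about " + state["query"]]
    return state

def prune(state):
    state["docs"] = state["docs"][:1]   # keep only the best match
    return state

def answer(state):
    state["answer"] = f"Based on {state['docs'][0]}: ..."
    return state

def run_graph(nodes, state):
    for node in nodes:                  # explicit, observable ordering
        state = node(state)
    return state

result = run_graph([retrieve, prune, answer], {"query": "kv caching"})
print(result["answer"])
```

Because every transformation is a named node, you can log or inspect the state between stages, which is exactly the observability benefit the graph structure provides.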