Research Workflow Architecture

Research Workflow Architecture represents the three-phase decomposition pattern that structures complex research tasks into distinct, optimized stages. Rather than treating research as a monolithic process, this architectural approach separates scoping, research execution, and synthesis into isolated phases with clean boundaries and explicit information flow between them.

The pattern emerged from practical observations about Context Engineering failures in end-to-end systems. When a single agent handles everything from understanding user intent through final report generation, contexts accumulate contradictory objectives - broad exploration competes with focused synthesis, verbose research history drowns out current objectives, and iterative refinement interferes with decisive execution. Phase isolation prevents these conflicts by ensuring each stage operates within contexts optimized for its specific purpose.

The Three Phases

Phase 1: Scoping transforms ambiguous user queries into structured research briefs through clarification dialogue. The scoping agent asks targeted questions to understand what aspects matter most, the user’s background knowledge level, depth versus breadth preferences, and desired output characteristics. This conversation produces a compact research brief that anchors all subsequent work. See Research Scoping Patterns for specific techniques that prevent irrelevant research by establishing clear boundaries upfront.

Phase 2: Research operates through supervisor-orchestrated decomposition where a research supervisor analyzes the brief, determines whether single-agent or multi-agent exploration makes sense, decomposes complex queries into non-overlapping subtopics, and coordinates parallel exploration. Each worker agent investigates its assigned subtopic within isolated context, producing compressed findings that flow back to the supervisor. The supervisor synthesizes these fragmented insights into coherent understanding. This delegation strategy follows principles from Research Delegation Heuristics while Research Findings Synthesis addresses the challenge of combining insights from isolated contexts.
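The fan-out and fan-in described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `Finding`, `run_worker`, and `supervise` are hypothetical names, and the worker body is a stub standing in for real search and model calls.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Finding:
    subtopic: str
    summary: str  # compressed result; the worker's full context is discarded


async def run_worker(subtopic: str) -> Finding:
    """Hypothetical worker: explores one subtopic in its own isolated context."""
    # Verbose intermediate material lives only inside this call.
    scratch = [f"search result {i} for {subtopic}" for i in range(3)]
    await asyncio.sleep(0)  # placeholder for real tool and model calls
    return Finding(subtopic, f"{len(scratch)} sources reviewed on {subtopic}")


async def supervise(subtopics: list[str]) -> str:
    """Run one worker per subtopic, then merge the compressed findings."""
    if len(subtopics) == 1:
        # Simple brief: a single agent is enough.
        findings = [await run_worker(subtopics[0])]
    else:
        # Complex brief: workers explore non-overlapping subtopics concurrently.
        findings = await asyncio.gather(*(run_worker(s) for s in subtopics))
    return "\n".join(f"- {f.subtopic}: {f.summary}" for f in findings)


synthesis = asyncio.run(supervise(["pricing", "latency", "licensing"]))
print(synthesis)
```

Only the `Finding` objects leave each worker, so the supervisor's context grows with the number of subtopics rather than with the volume of raw search output.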

Phase 3: Write performs one-shot report generation from the research brief plus synthesized findings. The writing agent sees only the original research brief and the supervisor’s compressed synthesis - not the full exploration history, dead-end paths, or verbose tool outputs. This clean context enables focused generation without distraction from research details.

graph TB
    User[User Query] --> Scope[Scoping Agent]
    Scope --> Brief[Research Brief]
    Brief --> Super[Research Supervisor]
    Super --> Decision{Single or Multi-Agent?}
    Decision -->|Simple| Single[Single Agent Research]
    Decision -->|Complex| Multi[Multi-Agent Decomposition]
    Multi --> W1[Worker 1: Subtopic A]
    Multi --> W2[Worker 2: Subtopic B]
    Multi --> W3[Worker 3: Subtopic C]
    W1 --> Findings1[Compressed Findings]
    W2 --> Findings2[Compressed Findings]
    W3 --> Findings3[Compressed Findings]
    Single --> FindingsS[Research Findings]
    Findings1 --> Synth[Supervisor Synthesis]
    Findings2 --> Synth
    Findings3 --> Synth
    FindingsS --> Synth
    Brief --> Write[Writing Agent]
    Synth --> Write
    Write --> Report[Final Report]

Why Phase Isolation

Optimization Independence: Each phase can use different models, prompting strategies, and computational resources without affecting others. Scoping might use a conversational model optimized for dialogue. Research might employ multiple specialized models for search and synthesis. Writing could use a model tuned for coherent long-form generation. This heterogeneous architecture optimizes resource allocation per-phase rather than compromising on one-size-fits-all choices.
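One way to picture this independence is a per-phase settings table, sketched below. The model names, token limits, and temperatures are placeholders invented for illustration, not recommendations or real products.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseConfig:
    model: str              # placeholder model name, not a real product
    max_output_tokens: int
    temperature: float


# Hypothetical settings; each phase can be tuned without touching the others.
PHASES = {
    "scoping": PhaseConfig("dialogue-model", max_output_tokens=1_000, temperature=0.7),
    "research": PhaseConfig("search-model", max_output_tokens=4_000, temperature=0.2),
    "writing": PhaseConfig("longform-model", max_output_tokens=16_000, temperature=0.4),
}


def config_for(phase: str) -> PhaseConfig:
    """Look up the settings for one phase, failing loudly on unknown names."""
    try:
        return PHASES[phase]
    except KeyError:
        raise ValueError(f"unknown phase: {phase!r}") from None


print(config_for("writing").model)
```

Swapping the writing model in this sketch changes one table entry and leaves scoping and research untouched.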

Context Management: Isolation prevents context contamination where one phase’s verbose outputs pollute another’s focused workspace. Research exploration generates massive context - search results, tool outputs, reasoning traces. If this accumulated in a shared context with scoping and writing, it would create Context Distraction and Context Confusion. Phase boundaries act as filters that pass only essential information forward.

Clarity of Purpose: When an agent handles multiple objectives simultaneously, priorities conflict. Should it explore more deeply or wrap up and write? Dive into interesting tangents or stay focused? Phase isolation removes these ambiguities. Research phase agents maximize understanding without worrying about final presentation. Writing phase agents optimize communication without needing research capabilities.

Information Flow Between Phases

The architecture enforces explicit information contracts at phase boundaries:

Scoping → Research: Research brief containing focused query, scope boundaries, depth preferences, background assumptions. This compact artifact (~500-1000 tokens) provides sufficient context for research without carrying conversation history.

Research → Writing: Synthesized findings from supervisor plus the original brief. Workers’ full exploration contexts don’t propagate - only compressed insights deemed essential by the supervisor. This creates a natural information bottleneck that prevents research verbosity from overwhelming writing.

The boundaries mirror Reducing Context strategies applied architecturally. Rather than trying to compress one massive context, the system produces small contexts by design through phase isolation.
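The contracts above could be expressed as small typed records, as in the following sketch. All names here (`ResearchBrief`, `WriterInput`, `hand_off_to_writer`) are hypothetical, the 8,000-token budget is an arbitrary example value, and the four-characters-per-token estimate is a crude assumption rather than a real tokenizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResearchBrief:
    query: str
    scope_boundaries: tuple[str, ...]
    depth: str                              # e.g. "overview" or "deep dive"
    background_assumptions: tuple[str, ...]


@dataclass(frozen=True)
class WriterInput:
    brief: ResearchBrief
    synthesis: str                          # the only research output the writer sees


def rough_token_count(text: str) -> int:
    """Crude size estimate (assumption: about four characters per token)."""
    return len(text) // 4


def hand_off_to_writer(
    brief: ResearchBrief,
    research_state: dict[str, object],
    max_synthesis_tokens: int = 8_000,      # hypothetical budget
) -> WriterInput:
    """Pass only the supervisor's synthesis across the phase boundary."""
    synthesis = str(research_state["synthesis"])
    if rough_token_count(synthesis) > max_synthesis_tokens:
        raise ValueError("synthesis exceeds the handoff budget; compress further")
    # Worker logs and tool output in research_state are deliberately left behind.
    return WriterInput(brief=brief, synthesis=synthesis)


brief = ResearchBrief(
    query="How do batch schedulers handle retries?",
    scope_boundaries=("open-source schedulers only",),
    depth="overview",
    background_assumptions=("reader knows basic queueing",),
)
state = {
    "synthesis": "Most schedulers retry with exponential backoff.",
    "worker_logs": ["raw search output"] * 1_000,
}
writer_input = hand_off_to_writer(brief, state)
print(writer_input.synthesis)
```

Because `WriterInput` has no field for worker logs, the bottleneck is enforced by the type itself rather than by convention.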

Tradeoffs and Limitations

Efficiency Gains: Phase isolation enables massive parallelization, since multiple worker agents research simultaneously during Phase 2. The cost appears when writing exposes research gaps: the system must cycle back to an earlier phase rather than refine in place. The architecture trades iterative improvement for parallel throughput.

Iterative Refinement: Traditional research involves cycles - discover something, refine understanding, explore adjacent areas, discover more. The three-phase architecture limits this iteration: research happens once, then writing happens once. Complex research questions might benefit from iterative deepening that this structure prevents. The system optimizes for decisive execution over exploratory refinement.

Context Complexity: Phase boundaries create coordination overhead. Information must be explicitly packaged for handoff. The supervisor must synthesize worker findings without seeing their full reasoning. The writer must generate reports from compressed synthesis without accessing source research. These compressions risk losing nuance. The architecture bets that preventing Context Rot through isolation outweighs information loss at boundaries.

Reversibility Constraints: Once research completes, returning for additional exploration requires restarting Phase 2. Unlike systems with persistent research context, this architecture treats phases as one-way progressions. This aligns with Reversible Decisions philosophy - the decision to complete research and move to writing should be high-confidence since reversing it is expensive.

Integration with Broader Patterns

The architecture implements the Orchestrator-Worker Pattern at the research phase, where the supervisor orchestrates isolated worker contexts. This enables Multi-Agent Research Systems to scale without context explosion - five workers with 50k token contexts remain manageable while a single 250k token context would suffer severe degradation.
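The arithmetic behind that comparison is worked through below. The 50k figure and the worker count come from the text; the size of each worker's compressed findings is a hypothetical value chosen only to make the comparison concrete.

```python
WORKERS = 5
WORKER_CONTEXT_TOKENS = 50_000      # figure from the text
FINDINGS_TOKENS_PER_WORKER = 2_000  # hypothetical size of compressed findings

# One shared context would have to hold every worker's exploration at once.
shared_context = WORKERS * WORKER_CONTEXT_TOKENS

# With isolation, the supervisor only holds the compressed findings.
supervisor_context = WORKERS * FINDINGS_TOKENS_PER_WORKER

# The largest context any single agent has to handle under isolation.
largest_isolated_context = max(WORKER_CONTEXT_TOKENS, supervisor_context)

print(shared_context)            # 250000
print(supervisor_context)        # 10000
print(largest_isolated_context)  # 50000
```

Total tokens processed are similar in both designs; what changes is the peak context size any one agent must reason over.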

Context Engineering principles permeate the design. Offloading Context stores research artifacts externally. Reducing Context compresses findings at supervisor level. Caching Context optimizes within phases where prompts share stable prefixes. Retrieving Context powers research agents’ tool use without loading everything into context.

The pattern loosely follows the OODA Loop: observe (scoping), orient (research), decide (synthesis), act (write). Because synthesis is the closing step of Phase 2 rather than a phase of its own, the mapping is approximate, but the same idea holds: each stage informs the next through controlled information flow across an explicit boundary.

Relationship to Human Research

The architecture mirrors how humans decompose complex research tasks. We don’t simultaneously scope questions, gather information, and write conclusions. We clarify what we’re investigating, explore the problem space, synthesize understanding, then communicate findings. The phases reflect natural cognitive boundaries between different types of work.

This connects to Human Metacognition - awareness and management of our thought processes. Just as effective researchers consciously separate exploration from synthesis from communication, this architecture enforces those separations structurally. The system can’t accidentally confuse research context with writing context because they’re architecturally isolated.

Evolution and Future Directions

Current implementations use fixed three-phase progression. Future systems might employ:

Adaptive Phase Selection: Determining at runtime whether a query needs the full three-phase treatment or can skip directly from the brief to writing
Iterative Deepening: Controlled cycles in which the writing phase can request additional research on specific topics, triggering a targeted Phase 2 execution
Nested Research: Complex subtopics in Phase 2 that trigger their own three-phase workflows, creating recursive research structures
Continuous Learning: Phases that learn from previous executions, improving scoping questions, delegation strategies, and synthesis approaches over time
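As an illustration of the iterative deepening idea in the list above, the sketch below bounds the number of times the writing phase may send gaps back to research. It is speculative by nature: `research`, `find_gaps`, and `write_with_deepening` are hypothetical stubs, and the cycle limit is an arbitrary example value.

```python
def research(topics: list[str]) -> dict[str, str]:
    """Hypothetical targeted Phase 2 run: one finding per requested topic."""
    return {topic: f"findings on {topic}" for topic in topics}


def find_gaps(required: list[str], findings: dict[str, str]) -> list[str]:
    """Hypothetical check made by the writing phase before drafting."""
    return [topic for topic in required if topic not in findings]


def write_with_deepening(
    required: list[str],
    initial_topics: list[str],
    max_cycles: int = 2,  # hard cap keeps the loop from running indefinitely
) -> str:
    findings = research(initial_topics)
    for _ in range(max_cycles):
        gaps = find_gaps(required, findings)
        if not gaps:
            break
        # Targeted rerun on the missing topics only, not a full Phase 2 restart.
        findings.update(research(gaps))
    # Draft from whatever was covered within the cycle budget.
    return "\n".join(findings[t] for t in required if t in findings)


print(write_with_deepening(["retries", "backoff"], ["retries"]))
```

The cap preserves the decisive-execution property of the fixed pipeline while allowing a small, controlled amount of feedback from writing to research.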

The fundamental insight - separate concerns through phase isolation to optimize each independently - will likely persist even as implementations grow more sophisticated. The architecture demonstrates that managing AI agent complexity often means enforcing simpler boundaries rather than building more complex unified systems.
