ReAct stands for Reasoning and Acting - a synergistic loop, introduced by Yao et al. in the 2022 paper "ReAct: Synergizing Reasoning and Acting in Language Models", where agents interleave explicit reasoning with tool-based actions, using observations from each action to inform subsequent reasoning. The pattern emerged from recognizing that neither pure planning (reason completely then act) nor reactive tool-calling (act without reasoning) provides sufficient autonomy for complex tasks requiring exploration and adaptation.

The ReAct Cycle

The pattern operates through a continuous loop with three distinct phases:

Thought: The agent articulates what information it needs and why. This isn’t just task planning but active reasoning about the current state - what’s known, what’s missing, what would be most valuable to discover next. The thought phase makes reasoning explicit, creating a trace that becomes part of the agent’s Context Window and influences future decisions.

Action: Based on reasoning, the agent selects and executes a tool. This might be searching documentation, calling an API, querying a database, or any capability exposed through the agent’s tool interface. The action represents a deliberate information-gathering step driven by the preceding thought.

Observation: The agent processes tool results, extracting relevant information and assessing what was learned. This observation feeds directly into the next thought phase, creating the loop. Crucially, observations can reveal unexpected findings that cause the agent to revise its approach - this adaptability distinguishes ReAct from rigid predetermined plans.

The cycle repeats until the agent determines it has sufficient information to complete its task, hits an iteration limit, or explicitly decides to conclude based on reasoning about progress.
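In code, the loop reduces to a handful of lines. The sketch below is illustrative only: llm is assumed to be a callable that returns a parsed step as a dict with thought, action, and args keys, and tools maps tool names to plain Python functions - neither is any particular library's API.

# Minimal ReAct loop sketch. `llm` and `tools` are assumed interfaces:
#   llm(transcript) -> {"thought": str, "action": str, "args": dict}
#   tools maps tool names to plain Python functions.
def react_loop(task, llm, tools, max_iterations=10):
    transcript = f"Task: {task}"
    for _ in range(max_iterations):
        step = llm(transcript)                     # Thought + Action selection
        if step["action"] == "finish":             # agent decides it has enough
            return step["args"]["answer"]
        result = tools[step["action"]](**step["args"])   # execute the Action
        transcript += (                            # Observation feeds the next Thought
            f"\nThought: {step['thought']}"
            f"\nAction: {step['action']}({step['args']})"
            f"\nObservation: {result}"
        )
    return transcript  # iteration limit hit; caller inspects the trace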

Why ReAct Works for Research

Research tasks naturally fit the ReAct pattern because they involve exploration under uncertainty. An agent researching a topic doesn’t know in advance what information sources will prove valuable or what findings will emerge. The ReAct loop provides a framework for systematic exploration that adapts based on discoveries.

Consider an agent researching “quantum computing applications in cryptography”:

Thought: “I need to understand current quantum computing capabilities before exploring cryptographic applications.”

Action: Search for “quantum computing state of the art 2025”

Observation: Results indicate quantum computers now have 1000+ qubits with improved error correction. This is significant for breaking classical encryption.

Thought: “Given quantum capabilities, I should focus on which cryptographic algorithms are vulnerable and what post-quantum alternatives exist.”

Action: Search for “post-quantum cryptography standards”

Each cycle builds understanding that informs the next exploration. The agent doesn’t need to plan the entire research path upfront - it discovers the path through iterative investigation.

Integration with Tool-Calling

The action phase of ReAct integrates with language models’ native tool-calling capabilities. Modern models can be prompted to select tools, specify parameters, and interpret results within the ReAct framework. The key difference from simple tool-calling is the explicit reasoning wrapper around each tool use.

Without ReAct, a model might call search tools when prompted, but without articulating why a given search would help or how its results inform the larger task. ReAct makes this reasoning explicit:

Thought: The user wants to know about X. I should search for Y because it will reveal Z.
Action: search(query="Y")
Observation: Found information showing Z is actually related to W.
Thought: Interesting, I didn't expect the W connection. Let me explore that.
Action: search(query="W relationship to X")

This explicit reasoning creates a richer context for decision making compared to implicit tool selection. The model can reference its own past reasoning when making subsequent choices, creating continuity beyond just accumulated observations.
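One way to realize this wrapper with native tool-calling is to instruct the model to emit a Thought line as ordinary message text alongside each structured tool call, then append tool results back as observations. The client interface below is an assumed placeholder - provider SDKs name these fields differently but expose the same pieces (text content, tool calls, tool results):

SYSTEM_PROMPT = (
    "Before every tool call, write one line starting with 'Thought:' "
    "explaining why that call will help with the larger task."
)

def run_step(client, messages, tools):
    # `client.chat(...)`, `reply.text`, and `reply.tool_calls` are assumed
    # names for this sketch, not a specific SDK's API.
    reply = client.chat(messages=messages, tools=tools)
    messages.append({"role": "assistant", "content": reply.text})  # the Thought
    for call in reply.tool_calls:                                  # the Action(s)
        result = tools[call.name](**call.arguments)
        messages.append({"role": "tool", "name": call.name,
                         "content": str(result)})                  # the Observation
    return messages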

Reflection and Learning

After each action-observation cycle, effective ReAct implementations include a reflection step where the agent evaluates what was learned and whether the research direction should change. This meta-reasoning about reasoning helps prevent agents from mechanically following initial plans when observations suggest different approaches.

Reflection might consider:

  • Did this observation answer the question I thought it would?
  • What unexpected information emerged?
  • Does this change my understanding of what’s important?
  • Should I continue this direction or pivot to something else?

This connects to Progressive Research Exploration’s emphasis on adaptive navigation based on findings. ReAct provides the mechanism for that adaptation through reflection on observations.
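In implementation terms, reflection is one extra model call per cycle. A minimal sketch, assuming a plain text-in, text-out llm callable and mirroring the questions above:

REFLECTION_PROMPT = (
    "Review the trace so far and answer briefly:\n"
    "1. Did the last observation answer the question you expected?\n"
    "2. What unexpected information emerged?\n"
    "3. Does this change what is important?\n"
    "Finish with exactly one of: CONTINUE, PIVOT, CONCLUDE."
)

def reflect(llm, transcript):
    # One extra model call per cycle; the verdict lets the loop change
    # direction or stop early instead of mechanically continuing.
    reflection = llm(transcript + "\n\n" + REFLECTION_PROMPT)
    verdict = reflection.strip().split()[-1]   # CONTINUE / PIVOT / CONCLUDE
    return transcript + f"\nReflection: {reflection}", verdict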

Comparison to Alternatives

Chain-of-Thought (CoT): CoT generates reasoning traces but doesn’t interleave with actions. The model thinks through the entire problem, then acts. This works for problems with known solution paths but fails when exploration must inform the approach. ReAct’s advantage is learning from actions to inform subsequent reasoning.

Pure Planning: Some agent architectures create complete plans before execution. ReAct rejects this in favor of iterative planning-acting cycles. Pure planning assumes enough is known upfront to create effective plans, while ReAct assumes the agent must discover what’s needed through exploration.

Reactive Tool-Calling: Simply calling tools when needed without explicit reasoning creates brittle agents. They might use tools effectively for straightforward tasks but lack the reasoning framework to handle complex, multi-step problems requiring strategic information gathering.

The tradeoffs center on flexibility versus efficiency. ReAct’s iterative adaptation provides flexibility to handle uncertain tasks but uses more tokens and inference cycles compared to approaches that reason once then act. For research tasks where the information landscape is unknown, this flexibility proves essential.

Implementation in Practice

LangChain’s deep research agents implement ReAct through structured prompts that guide the model through explicit reasoning phases:

You are a research agent with access to search tools.

For each step:
1. Thought: Articulate what you need to discover and why
2. Action: Use a tool to gather information
3. Observation: Process what you learned
4. Repeat until you have sufficient information

Then provide your research findings.

The framework handles tool execution and observation injection, while the model generates thoughts and selects actions. This separation keeps the ReAct pattern clean - the model focuses on reasoning and tool selection, the framework handles execution mechanics.
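In recent LangChain releases this wiring ships as a prebuilt helper. A hedged sketch - import paths and signatures have shifted across langchain/langgraph versions, and my_search_backend is a hypothetical stub:

from langchain.chat_models import init_chat_model
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent

@tool
def search(query: str) -> str:
    """Search for information and return a short summary."""
    return my_search_backend(query)   # hypothetical backend, not a real API

model = init_chat_model("anthropic:claude-3-7-sonnet-latest")  # any chat model
agent = create_react_agent(model, tools=[search])
result = agent.invoke(
    {"messages": [("user", "Survey post-quantum cryptography standards")]}
)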

The Lego Approach for Building Agentic Systems treats ReAct agents as composable units. Each agent gets clear instructions about its research scope, access to relevant tools, and autonomy to execute ReAct loops within that scope. The orchestrating system coordinates multiple ReAct agents without managing their individual reasoning cycles.

Integration with Context Engineering

ReAct patterns significantly impact Context Engineering because each cycle adds content to the Context Window. Thoughts, actions, and observations accumulate rapidly - a research agent making 10 search calls generates a thought-action-observation triplet for each, potentially consuming tens of thousands of tokens.

This creates pressure for Reducing Context through compression strategies. The Research Compression Pipeline often processes ReAct agent outputs, distilling lengthy thought-action-observation histories into focused findings summaries. The compression preserves conclusions while discarding the verbose reasoning trace that led there.
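A compression pass over a finished trace can be as simple as the sketch below, again assuming a text-in, text-out llm callable:

COMPRESS_PROMPT = (
    "Below is a research agent's full thought-action-observation trace.\n"
    "Extract only the findings and their sources as a concise summary.\n"
    "Discard the reasoning and intermediate searches that led to them.\n\n"
)

def compress_trace(llm, transcript, min_chars=4000):
    # Short traces pass through untouched; long ones are distilled so the
    # orchestrator carries findings, not the verbose reasoning history.
    if len(transcript) <= min_chars:
        return transcript
    return llm(COMPRESS_PROMPT + transcript)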

Isolating Context becomes important when multiple ReAct agents operate in parallel. Each agent’s thought-action-observation history stays isolated to its own context, preventing other agents from experiencing Context Distraction from verbose reasoning traces irrelevant to their subtopics.

The pattern also relates to Prompt Engineering strategies around extended thinking. Making thoughts explicit helps models reason more effectively, similar to how CoT prompting improves performance. But ReAct extends this by making thoughts actionable - they directly inform tool selection and observation interpretation.

Failure Modes and Mitigations

Reasoning Loops: Agents can get stuck in repetitive thought patterns, making similar searches without progress. Mitigation involves iteration limits and prompt instructions to recognize when a direction isn’t productive.

Observation Overload: Tool results can be verbose, filling context with detail that obscures signal. Mitigation uses summarization - have agents explicitly extract key findings from observations rather than keeping full tool outputs.

Premature Conclusion: Agents might stop exploring before gathering sufficient information. Mitigation includes explicit prompts to verify completeness and requirements for minimum exploration depth.

Lost Coherence: Long ReAct sequences can lose track of the original objective as context fills with intermediate observations. Mitigation involves periodic reminders of the goal and checkpoints where agents assess overall progress versus the objective.
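Several of these mitigations compose naturally in the loop itself. A sketch, with extract_query standing in for a hypothetical parsing helper:

def guarded_step(llm, transcript, seen_queries, step_num, goal,
                 max_steps=12, remind_every=4):
    # Reasoning loops: reject searches the agent has already tried.
    # Lost coherence: periodically re-inject the original objective.
    # Runaway iteration: a hard step limit forces a conclusion.
    if step_num >= max_steps:
        return transcript + "\nSystem: iteration limit reached - conclude now."
    if step_num and step_num % remind_every == 0:
        transcript += f"\nReminder: the original objective is: {goal}"
    step = llm(transcript)
    query = extract_query(step)            # hypothetical parsing helper
    if query in seen_queries:
        transcript += ("\nSystem: that search was already run; "
                       "try a different angle or conclude.")
    else:
        seen_queries.add(query)
        transcript += "\n" + step
    return transcript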

Relationship to Decision Frameworks

The ReAct pattern implements the OODA Loop at the agent level:

  • Observe: The observation phase where tool results are processed
  • Orient: The thought phase where observations are interpreted within the agent’s understanding
  • Decide: The transition from thought to action where the agent selects a tool
  • Act: The action phase executing the chosen tool

This mapping reveals ReAct as a specific instantiation of OODA principles for AI agents. The cycle creates the same adaptive advantage Boyd identified in military strategy - agents that complete the loop faster and more effectively can outmaneuver rigid, predetermined approaches.

The pattern also connects to Agency concepts, particularly mechanistic agency. ReAct agents exhibit functional capacity to take action based on information, processing representational states (observations) against desired states (research objectives) to select appropriate actions. The explicit reasoning makes this agency transparent rather than black-box.

Evolution and Variants

The basic ReAct pattern has spawned variants optimizing for specific use cases:

Parallel ReAct: Multiple agents run ReAct loops simultaneously on different subtopics, with Multi-Agent Research Systems coordinating their efforts.

Hierarchical ReAct: Agents can spawn sub-agents to explore specific questions that emerge during research, creating nested ReAct loops at different granularities.

Bounded ReAct: Hard limits on iterations, tool calls, or token usage ensure ReAct loops converge within resource constraints (sketched below).

Cached ReAct: Prompt caching reduces the cost of repeated ReAct cycles by reusing the stable prompt prefix - system instructions, tool definitions, and the accumulated trace - across iterations.

These variants maintain the core thought-action-observation loop while adapting the pattern to different architectural and performance requirements.
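Bounded ReAct in particular reduces to a small set of limits checked by the loop. An illustrative sketch, with example values:

from dataclasses import dataclass

@dataclass
class ReActBudget:
    # Illustrative limits for a Bounded ReAct loop; values are examples.
    max_iterations: int = 10        # thought-action-observation cycles
    max_tool_calls: int = 20        # total tool invocations
    max_tokens: int = 50_000        # approximate context budget

    def exhausted(self, iterations, tool_calls, tokens):
        return (iterations >= self.max_iterations
                or tool_calls >= self.max_tool_calls
                or tokens >= self.max_tokens)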

Practical Considerations

Token Budget: ReAct consumes tokens rapidly through verbose reasoning traces. Budget for 3-5x more tokens than the final compressed output - for example, a 10-cycle run averaging roughly 1,500 tokens per thought-action-observation triplet consumes about 15,000 tokens before compression to a 3,000-5,000 token summary.

Tool Design: Tools should return information agents can effectively process. Unstructured, verbose outputs force agents to spend reasoning capacity on extraction rather than actual research.

Instruction Clarity: Agent prompts must clearly define when to continue exploring versus when sufficient information has been gathered. Ambiguous completion criteria cause either premature stopping or runaway iteration.

Error Handling: Tools can fail or return unexpected results. Effective ReAct agents reason about tool failures and adapt rather than mechanically continuing with broken information.
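A common tactic is to convert failures into observations the agent can reason about, rather than letting exceptions kill the loop; a sketch:

def safe_tool_call(tools, name, args):
    # Surface failures as observations so the next Thought can adapt,
    # instead of crashing the loop or silently dropping the step.
    try:
        return f"Observation: {tools[name](**args)}"
    except KeyError:
        return (f"Observation: no tool named '{name}'. "
                f"Available tools: {sorted(tools)}")
    except Exception as exc:
        return (f"Observation: tool '{name}' failed with "
                f"{type(exc).__name__}: {exc}")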

The pattern represents a fundamental shift in how we build autonomous agents - from predetermined algorithms to adaptive systems that reason about what they discover and adjust their approach accordingly. This adaptability proves essential for research tasks where the information landscape reveals itself only through exploration.