Prompt engineering is the practice of carefully designing inputs to language models to elicit desired outputs. Unlike traditional programming where we write explicit instructions in code, prompt engineering involves crafting natural language instructions that guide model behavior.

Core Principles

Clarity and Specificity: Vague prompts produce vague outputs. “Write about dogs” yields generic content, while “Write a 200-word explanation of how dogs use scent marking for territorial communication” produces focused, useful results.

Context Provision: Models work better when given relevant background. Rather than “Summarize this,” provide context: “Summarize this research paper for an audience of undergraduate students unfamiliar with the field.”

Examples and Demonstrations: Few-shot learning works remarkably well. Show the model 2-3 examples of desired output format and quality, then ask for similar treatment of new content.
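
A minimal sketch of assembling a few-shot prompt in code. The task, the example pairs, and build_few_shot_prompt are illustrative, not part of any particular library:

# Hypothetical few-shot prompt for sentiment labeling.
EXAMPLES = [
    ("The battery died after two days.", "negative"),
    ("Setup took five minutes and everything worked.", "positive"),
]

def build_few_shot_prompt(new_review: str) -> str:
    """Assemble 2-3 demonstrations followed by the new input."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for review, label in EXAMPLES:
        lines.append(f"Review: {review}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {new_review}")
    lines.append("Sentiment:")
    return "\n".join(lines)

print(build_few_shot_prompt("The screen scratches far too easily."))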

Iterative Refinement: First attempts rarely produce optimal results. Treat prompt development as an iterative process: test, analyze failures, refine, repeat.

Techniques

Progressive Narrowing

Start broad, then systematically narrow focus based on intermediate results. In Multi-Agent Research Systems, this prevents premature optimization while ensuring comprehensive exploration.

Example progression:

  1. “Research the impact of social media on mental health”
  2. “Focus on studies examining anxiety and depression specifically”
  3. “Compare findings across age groups, particularly adolescents vs adults”

Chain-of-Thought Prompting

Encourage the model to show its reasoning process. “Let’s think step by step” or “Explain your reasoning” produces more reliable outputs for complex tasks.

This transparency helps debug unexpected behaviors and understand how models decompose problems. The approach reveals the model’s implicit reasoning, making it easier to identify where logic diverges from expectations.
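
A rough sketch of wrapping a question with a chain-of-thought instruction; the exact wording here is illustrative rather than canonical:

def chain_of_thought_prompt(question: str) -> str:
    """Wrap a question with an explicit reasoning instruction."""
    return (
        f"Question: {question}\n\n"
        "Let's think step by step. Show your reasoning, "
        "then give the final answer on a line starting with 'Answer:'."
    )

print(chain_of_thought_prompt(
    "A train travels 120 km in 1.5 hours. What is its average speed?"
))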

Task Decomposition

Break complex requests into smaller, well-defined subtasks. Rather than “Build me a complete web application,” decompose into:

  • “Design the database schema”
  • “Write API endpoint specifications”
  • “Implement authentication logic”
  • “Create frontend components”

This mirrors how The Lego Approach for Building Agentic Systems composes complex workflows from simpler components.
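
A sketch of running the decomposition programmatically, assuming a placeholder call_model function standing in for whatever model client is used:

SUBTASKS = [
    "Design the database schema for a task-tracking app.",
    "Write API endpoint specifications for tasks and users.",
    "Outline the authentication logic.",
    "Describe the frontend components for the task list.",
]

def call_model(prompt: str) -> str:
    """Placeholder for a real model call (any provider or local model)."""
    return f"[model output for: {prompt[:40]}...]"

# Each subtask gets its own focused prompt rather than one monolithic request.
results = {task: call_model(task) for task in SUBTASKS}
for task, output in results.items():
    print(f"- {task}\n  {output}\n")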

Role Assignment

Assign the model a specific role or expertise. “You are an expert data scientist reviewing this analysis” produces different outputs than “You are a business executive evaluating ROI.”

Roles provide implicit context about perspective, priorities, and domain knowledge the model should apply.

Constraint Specification

Explicitly state constraints: word limits, formatting requirements, what to include or exclude, tone and style preferences. Models follow explicit constraints more reliably than inferring unstated expectations.

Advanced Patterns

Meta-Prompting

Use the model to help design better prompts. Ask it to suggest improvements to your prompts or to generate prompts for specific tasks. This leverages the model’s understanding of how it responds to different prompt structures.
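
A minimal sketch of meta-prompting; call_model is again a hypothetical placeholder for the model client:

def call_model(prompt: str) -> str:
    """Placeholder for a real model call."""
    return "[model's critique and revised prompt]"

draft_prompt = "Write about dogs."

meta_prompt = (
    "You are helping me improve a prompt for a language model.\n"
    f"Current prompt: {draft_prompt!r}\n\n"
    "List three weaknesses of this prompt, then rewrite it to be "
    "specific about audience, length, and scope."
)

print(call_model(meta_prompt))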

Prompt Chaining

Link multiple prompts where each uses outputs from previous steps. The first prompt might gather information, the second analyzes it, and the third synthesizes conclusions.
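
A sketch of a three-step chain (gather, analyze, synthesize), where each prompt embeds the previous step's output; call_model is a stand-in for the real client:

def call_model(prompt: str) -> str:
    """Placeholder for a real model call."""
    return f"[output for: {prompt[:40]}...]"

topic = "remote work and employee productivity"

# Step 1: gather information.
facts = call_model(f"List key findings from recent studies on {topic}.")

# Step 2: analyze, using step 1's output as context.
analysis = call_model(
    f"Given these findings:\n{facts}\n\nIdentify points of agreement and disagreement."
)

# Step 3: synthesize conclusions from the analysis.
summary = call_model(
    f"Based on this analysis:\n{analysis}\n\nWrite a 150-word executive summary."
)
print(summary)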

In Multi-Agent Research Systems, specialized agents receive carefully engineered prompts defining their specific roles and responsibilities within the broader research workflow.

Negative Prompting

Explicitly state what NOT to do. “Do not include speculation” or “Avoid technical jargon” can be as important as stating what TO include.

Output Formatting

Specify desired output structure. “Respond in JSON format with keys X, Y, Z” or “Use markdown with H2 headers for main sections” makes outputs more parseable and useful for downstream processing.
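
A sketch of pairing a JSON-format instruction with downstream parsing; the key names and the canned call_model response are illustrative:

import json

def call_model(prompt: str) -> str:
    """Placeholder for a real model call; returns a JSON string here."""
    return '{"title": "Scent Marking in Dogs", "summary": "...", "confidence": 0.8}'

prompt = (
    "Summarize the article below.\n"
    "Respond with JSON only, using exactly these keys: "
    '"title" (string), "summary" (string), "confidence" (0-1 float).\n\n'
    "Article: ..."
)

raw = call_model(prompt)
data = json.loads(raw)  # fails loudly if the model ignored the format
print(data["title"], data["confidence"])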

Context-Specific Considerations

For Multi-Agent Systems

Prompts must define:

  • Clear task boundaries: What is and isn’t this agent’s responsibility
  • Coordination protocols: How agents share information and defer to each other
  • Quality criteria: What constitutes successful completion
  • Error handling: How to behave when encountering problems

See Orchestrator-Worker Pattern for coordination details.
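
A sketch of a worker-agent prompt touching all four elements; the wording and the {subtopic} placeholder are illustrative, not a canonical template:

You are the data-collection agent in a multi-agent research system.

Task boundaries: Gather and summarize sources on {subtopic}. Do not draw
final conclusions; that is the synthesis agent's responsibility.

Coordination: Return your findings to the orchestrator as a structured
summary. If a request falls outside your scope, say so rather than guessing.

Quality criteria: At least three independent sources, each cited, with
key claims clearly attributed.

Error handling: If searches return nothing relevant, report the failed
queries and suggest alternative phrasings instead of fabricating results.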

For Code Generation

Effective prompts specify (see the example after this list):

  • Language and framework versions
  • Coding standards and style preferences
  • Error handling requirements
  • Performance considerations
  • Testing expectations
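
For example, a code-generation prompt covering these points might read (the language and framework versions named here are illustrative):

Write a Python 3.11 function using FastAPI that returns paginated user
records. Follow PEP 8 and use type hints throughout. Raise HTTPException
with appropriate status codes for invalid page parameters. Keep queries
efficient for tables with millions of rows. Include pytest unit tests
covering the happy path and at least two error cases.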

For Research Tasks

Research prompts benefit from the following (an example follows the list):

  • Defining information sources (academic papers, recent news, technical documentation)
  • Specifying depth vs. breadth tradeoffs
  • Requesting citations and evidence
  • Indicating appropriate level of technical detail
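
An illustrative research prompt applying these points:

Research how microplastics affect freshwater ecosystems. Prioritize
peer-reviewed studies from the last five years, supplemented by reputable
science journalism for context. Favor depth over breadth: cover three to
five well-documented mechanisms rather than a broad survey. Cite each claim
with its source, and pitch the technical detail at a graduate-student level.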

Evaluation and Iteration

Small Sample Testing: Test prompts on diverse examples before scaling. Edge cases often reveal weaknesses invisible in happy-path scenarios.

LLM-as-Judge: Use LLM-as-Judge evaluation to assess prompt effectiveness at scale. The judge evaluates whether outputs meet quality criteria.
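
A sketch of an LLM-as-Judge check; the rubric, the JSON schema, and call_model are all illustrative placeholders rather than a fixed evaluation API:

import json

def call_model(prompt: str) -> str:
    """Placeholder for a real model call; returns the judge's JSON verdict here."""
    return '{"meets_criteria": true, "score": 4, "explanation": "..."}'

def judge(task: str, output: str) -> dict:
    """Ask a model to grade another model's output against explicit criteria."""
    rubric = (
        "Evaluate the output against these criteria:\n"
        "1. Directly addresses the task\n"
        "2. Factually grounded, no unsupported claims\n"
        "3. Follows the requested format\n"
        'Respond with JSON: {"meets_criteria": bool, "score": 1-5, "explanation": str}'
    )
    prompt = f"Task: {task}\n\nOutput to evaluate:\n{output}\n\n{rubric}"
    return json.loads(call_model(prompt))

print(judge("Summarize the paper for undergraduates.", "The paper argues that..."))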

A/B Comparison: Compare outputs from different prompt variations. Sometimes small wording changes produce dramatically different results.

User Feedback Integration: Incorporate real usage patterns and failure modes into prompt refinement. Theoretical prompt quality matters less than practical performance.

Common Pitfalls

Over-Specification: Extremely detailed prompts can constrain the model too much, preventing it from applying its broader knowledge effectively.

Ambiguous Instructions: “Make it better” or “improve this” gives the model no actionable direction. Specify what dimensions of “better” matter.

Assuming Capabilities: Models have limitations. Prompting won’t overcome fundamental capability gaps, though it can better leverage existing capabilities.

Ignoring Context Window: Extremely long prompts leave less space for responses. When the context window is constrained, concise prompts become essential. Context Engineering addresses this systematically through strategies like pruning, summarization, and offloading.

Static Prompts for Dynamic Tasks: Tasks with varying requirements need adaptive prompting strategies, not one-size-fits-all instructions.

The Art and Science

Prompt engineering blends systematic experimentation with intuitive understanding of how language models interpret instructions. It requires:

  • Analytical thinking to decompose complex tasks
  • Linguistic precision to express requirements clearly
  • Iterative experimentation to discover what works
  • Domain expertise to provide appropriate context

As models evolve, effective prompting strategies shift. What works brilliantly with one model may underperform with another. This makes prompt engineering an ongoing practice rather than a solved problem.

Research Agent Prompting

Deep Research Systems require specialized prompting patterns, discovered through the Open Deep Research implementation and Anthropic’s engineering work.

Supervisor Delegation Templates: Research Delegation Heuristics are implemented through structured prompts that guide the supervisor’s high-level reasoning. Template structure:

You are a research supervisor coordinating specialized agents.

Research Brief:
{brief_from_scoping}

Current Understanding:
{existing_findings}

Task: Decide research strategy
- Use single agent for: simple facts, lists, rankings
- Use multiple agents for: comparative analysis, multi-dimensional topics
- Max {max_concurrent} agents per iteration
- Max {max_iterations} total iterations

After each research round, reflect:
1. Do findings sufficiently address the brief?
2. Are there gaps requiring deeper exploration?
3. Should we proceed to synthesis or continue research?

Reasoning:

This template structure ensures supervisors consider Progressive Research Exploration principles while respecting resource constraints.
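
A sketch of binding the template’s placeholders at runtime; the shortened template and the example values are illustrative:

SUPERVISOR_TEMPLATE = (
    "You are a research supervisor coordinating specialized agents.\n\n"
    "Research Brief:\n{brief_from_scoping}\n\n"
    "Current Understanding:\n{existing_findings}\n\n"
    "Task: Decide research strategy\n"
    "- Max {max_concurrent} agents per iteration\n"
    "- Max {max_iterations} total iterations\n"
)

prompt = SUPERVISOR_TEMPLATE.format(
    brief_from_scoping="Impact of social media use on adolescent mental health",
    existing_findings="No findings yet; this is the first iteration.",
    max_concurrent=3,
    max_iterations=4,
)
print(prompt)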

Sub-Agent Research Prompts: Workers receive standalone instructions: (1) the specific research question (from supervisor decomposition), (2) the search tools available, (3) quality criteria (depth, sources, evidence), (4) a compression requirement (produce a focused summary, not raw results). Example:

Research Question: {subtopic_from_supervisor}

Your Task:
- Use search tools to gather information addressing this question
- Evaluate source quality and relevance
- Extract key findings with evidence
- Produce focused summary (not raw search results)

Quality Criteria:
- Cite specific sources
- Distinguish facts from interpretations
- Note confidence levels and uncertainties
- Keep summary under {max_length} tokens

Available Tools:
{tool_descriptions}

Reflection Prompt Design: ReAct Agent Pattern includes reflection after each observation. Reflection prompts guide productive thinking: “Based on this search result: (1) What did I learn? (2) What gaps remain? (3) Should I search for more information or synthesize current findings? (4) If searching more, what specific query would help?” This structured reflection prevents aimless exploration while encouraging adaptive strategy.
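
A sketch of a reflection step inside a search loop; call_model, search, the SEARCH:/SYNTHESIZE convention, and the canned return values are all hypothetical:

def call_model(prompt: str) -> str:
    """Placeholder for a real model call."""
    return "SYNTHESIZE"

def search(query: str) -> str:
    """Placeholder search tool."""
    return f"[results for {query!r}]"

REFLECTION = (
    "Based on this search result:\n{observation}\n\n"
    "(1) What did I learn? (2) What gaps remain?\n"
    "(3) Reply SEARCH: <query> to keep searching, or SYNTHESIZE to stop."
)

query, max_steps = "microplastics freshwater ecosystems", 5
findings = []
for _ in range(max_steps):
    observation = search(query)
    findings.append(observation)
    decision = call_model(REFLECTION.format(observation=observation))
    if decision.strip().startswith("SEARCH:"):
        query = decision.split("SEARCH:", 1)[1].strip()
    else:
        break  # model chose to synthesize current findings
print(f"Collected {len(findings)} observations.")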

Quality Criteria Specification: Research prompts must specify quality dimensions explicitly rather than assuming “good research.” Useful criteria: (1) Source diversity - information from multiple independent sources, (2) Evidence specificity - concrete facts and citations vs generalizations, (3) Uncertainty acknowledgment - distinguishing confident claims from speculative ones, (4) Relevance filtering - staying focused on the research question vs tangents. These criteria appear in both worker prompts and LLM-as-Judge evaluation.

Few-Shot Examples for Research Tasks: Research prompts benefit from few-shot demonstrations showing: (1) good vs poor search queries, (2) effective vs ineffective summarization, (3) appropriate vs inappropriate source citation. Example demonstrations calibrate agent behavior more reliably than abstract instructions. The examples encode tacit knowledge about research quality that’s difficult to specify explicitly.
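
A brief illustration of the kind of contrasting demonstrations that can be embedded in a research prompt (the contents are placeholders):

Poor query: "climate change"
Better query: "Arctic sea ice extent trends 2010-2024 satellite measurements"

Poor summary: "Many studies discuss this topic and results vary."
Better summary: "Three of the four sources report finding X, citing [source];
the fourth attributes the difference to methodology Y."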