Prompt engineering is the practice of carefully designing inputs to language models to elicit desired outputs. Unlike traditional programming where we write explicit instructions in code, prompt engineering involves crafting natural language instructions that guide model behavior.
Core Principles
Clarity and Specificity: Vague prompts produce vague outputs. “Write about dogs” yields generic content while “Write a 200-word explanation of how dogs use scent marking for territorial communication” produces focused, useful results.
Context Provision: Models work better when given relevant background. Rather than “Summarize this,” provide context: “Summarize this research paper for an audience of undergraduate students unfamiliar with the field.”
Examples and Demonstrations: Few-shot learning works remarkably well. Show the model 2-3 examples of desired output format and quality, then ask for similar treatment of new content.
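A minimal sketch of assembling such a prompt in Python; the helper name, task wording, and examples are illustrative rather than drawn from any particular library:

```python
# Minimal sketch: assemble a few-shot prompt from (input, output) example pairs.
# The task wording, examples, and helper name are illustrative placeholders.

def build_few_shot_prompt(task: str, examples: list[tuple[str, str]], new_input: str) -> str:
    parts = [task, ""]
    for source, target in examples:
        parts.append(f"Input: {source}")
        parts.append(f"Output: {target}")
        parts.append("")
    parts.append(f"Input: {new_input}")
    parts.append("Output:")
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    task="Rewrite each product note as one benefit-focused sentence.",
    examples=[
        ("Battery: 5000mAh, lasts 2 days", "Go two full days between charges."),
        ("Weight: 180g, titanium frame", "A titanium frame keeps it light at 180g."),
    ],
    new_input="Camera: 50MP sensor with optical stabilization",
)
print(prompt)
```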
Iterative Refinement: First attempts rarely produce optimal results. Treat prompt development as an iterative process - test, analyze failures, refine, repeat.
Techniques
Progressive Narrowing
Start broad, then systematically narrow focus based on intermediate results. In Multi-Agent Research Systems, this prevents premature optimization while ensuring comprehensive exploration.
Example progression:
- “Research the impact of social media on mental health”
- “Focus on studies examining anxiety and depression specifically”
- “Compare findings across age groups, particularly adolescents vs adults”
Chain-of-Thought Prompting
Encourage the model to show its reasoning process. “Let’s think step by step” or “Explain your reasoning” produces more reliable outputs for complex tasks.
This transparency helps debug unexpected behaviors and understand how models decompose problems. The approach reveals the model’s implicit reasoning, making it easier to identify where logic diverges from expectations.
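A small sketch of the wording, adding a marker that makes the final answer easy to pull out afterwards (the function name and question are made up):

```python
# Sketch: wrap a question with a chain-of-thought instruction plus a marker
# that makes the final answer easy to extract. The question is made up.

def chain_of_thought_prompt(question: str) -> str:
    return (
        f"Question: {question}\n"
        "Let's think step by step and show each piece of reasoning.\n"
        "Finish with a line starting with 'Answer:' so the result is easy to extract."
    )

print(chain_of_thought_prompt(
    "A train leaves at 09:10 and arrives at 11:45. How long is the journey?"
))
```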
Task Decomposition
Break complex requests into smaller, well-defined subtasks. Rather than “Build me a complete web application,” decompose into:
- “Design the database schema”
- “Write API endpoint specifications”
- “Implement authentication logic”
- “Create frontend components”
This mirrors how The Lego Approach for Building Agentic Systems composes complex workflows from simpler components.
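A rough sketch of that decomposition as independent prompts, each reminded of the overall goal; the subtask wording and helper names are assumptions:

```python
# Sketch: expand one broad request into ordered subtask prompts that can be
# run and reviewed independently. Subtask wording is illustrative.

SUBTASKS = [
    "Design the database schema",
    "Write API endpoint specifications",
    "Implement authentication logic",
    "Create frontend components",
]

def decompose(request: str, subtasks: list[str]) -> list[str]:
    prompts = []
    for i, subtask in enumerate(subtasks, start=1):
        prompts.append(
            f"Overall goal: {request}\n"
            f"Subtask {i} of {len(subtasks)}: {subtask}\n"
            "Address only this subtask; note any assumptions about the others."
        )
    return prompts

for p in decompose("Build a small bookmarking web application", SUBTASKS):
    print(p, end="\n---\n")
```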
Role Assignment
Assign the model a specific role or expertise. “You are an expert data scientist reviewing this analysis” produces different outputs than “You are a business executive evaluating ROI.”
Roles provide implicit context about perspective, priorities, and domain knowledge the model should apply.
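As a small illustration, the same artifact can be framed for two different roles (the analysis text and role phrasing below are invented):

```python
# Sketch: the same artifact reviewed under two different roles.
# The analysis text and role wording are made up.

ANALYSIS = "Churn fell 4% after the pricing change, based on a six-week sample."

def role_prompt(role: str, artifact: str) -> str:
    return (
        f"You are {role}.\n"
        f"Review the following and list your top three concerns:\n{artifact}"
    )

for role in (
    "an expert data scientist reviewing this analysis",
    "a business executive evaluating ROI",
):
    print(role_prompt(role, ANALYSIS), end="\n---\n")
```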
Constraint Specification
Explicitly state constraints: word limits, formatting requirements, what to include or exclude, tone and style preferences. Models follow explicit constraints more reliably than inferring unstated expectations.
Advanced Patterns
Meta-Prompting
Use the model to help design better prompts. Ask it to suggest improvements to your prompts or to generate prompts for specific tasks. This leverages the model’s understanding of how it responds to different prompt structures.
Prompt Chaining
Link multiple prompts where each uses outputs from previous steps. The first prompt might gather information, the second analyzes it, and the third synthesizes conclusions.
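A minimal sketch of such a chain, with `complete` standing in for whichever model client you actually use; it is a placeholder, not a real API:

```python
# Sketch of a three-step chain: gather -> analyze -> synthesize.
# `complete` stands in for whatever model client you use; it is not a real API.

def complete(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

def research_chain(topic: str) -> str:
    facts = complete(f"List the key facts currently known about: {topic}")
    analysis = complete(f"Analyze these facts for trends and contradictions:\n{facts}")
    return complete(
        f"Write a three-paragraph synthesis of this analysis for a general reader:\n{analysis}"
    )
```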
In Multi-Agent Research Systems, specialized agents receive carefully engineered prompts defining their specific roles and responsibilities within the broader research workflow.
Negative Prompting
Explicitly state what NOT to do. “Do not include speculation” or “Avoid technical jargon” can be as important as stating what TO include.
Output Formatting
Specify desired output structure. “Respond in JSON format with keys X, Y, Z” or “Use markdown with H2 headers for main sections” makes outputs more parseable and useful for downstream processing.
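One way to make this robust is to pair the format instruction with a validation step so malformed responses fail loudly; the keys and wording in this sketch are illustrative:

```python
import json

# Sketch: pair a format instruction with a validation step so malformed
# responses fail loudly. The keys ("title", "summary", "tags") are illustrative.

def format_prompt(article: str) -> str:
    return (
        "Summarize the article below.\n"
        'Respond with JSON only, using exactly the keys "title", "summary", and "tags" '
        "(tags is a list of strings).\n\n"
        f"{article}"
    )

def parse_response(raw: str) -> dict:
    data = json.loads(raw)  # raises ValueError if the model drifted from pure JSON
    missing = {"title", "summary", "tags"} - data.keys()
    if missing:
        raise ValueError(f"response missing keys: {missing}")
    return data
```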
Context-Specific Considerations
For Multi-Agent Systems
Prompts must define:
- Clear task boundaries: What is and isn’t this agent’s responsibility
- Coordination protocols: How agents share information and defer to each other
- Quality criteria: What constitutes successful completion
- Error handling: How to behave when encountering problems
See Orchestrator-Worker Pattern for coordination details.
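One way to make those four elements concrete is a worker-agent prompt template along these lines; the field names and phrasing are assumptions, not a fixed schema:

```python
# Sketch: a worker-agent prompt covering boundaries, coordination, quality
# criteria, and error handling. Field names and wording are illustrative.

AGENT_PROMPT = """\
You are the {agent_name} agent.

In scope: {in_scope}
Out of scope (hand these back to the orchestrator): {out_of_scope}

Coordination: report results as a short structured summary; if you need work
from another agent, state that request explicitly instead of attempting it.

Done means: {quality_criteria}

If you hit an error or missing input, stop and describe the problem rather
than guessing.
"""

print(AGENT_PROMPT.format(
    agent_name="data-collection",
    in_scope="gather raw metrics from the listed sources",
    out_of_scope="interpretation, visualization",
    quality_criteria="every metric has a source and a timestamp",
))
```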
For Code Generation
Effective prompts specify:
- Language and framework versions
- Coding standards and style preferences
- Error handling requirements
- Performance considerations
- Testing expectations
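A sketch of a code-generation prompt that pins each of these down; the versions, libraries, and policies shown are placeholder values:

```python
# Sketch: a code-generation prompt that pins each item down. The versions,
# libraries, and policies shown are placeholder values, not recommendations.

CODEGEN_PROMPT = """\
Write {component} in {language} {language_version} using {framework} {framework_version}.

Requirements:
- Follow {style_guide}; include type hints and docstrings.
- Error handling: {error_policy}.
- Performance: {performance_note}.
- Include unit tests covering the main path and at least one failure case.
"""

print(CODEGEN_PROMPT.format(
    component="a rate-limited HTTP client wrapper",
    language="Python",
    language_version="3.12",
    framework="httpx",
    framework_version="0.27",
    style_guide="PEP 8",
    error_policy="raise a custom RateLimitError instead of retrying silently",
    performance_note="no blocking sleeps inside the event loop",
))
```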
For Research Tasks
Research prompts benefit from:
- Defining information sources (academic papers, recent news, technical documentation)
- Specifying depth vs. breadth tradeoffs
- Requesting citations and evidence
- Indicating appropriate level of technical detail
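A sketch of a prompt builder that encodes these choices explicitly; parameter names and example values are illustrative:

```python
# Sketch: a research prompt that encodes sources, depth, and citation
# expectations explicitly. Parameter names and example values are illustrative.

def research_prompt(question: str, sources: str, depth: str, audience: str) -> str:
    return (
        f"Research question: {question}\n"
        f"Preferred sources: {sources}\n"
        f"Depth vs. breadth: {depth}\n"
        f"Audience and technical level: {audience}\n"
        "Cite each claim and flag anything you are uncertain about."
    )

print(research_prompt(
    question="How effective are heat pumps in cold climates?",
    sources="peer-reviewed studies and field data from the last five years",
    depth="deep on performance data, brief on installation costs",
    audience="homeowners comfortable with basic technical detail",
))
```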
Evaluation and Iteration
Small Sample Testing: Test prompts on diverse examples before scaling. Edge cases often reveal weaknesses invisible in happy-path scenarios.
LLM-as-Judge: Use LLM-as-Judge evaluation to assess prompt effectiveness at scale. The judge evaluates whether outputs meet quality criteria.
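A minimal sketch of such a judge; the scoring schema is an assumption, and `complete` is a placeholder for your model client:

```python
import json

# Sketch: a judge prompt that scores an output against explicit criteria and
# returns JSON for aggregation. The schema is an assumption, and `complete`
# is a placeholder for your model client.

JUDGE_PROMPT = """\
You are evaluating a model response against these criteria:
{criteria}

Task given to the model:
{task}

Model response:
{response}

Return JSON only: {{"scores": {{criterion: 1-5, ...}}, "verdict": "pass"|"fail", "notes": "..."}}
"""

def judge(task: str, response: str, criteria: list[str], complete) -> dict:
    prompt = JUDGE_PROMPT.format(
        criteria="\n".join(f"- {c}" for c in criteria),
        task=task,
        response=response,
    )
    return json.loads(complete(prompt))
```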
A/B Comparison: Compare outputs from different prompt variations. Sometimes small wording changes produce dramatically different results.
User Feedback Integration: Incorporate real usage patterns and failure modes into prompt refinement. Theoretical prompt quality matters less than practical performance.
Common Pitfalls
Over-Specification: Extremely detailed prompts can constrain the model too much, preventing it from applying its broader knowledge effectively.
Ambiguous Instructions: “Make it better” or “improve this” give the model no actionable direction. Specify what dimensions of “better” matter.
Assuming Capabilities: Models have limitations. Prompting won’t overcome fundamental capability gaps, though it can better leverage existing capabilities.
Ignoring Context Window: Extremely long prompts leave less space for responses. When the context window is tight, concise prompts become essential. Context Engineering addresses this systematically through strategies like pruning, summarization, and offloading.
Static Prompts for Dynamic Tasks: Tasks with varying requirements need adaptive prompting strategies, not one-size-fits-all instructions.
The Art and Science
Prompt engineering blends systematic experimentation with intuitive understanding of how language models interpret instructions. It requires:
- Analytical thinking to decompose complex tasks
- Linguistic precision to express requirements clearly
- Iterative experimentation to discover what works
- Domain expertise to provide appropriate context
As models evolve, effective prompting strategies shift. What works brilliantly with one model may underperform with another. This makes prompt engineering an ongoing practice rather than a solved problem.
Research Agent Prompting
Deep Research Systems require specialized prompting patterns discovered through the Open Deep Research implementation and Anthropic’s engineering work.
Supervisor Delegation Templates: Research Delegation Heuristics are implemented through structured prompts that guide high-level reasoning. Template structure:
You are a research supervisor coordinating specialized agents.
Research Brief:
{brief_from_scoping}
Current Understanding:
{existing_findings}
Task: Decide research strategy
- Use single agent for: simple facts, lists, rankings
- Use multiple agents for: comparative analysis, multi-dimensional topics
- Max {max_concurrent} agents per iteration
- Max {max_iterations} total iterations
After each research round, reflect:
1. Do findings sufficiently address the brief?
2. Are there gaps requiring deeper exploration?
3. Should we proceed to synthesis or continue research?
Reasoning:
This template structure ensures supervisors consider Progressive Research Exploration principles while respecting resource constraints.
Sub-Agent Research Prompts: Workers receive standalone instructions: (1) specific research question (from supervisor decomposition), (2) search tools available, (3) quality criteria (depth, sources, evidence), (4) compression requirement (produce focused summary, not raw results). Example:
Research Question: {subtopic_from_supervisor}
Your Task:
- Use search tools to gather information addressing this question
- Evaluate source quality and relevance
- Extract key findings with evidence
- Produce focused summary (not raw search results)
Quality Criteria:
- Cite specific sources
- Distinguish facts from interpretations
- Note confidence levels and uncertainties
- Keep summary under {max_length} tokens
Available Tools:
{tool_descriptions}
Reflection Prompt Design: ReAct Agent Pattern includes reflection after each observation. Reflection prompts guide productive thinking: “Based on this search result: (1) What did I learn? (2) What gaps remain? (3) Should I search for more information or synthesize current findings? (4) If searching more, what specific query would help?” This structured reflection prevents aimless exploration while encouraging adaptive strategy.
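A sketch of that reflection prompt packaged for reuse after each observation; the exact wording is illustrative:

```python
# Sketch: the reflection questions above, packaged as a prompt appended after
# each tool observation. The exact wording is illustrative.

REFLECTION_PROMPT = """\
Based on this search result:
{observation}

1. What did I learn that is relevant to: {research_question}?
2. What gaps remain?
3. Should I search for more information or synthesize current findings?
4. If searching more, what specific query would help?

Answer each question briefly, then state your next action on its own line.
"""

def reflection_prompt(observation: str, research_question: str) -> str:
    return REFLECTION_PROMPT.format(
        observation=observation, research_question=research_question
    )
```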
Quality Criteria Specification: Research prompts must specify quality dimensions explicitly rather than assuming “good research.” Useful criteria: (1) Source diversity - information from multiple independent sources, (2) Evidence specificity - concrete facts and citations vs generalizations, (3) Uncertainty acknowledgment - distinguishing confident claims from speculative ones, (4) Relevance filtering - staying focused on the research question vs tangents. These criteria appear in both worker prompts and LLM-as-Judge evaluation.
Few-Shot Examples for Research Tasks: Research prompts benefit from few-shot demonstrations showing: (1) good vs poor search queries, (2) effective vs ineffective summarization, (3) appropriate vs inappropriate source citation. Example demonstrations calibrate agent behavior more reliably than abstract instructions. The examples encode tacit knowledge about research quality that’s difficult to specify explicitly.