Agent Engineering Best Practices

Design AI agents that are focused, safe, and reliable. Learn the principles, patterns, and techniques for building high-quality agent systems.

Agent engineering is the discipline of designing AI agents that reliably accomplish their goals without being over-constrained. The key principle: treat tokens and attention as a shared budget. Prefer small, high-signal prompts plus strong output contracts over long prose.
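One way to make an output contract concrete is to validate what the agent returns instead of describing the format at length in the prompt. The sketch below is a minimal illustration under stated assumptions: the JSON shape, the field names (`verdict`, `findings`), and the allowed verdict values are hypothetical and do not come from any particular agent framework.

```python
import json

# Hypothetical contract: field names and allowed values are illustrative only.
REQUIRED_FIELDS = {"verdict": str, "findings": list}
ALLOWED_VERDICTS = {"pass", "fail", "needs_review"}


def validate_output(raw: str) -> list:
    """Return a list of contract violations; an empty list means the output conforms."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]

    errors = []
    # Check that every required field is present and has the expected type.
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            errors.append(f"missing field: {name}")
        elif not isinstance(data[name], expected):
            errors.append(f"field {name} must be {expected.__name__}")
    # Check the verdict against the closed set of allowed values.
    verdict = data.get("verdict")
    if isinstance(verdict, str) and verdict not in ALLOWED_VERDICTS:
        errors.append(f"unknown verdict: {verdict}")
    return errors
```

With a check like this in place, the prompt only needs to state the contract briefly, and any violations can be fed back to the agent for a retry.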

This section covers the full lifecycle of agent design — from choosing the right pattern, to writing effective prompts, to testing and iterating on deployed agents.

Who this is for

These best practices apply to anyone designing AI agents — whether you're building single-purpose reviewers, multi-phase workflow orchestrators, or deciding if an agent is even the right tool for the job.

Key concepts

Agents vs skills vs rules

Not every task needs an agent. Choose the lightest mechanism that reliably achieves the goal:

| Mechanism | Use when | Why |
| --- | --- | --- |
| Always-on rules | Repo-wide constraints, commands, conventions | Applied everywhere; no routing needed |
| Skills | Reusable instructions in the main conversation context | Reuse without context isolation |
| Subagent | Single specialized worker/reviewer with tool restrictions | Isolated context + least-privilege tools |
| Workflow orchestrator | Multi-phase pipeline coordinating other agents | Encodes phase order, dispatch, aggregation, iteration |
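The selection logic in the table can be sketched as a small decision function. This is one possible reading, not a prescribed algorithm: the three boolean inputs and the order in which they are checked are assumptions made for illustration.

```python
from enum import Enum


class Mechanism(Enum):
    RULES = "always-on rules"
    SKILL = "skill"
    SUBAGENT = "subagent"
    ORCHESTRATOR = "workflow orchestrator"


def choose_mechanism(*, multi_phase: bool, needs_isolation: bool,
                     applies_everywhere: bool) -> Mechanism:
    """Pick the lightest mechanism that still meets the stated needs.

    The flags are hypothetical simplifications of the "Use when" column.
    Needs that force a heavier mechanism are checked first, so a lighter
    option is returned only when nothing heavier is required.
    """
    if multi_phase:
        # Coordinating several phases or agents requires an orchestrator.
        return Mechanism.ORCHESTRATOR
    if needs_isolation:
        # Isolated context or restricted tools requires a subagent.
        return Mechanism.SUBAGENT
    if applies_everywhere:
        # Repo-wide constraints and conventions fit always-on rules.
        return Mechanism.RULES
    # Otherwise, reusable instructions in the main context are enough.
    return Mechanism.SKILL
```

For example, a single security reviewer that should only have read access would map to `choose_mechanism(multi_phase=False, needs_isolation=True, applies_everywhere=False)`, which returns `Mechanism.SUBAGENT`.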

The "Goldilocks zone" of strictness

Agent prompts should hit the right altitude — specific enough to guide, general enough to generalize:

| Failure mode | Symptom | Example |
| --- | --- | --- |
| Too rigid | Brittle enumeration; breaks on unexpected inputs | "If .ts do X. If .tsx do Y. If .js do Z..." |
| Too vague | Abstract principles without concrete signals | "Be helpful and thorough." |
| Just right | Heuristics that generalize + clear escalation | "Prioritize correctness over style. When uncertain, ask." |

Quick reference: strictness by task type

  • High freedom: heuristics + output contract (reviews, audits, brainstorming)
  • Medium: step sequence + required checks (refactors, migrations)
  • Low: scripts/commands + strict validation loops (fragile operations)
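The quick reference above can also be expressed as a lookup table, which may be handy when a tool needs to pick prompt ingredients by task type. The dictionary keys, the task labels, and the decision to raise an error for unlisted task types are all assumptions of this sketch.

```python
# Illustrative lookup built from the quick reference; keys and task labels
# are hypothetical names chosen for this example.
STRICTNESS = {
    "high_freedom": {
        "prompt": ("heuristics", "output contract"),
        "tasks": ("review", "audit", "brainstorm"),
    },
    "medium": {
        "prompt": ("step sequence", "required checks"),
        "tasks": ("refactor", "migration"),
    },
    "low": {
        "prompt": ("scripts/commands", "strict validation loops"),
        "tasks": ("fragile operation",),
    },
}


def strictness_for(task: str) -> str:
    """Return the strictness level whose task list contains `task`."""
    for level, spec in STRICTNESS.items():
        if task in spec["tasks"]:
            return level
    # Assumption: unlisted task types are rejected rather than given a default.
    raise ValueError(f"no strictness level defined for task type: {task!r}")
```

Calling `strictness_for("migration")` returns `"medium"`, and `STRICTNESS["medium"]["prompt"]` then lists what the prompt should contain.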