LLM Failure Mode Catalog

A catalog of systematic LLM failure modes and concrete techniques to guard against them in agent prompts. Select the 3-5 most relevant for your agent type.

LLMs have systematic failure modes — predictable ways they deviate from skilled human judgment. Good agent prompts explicitly name the failure modes most relevant to the task, giving the agent self-correction targets.

This is the difference between "what to do" and "what to watch out for."

How to use this catalog

  1. Review the failure modes below
  2. Identify the 3-5 most likely given your agent's task and scope
  3. Include them in the agent prompt — either as a dedicated section or woven into operating principles
  4. Frame them as specific, observable behaviors — not vague admonitions

Warning

Don't include all of them. Selecting the relevant subset focuses attention where it matters. Including too many dilutes their effectiveness.

Quick reference by agent type

| Agent type | Commonly relevant failure modes |
| --- | --- |
| Reviewer | Flattening nuance, Source authority, Asserting when uncertain, Padding/burying lede |
| Implementer | Plowing through ambiguity, Downstream effects, Instruction rigidity, Assuming intent, Silent assumption cascade |
| Researcher | Flattening nuance, Source authority, Confabulating, Padding/burying lede |
| Orchestrator | Plowing through ambiguity, Never escalating, Assuming intent, Over-indexing on recency |
| Advisor/Planner | Downstream effects, Confabulating, Assuming intent, Asserting when uncertain, Silent assumption cascade |
| User-facing agent | Assuming intent, Clarification loop paralysis, Silent assumption cascade |
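The quick-reference mapping above can also be kept as data, so a prompt builder can look up a starting set of failure modes per agent type. This is a minimal sketch under illustrative assumptions — the dictionary and function names are hypothetical, not part of any framework:

```python
# Illustrative sketch: the quick-reference table as data. All names here
# are hypothetical, chosen for this example only.

FAILURE_MODES_BY_AGENT_TYPE = {
    "reviewer": ["flattening nuance", "source authority",
                 "asserting when uncertain", "padding/burying lede"],
    "implementer": ["plowing through ambiguity", "downstream effects",
                    "instruction rigidity", "assuming intent",
                    "silent assumption cascade"],
    "researcher": ["flattening nuance", "source authority",
                   "confabulating", "padding/burying lede"],
    "orchestrator": ["plowing through ambiguity", "never escalating",
                     "assuming intent", "over-indexing on recency"],
    "advisor/planner": ["downstream effects", "confabulating",
                        "assuming intent", "asserting when uncertain",
                        "silent assumption cascade"],
    "user-facing agent": ["assuming intent", "clarification loop paralysis",
                          "silent assumption cascade"],
}

def starting_failure_modes(agent_type: str) -> list[str]:
    """Return the suggested starting set (3-5 entries) for an agent type."""
    return FAILURE_MODES_BY_AGENT_TYPE[agent_type.lower()]
```

The table remains a starting point: review the selection against your agent's actual task and scope before committing it to the prompt.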

The catalog

1. Plowing through ambiguity

What it looks like: Makes silent assumptions, fills gaps with defaults, never surfaces "I'm unclear on X." Proceeds confidently when it should pause.

The human instinct it lacks: Recognizing ambiguity or underspecification and clarifying before proceeding.

Guard against it:

  • "If instructions are ambiguous or underspecified, surface what's unclear and ask — don't fill gaps silently"
  • "State assumptions explicitly when you make them. Prefer asking over assuming when stakes are non-trivial"

2. Flattening nuance in sources

What it looks like: Treats ambiguous or conflicting content as definitive. Picks one interpretation and runs with it without acknowledging alternatives or tensions.

The human instinct it lacks: Balanced interpretation; acknowledging that sources may be ambiguous, incomplete, or in tension with each other.

Guard against it:

  • "When sources conflict or are ambiguous, note the tension rather than silently picking one interpretation"
  • "Distinguish what a source clearly states from what you're inferring or extrapolating"

3. Treating all sources as equally authoritative

What it looks like: Fails to weigh credibility, recency, or contextual fit. Applies information out of domain. Treats a blog comment the same as official documentation.

The human instinct it lacks: Evaluating source authority, applicability to the current situation, and domain fit.

Guard against it:

  • "Weigh sources by authority and relevance: official docs > established codebase patterns > examples > blog posts > guesses"
  • "Note when you're applying information outside its original context or domain"

4. Acting without modeling downstream effects

What it looks like: Misses edge cases. Fails to anticipate how an action or recommendation could backfire, conflict with other constraints, or cause unintended consequences.

The human instinct it lacks: Thinking through "what could go wrong" and "what else does this affect" before acting.

Guard against it:

  • "Before making a change or recommendation, consider: what could this break? What edge cases exist? What constraints might this conflict with?"
  • "If an action has non-obvious downstream consequences, name them explicitly"

5. Confabulating past knowledge limits

What it looks like: Generates plausible-sounding answers when the honest response is "I don't know." Fills knowledge gaps with fabrication rather than acknowledging uncertainty.

The human instinct it lacks: Knowing when you've hit a limitation of your knowledge and being honest about it.

Guard against it:

  • "If you don't know or aren't confident, say so. 'I don't know' and 'I'm not certain about this' are valid responses"
  • "Don't invent details to fill gaps. Flag what you'd need to verify"

6. Never escalating or deferring

What it looks like: Always produces an answer or takes an action rather than flagging "this needs human judgment" or "I'm not confident enough to proceed here."

The human instinct it lacks: Knowing when to escalate, defer, or bring in someone with more context or authority.

Guard against it:

  • "If a decision is outside your scope, confidence level, or competence, say so and recommend escalation rather than guessing"
  • "It's better to flag uncertainty than to proceed and cause harm or waste effort"

7. Treating all instructions as equally rigid

What it looks like: Fails to distinguish hard requirements from soft guidance. Either over-complies (treats suggestions as mandates) or over-interprets (treats mandates as flexible).

The human instinct it lacks: Parsing directive strength — knowing what's non-negotiable vs guidance vs suggestion.

Guard against it:

  • "Distinguish 'must' (non-negotiable) from 'should' (strong default, exceptions possible) from 'consider' (suggestion, use judgment)"
  • "If something says 'consider X,' you can decide not to do X with good reason. If something says 'always X,' you cannot skip it"

8. Assuming intent instead of probing

What it looks like: Projects a goal onto the user rather than understanding their actual mental model, intent, and constraints. Fills in "what they probably want" without checking.

The human instinct it lacks: Working with others to understand their goals and the nuance of what they actually want.

Guard against it:

  • "Don't assume you know what they want. If intent is unclear or could be interpreted multiple ways, ask"
  • "When in doubt, restate your understanding of the goal and verify it matches theirs before proceeding"

9. Asserting confidently when uncertain

What it looks like: Gives a single answer or takes a single path when multiple valid options exist. Under-hedges when the situation warrants presenting alternatives.

The human instinct it lacks: Calibrating confidence to actual certainty; presenting options when genuinely uncertain.

Guard against it:

  • "When multiple valid approaches exist, present them with tradeoffs rather than silently picking one"
  • "Match your expressed confidence to your actual certainty. Don't assert what you're genuinely unsure about"

10. Padding, repeating, and burying the lede

What it looks like: Restates points in slightly different words. Adds filler phrases. Scatters key details instead of surfacing them clearly.

The human instinct it lacks: Writing with a theory of mind — progressive, non-repetitive, focused on what the reader needs.

Guard against it:

  • "State each point once, clearly. Don't rephrase the same idea in multiple places"
  • "Lead with the most important information. Structure output for the reader's needs, not for completeness"
  • "Before finalizing output, ask: what does the reader need to take away? Is that clear and prominent?"

11. Over-indexing on recency

What it looks like: Treats the latest input as overriding all prior context. Makes disproportionate adjustments based on recent feedback without weighing it against original intent.

The human instinct it lacks: Evaluating new inputs proportionally; maintaining awareness of full history and original intent.

Guard against it:

  • "New input is additional context, not a reset. Weigh it against what's already been established"
  • "If new feedback contradicts earlier guidance, surface the tension rather than silently overriding"

12. Clarification loop paralysis

What it looks like: Asks too many questions before acting. Seeks permission for low-stakes, reversible decisions. Creates friction by over-asking when sensible defaults exist.

The human instinct it lacks: Judging when to ask vs when to proceed with reasonable assumptions.

Guard against it:

  • "For low-stakes, reversible decisions, proceed with sensible defaults and label your assumptions"
  • "Ask only when: (1) the decision materially affects the outcome, (2) multiple valid approaches exist with different tradeoffs, or (3) the user's preference is genuinely unclear and stakes are non-trivial"
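The ask-vs-proceed rule above can be sketched as a small decision function. The function and flag names are hypothetical, purely to make the three conditions concrete:

```python
# Illustrative sketch of the ask-vs-proceed heuristic. Flag names are
# hypothetical, not an established API.

def should_ask(material_to_outcome: bool,
               competing_tradeoffs: bool,
               preference_unclear: bool,
               stakes_nontrivial: bool) -> bool:
    """Ask only when one of the three conditions holds; otherwise
    proceed with sensible defaults and label assumptions."""
    return (material_to_outcome
            or competing_tradeoffs
            or (preference_unclear and stakes_nontrivial))
```

Note that unclear preference alone is not enough — it must coincide with non-trivial stakes, which is what keeps low-stakes reversible decisions moving.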

13. Silent assumption cascade

What it looks like: Makes a chain of assumptions without surfacing any of them. Each assumption builds on the previous, leading to outputs based on premises the user never agreed to.

The human instinct it lacks: Making assumptions visible so they can be challenged.

Guard against it:

  • "When you make assumptions, state them explicitly: 'Assuming X, I will Y'"
  • "If you've made multiple assumptions in a row, pause and surface them before proceeding further"

Integrating failure modes into prompts

Option A: Dedicated section

Add a "Failure modes to avoid" section in the agent prompt:

## Failure modes to avoid

- **Plowing through ambiguity:** Don't fill gaps with silent
  assumptions. If requirements or context are unclear, surface
  what's missing and ask.
- **Flattening nuance:** Don't treat conflicting or ambiguous
  sources as definitive. Acknowledge tensions and uncertainties.
- **Padding and burying the lede:** Don't rephrase the same point
  multiple ways. Lead with what matters most; cut filler.

Option B: Woven into operating principles

Integrate failure mode awareness into the operating principles:

## Operating principles

- Clarify before proceeding: if instructions or context are
  ambiguous, surface what's unclear rather than filling gaps with
  assumptions.
- Weigh sources appropriately: official docs and established
  patterns take precedence. Note when sources conflict.
- Match confidence to certainty: if multiple valid approaches
  exist, present them with tradeoffs.

Option C: Hybrid

Use operating principles for the positive framing, then add a brief callout:

## Operating principles
...

**Watch out for:** silent assumptions when context is unclear;
flattening nuance in ambiguous sources; burying key findings in
verbose output.

Why this works

Naming failure modes explicitly works because:

  1. Self-correction targets — the agent knows what to watch for, not just what to do
  2. Concrete behaviors — "Don't fill gaps silently" is actionable; "be careful" is not
  3. Contextual relevance — selecting the right 3-5 focuses attention where it matters
  4. Matches human training — skilled humans learn "here's how people mess this up" alongside "here's how to do it"