Checklist

Agentic AI Threat Modeling Checklist

A practical checklist for threat modeling any system where an AI agent can take actions — five attack surfaces to evaluate and the containment controls that keep a manipulated agent survivable.

Use this checklist when designing or reviewing any system where a model can act — call tools, hit APIs, change state, or spend money. It assumes the model will be manipulated and focuses on making that manipulation survivable.

Attack surfaces to evaluate

Work through all five for every agentic design.

  • Prompt injection — direct and indirect. Can untrusted content the agent ingests (web pages, documents, tickets, tool outputs) steer its actions? Indirect injection is the dangerous case. You contain it; you do not prompt your way out of it.
  • Tool scope. Is any tool broader than the job requires? Flag anything general-purpose: “run any SQL,” “call any endpoint,” shell access.
  • Confused deputy. Does the agent act with its own privilege while serving a user? Can it be induced to do something the requesting user is not authorized to do?
  • Memory and context poisoning. Does the agent persist memory across sessions? Treat that store as attacker-influenceable.
  • Tool and supply chain. What plugins, connectors, and external tools can the agent reach? Each is part of the attack surface.
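
To make the tool-scope item concrete, here is a minimal Python sketch of a narrow tool standing in for a "run any SQL" tool. It uses the standard-library sqlite3 module. The schema, the sample rows, ALLOWED_STATUSES, and list_orders_by_status are hypothetical names invented for illustration, and the sketch assumes customer_id comes from the authenticated session rather than from the model.

```python
import sqlite3

# Hypothetical statuses the tool accepts; anything else is rejected.
ALLOWED_STATUSES = {"pending", "shipped", "cancelled"}


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with made-up rows for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders (id, customer_id, status) VALUES (?, ?, ?)",
        [(1, "c-100", "shipped"), (2, "c-100", "pending"), (3, "c-200", "shipped")],
    )
    conn.commit()
    return conn


def list_orders_by_status(conn: sqlite3.Connection, customer_id: str, status: str):
    """Narrow tool: one fixed SELECT. The agent supplies values, never SQL text."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unsupported status: {status!r}")
    # Placeholders keep the values out of the statement itself, so a
    # manipulated agent cannot change what the query does.
    return conn.execute(
        "SELECT id, status FROM orders "
        "WHERE customer_id = ? AND status = ? ORDER BY id",
        (customer_id, status),
    ).fetchall()
```

The design choice is that the tool exposes one question the job needs answered. Anything outside that shape fails closed, either with an empty result or a ValueError, instead of being passed through to the database.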

Containment controls to verify

For each, confirm it is present and tested.

  • Tools scoped to the minimum capability (read-only where possible, parameterized, never general-purpose).
  • Human approval required on consequential actions (money, deletion, access changes, external communication), regardless of agent confidence.
  • Agent runs as its own non-human identity with short-lived, least-privilege credentials — no shared keys, no broad standing grants.
  • User identity propagated so authorization evaluates against the user, not the agent.
  • Untrusted retrieved content tagged as data, not instruction; tools refuse to act on parameters sourced from it without confirmation.
  • Full trace logged: inputs, reasoning, tool calls, tool results.
  • Blast radius bounded: rate limits, spend caps, resource scoping, a tested kill switch.
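
Several of these controls can sit in a single gate in front of every tool call. The following is a minimal sketch, not a production design: ActionGate, Blocked, the CONSEQUENTIAL set, and the action names are all hypothetical. It covers human approval, a spend cap, a kill switch, and a trace of every attempt. A real trace would also record inputs, reasoning, and tool results.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical set of actions treated as consequential (money, deletion,
# access changes, external communication).
CONSEQUENTIAL = {"refund", "delete_record", "grant_access", "send_email"}


class Blocked(Exception):
    """Raised when the gate refuses an action."""


@dataclass
class ActionGate:
    spend_cap_cents: int
    spent_cents: int = 0
    killed: bool = False
    trace: list = field(default_factory=list)

    def kill(self) -> None:
        """Kill switch: every later action is refused."""
        self.killed = True

    def _decide(self, action: str, cost_cents: int, approved_by: Optional[str]) -> str:
        if self.killed:
            return "kill switch engaged"
        # Approval is required regardless of how confident the agent is.
        if action in CONSEQUENTIAL and approved_by is None:
            return "human approval required"
        if self.spent_cents + cost_cents > self.spend_cap_cents:
            return "spend cap exceeded"
        return "allowed"

    def execute(self, action: str, cost_cents: int = 0,
                approved_by: Optional[str] = None) -> str:
        decision = self._decide(action, cost_cents, approved_by)
        # Log every attempt, including refused ones.
        self.trace.append({
            "action": action,
            "cost_cents": cost_cents,
            "approved_by": approved_by,
            "decision": decision,
        })
        if decision != "allowed":
            raise Blocked(decision)
        self.spent_cents += cost_cents
        return f"executed {action}"
```

With gate = ActionGate(spend_cap_cents=1000), a call such as gate.execute("refund", cost_cents=600) raises Blocked until a named human is passed as approved_by, and after gate.kill() every call is refused. "Tested" in this sketch means exercising each of those refusal paths, not only the allowed one.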

The question to design against

The question is not “will this agent ever be manipulated?” but “when it is, what is the worst it can do?” Then make that worst case small. If you cannot answer the second question precisely, the design is not finished.
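
One way to answer the second question precisely, at least for spend, is to compute the bound from the limits themselves. The sketch below is an illustrative calculation under assumed controls: a per-action cap, a rate limit, a total spend cap, and a known time to trigger the kill switch. The function name and every parameter are hypothetical.

```python
def worst_case_spend_cents(per_action_cap_cents: int,
                           actions_per_minute: int,
                           minutes_to_kill: int,
                           total_spend_cap_cents: int) -> int:
    """Upper bound on what a fully manipulated agent can spend before it is stopped."""
    # What the rate limit alone allows until someone hits the kill switch.
    until_killed = per_action_cap_cents * actions_per_minute * minutes_to_kill
    # The total spend cap applies on top, so the smaller bound wins.
    return min(until_killed, total_spend_cap_cents)
```

If any input to this calculation is unknown, for example nobody has measured how long the kill switch takes to trigger, that gap is the unfinished part of the design.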

Want this tailored to your stack?

I adapt these for specific environments and risk profiles as part of advisory and workshop engagements.