# When to build an AI agent: a 4-point production checklist

> Three rules for shipping production agents: build them only for ambiguous high-value tasks, keep the architecture minimal, and inspect the context window.

Published: 2026-05-22
URL: https://daniliants.com/insights/ambiguity-value-bottlenecks-cost-of-error-all-4-kill-pr/
Tags: ai-agents, anthropic, agentic-workflows, llm-orchestration, multi-agent, prompt-engineering

---

## Summary

Barry Zhang of Anthropic (co-author of "Building Effective Agents") distills three rules for shipping agents in production: don't build agents for everything, keep the architecture as simple as possible, and "think like your agent" by inspecting its context window. The core message: most use cases are better served by predictable workflows, and agents are only worth the cost, latency, and error risk for genuinely ambiguous, high-value tasks.

## Key Insight

**When to build an agent (4-point checklist):**

- **Complexity** - agents only earn their keep in ambiguous problem spaces. If you can map the full decision tree, build it explicitly and optimize each node. Cheaper, more controllable.
- **Value** - exploration burns tokens. At a budget of ~10 cents/task (high-volume support) you only get 30-50k tokens, so use a workflow that covers the common scenarios. Agents fit when the task is valuable enough that your first reaction is "I don't care how many tokens it takes, I just want it done."
- **Bottlenecks (critical capabilities)** - de-risk the capabilities the task depends on first (e.g. a coding agent must be able to write code, debug it, and recover from its own errors). A bottleneck is not fatal, but it multiplies cost and latency. The usual fix is to reduce scope.
- **Cost of error + error discovery** - high-stakes, hard-to-detect errors mean you can't trust autonomy. Mitigate with read-only access or human-in-the-loop, but that caps how far the agent scales.
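The checklist above can be read as a short decision procedure. The sketch below is one possible encoding, not the talk's own method: `TaskProfile`, `recommend`, `token_budget`, and the `min_agent_budget_usd` threshold are all hypothetical names and values introduced for illustration. The token arithmetic also shows what the "~10 cents buys 30-50k tokens" figure implies: a blended price of roughly $2 to $3.30 per million tokens.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical answers to the four checklist questions for one task."""
    decision_tree_mappable: bool    # Complexity: can every branch be enumerated?
    budget_usd_per_task: float      # Value: what one task is worth spending
    critical_capabilities_ok: bool  # Bottlenecks: core skills already de-risked?
    errors_high_stakes: bool        # Cost of error
    errors_easy_to_detect: bool     # Error discovery


def token_budget(budget_usd: float, usd_per_million_tokens: float) -> int:
    """Tokens affordable per task at an assumed blended token price."""
    return round(budget_usd * 1_000_000 / usd_per_million_tokens)


def recommend(task: TaskProfile, min_agent_budget_usd: float = 1.0) -> str:
    """Walk the checklist in order; the 1.0 USD threshold is an assumption."""
    if task.decision_tree_mappable:
        return "workflow"  # map the tree explicitly and optimize each node
    if task.budget_usd_per_task < min_agent_budget_usd:
        return "workflow"  # too few tokens to pay for exploration
    if not task.critical_capabilities_ok:
        return "reduce scope"  # fix the bottleneck before adding autonomy
    if task.errors_high_stakes and not task.errors_easy_to_detect:
        return "agent with read-only access or human-in-the-loop"
    return "agent"


# The source's figures: 10 cents per task at an assumed $2 or $3 per million tokens.
print(token_budget(0.10, 2.0))  # 50000
print(token_budget(0.10, 3.0))  # 33333

support_ticket = TaskProfile(False, 0.10, True, False, True)
design_doc_to_pr = TaskProfile(False, 5.00, True, False, True)
print(recommend(support_ticket))    # workflow
print(recommend(design_doc_to_pr))  # agent
```

The ordering is a design choice: the cheap questions (complexity, value) come first so that most tasks exit to a workflow before autonomy is considered at all.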

**Agents are just "models using tools in a loop."** Three components define an agent:

- Environment (the system it operates in)
- Tools (interface to act + get feedback)
- System prompt (goals, constraints, ideal behavior)

The model then runs in a loop. Anthropic's three internal agent products share **almost the exact same backbone/code** - only the tools and prompt differ. Any upfront complexity kills iteration speed; optimizations (trajectory caching for coding, parallel tool calls for search) come later.
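To make "models using tools in a loop" concrete, here is a minimal sketch of such a backbone. It is not Anthropic's code: `run_agent`, the action dictionary shape, and `scripted_model` (a stand-in for a real LLM call) are assumptions chosen so the example runs on its own. The environment is a plain dictionary and the only tool reads from it.

```python
from typing import Callable, Dict, List

Tool = Callable[[str], str]               # takes one string input, returns an observation
Model = Callable[[List[str]], Dict[str, str]]  # maps the transcript to the next action


def run_agent(model: Model, system_prompt: str, tools: Dict[str, Tool],
              task: str, max_steps: int = 10) -> str:
    """Run the model in a loop until it gives a final answer or hits max_steps."""
    transcript = [f"system: {system_prompt}", f"user: {task}"]
    for _ in range(max_steps):
        action = model(transcript)                  # model picks the next step
        if action["type"] == "final":
            return action["content"]
        observation = tools[action["tool"]](action["input"])  # act, get feedback
        transcript.append(
            f"tool {action['tool']}({action['input']}) -> {observation}"
        )
    raise RuntimeError("step limit reached without a final answer")


# Environment: the system the agent operates in (here, a tiny in-memory file store).
environment = {"notes.txt": "ship on Friday"}


def read_file(name: str) -> str:
    """Tool: read a file from the environment."""
    return environment.get(name, "<missing>")


def scripted_model(transcript: List[str]) -> Dict[str, str]:
    """Stand-in for an LLM: read one file, then answer with what it saw."""
    last = transcript[-1]
    if last.startswith("tool "):
        return {"type": "final", "content": last.split(" -> ", 1)[1]}
    return {"type": "tool", "tool": "read_file", "input": "notes.txt"}


answer = run_agent(scripted_model, "Answer from the files.",
                   {"read_file": read_file}, "When do we ship?")
print(answer)  # ship on Friday
```

Swapping the `tools` dictionary and the system prompt changes what the agent does while `run_agent` stays untouched, which is the sense in which different products can share one backbone.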

**Why coding is the canonical agent use case:** the task is ambiguous (design doc to PR) and high-value, models are already good at it, and the output is verifiable via unit tests and CI.

**"Think like your agent":** at each step the model only knows what's in its ~10-20k token context. Do the exercise of acting as a computer-use agent - you get a static screenshot, take an action blind, then "close your eyes for 3-5 seconds" before seeing the result. This reveals what context the agent actually needs (screen resolution, recommended actions, guardrails). You can also feed the system prompt / tool descriptions / full trajectory back into Claude and ask whether instructions are ambiguous or what would help it decide better.
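The review step described above can be prepared with a few lines of string assembly. This is a sketch under assumptions: `build_review_prompt` and `rough_token_count` are hypothetical helpers, the tag names are arbitrary, and the 4-characters-per-token ratio is a crude rule of thumb rather than a real tokenizer. The code makes no API call; the returned string would be sent to the model by whatever client you already use.

```python
from typing import Dict, List


def build_review_prompt(system_prompt: str,
                        tool_descriptions: Dict[str, str],
                        trajectory: List[str]) -> str:
    """Bundle everything the agent saw into one prompt asking for a critique."""
    tools = "\n".join(f"- {name}: {desc}" for name, desc in tool_descriptions.items())
    steps = "\n".join(f"{i}. {step}" for i, step in enumerate(trajectory, start=1))
    return (
        "You are reviewing the setup of an agent.\n\n"
        f"<system_prompt>\n{system_prompt}\n</system_prompt>\n\n"
        f"<tools>\n{tools}\n</tools>\n\n"
        f"<trajectory>\n{steps}\n</trajectory>\n\n"
        "Which instructions are ambiguous? "
        "What extra context would have helped you decide better at each step?"
    )


def rough_token_count(text: str) -> int:
    """Very rough size estimate, assuming about 4 characters per token."""
    return len(text) // 4


prompt = build_review_prompt(
    system_prompt="Book the cheapest flight. Never enter payment details.",
    tool_descriptions={"click": "Click at pixel (x, y).",
                       "screenshot": "Capture the current screen."},
    trajectory=["screenshot()", "click(412, 230)", "screenshot()"],
)
print(prompt)
print("approx. tokens:", rough_token_count(prompt))
```

Checking the estimate against the ~10-20k token figure is a quick way to notice when a trajectory has grown past what the agent can realistically attend to at each step.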

**Forward-looking open problems:** budget-aware agents (enforce token, time, and money limits), self-evolving tools (a meta-tool through which agents improve their own tool ergonomics), and multi-agent collaboration with asynchronous communication instead of synchronous turn-taking.
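As a rough illustration of the first open problem, the simplest form of budget enforcement is a hard stop that lives outside the model. The `Budget` class and `BudgetExceeded` exception below are hypothetical, and this sketch only covers enforcement; making the agent itself plan around its remaining budget is the harder, still-open part.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when any of the configured limits is crossed."""


@dataclass
class Budget:
    max_tokens: int
    max_seconds: float
    max_usd: float
    tokens_used: int = 0
    usd_spent: float = 0.0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, tokens: int, usd: float) -> None:
        """Record the cost of one model or tool call, then re-check limits."""
        self.tokens_used += tokens
        self.usd_spent += usd
        self.check()

    def check(self) -> None:
        """Raise BudgetExceeded if tokens, money, or wall-clock time ran out."""
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded("token limit exceeded")
        if self.usd_spent > self.max_usd:
            raise BudgetExceeded("money limit exceeded")
        if time.monotonic() - self.started_at > self.max_seconds:
            raise BudgetExceeded("time limit exceeded")


# Illustrative limits only: 50k tokens, 60 seconds, 10 cents.
budget = Budget(max_tokens=50_000, max_seconds=60.0, max_usd=0.10)
budget.charge(tokens=30_000, usd=0.06)       # within all limits
try:
    budget.charge(tokens=30_000, usd=0.01)   # pushes tokens_used to 60,000
except BudgetExceeded as exc:
    print("stopping agent:", exc)
```

In an agent loop, `charge` would be called after every model response and tool call, so the loop ends at the first step that crosses a limit.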