Intermediate · Claude API

Tool use patterns that don't burn tokens

Tool use is powerful and expensive. Here's how to keep the first and tame the second.

Tool use lets Claude call your functions. The default patterns work, but they burn tokens unnecessarily. Here are four optimizations that cut spend without losing capability.

11 min read
api · tool-use · agents · optimization

Tool use turns Claude from a chat completion into a controllable agent. Define functions in your prompt, Claude calls them, you return the results, Claude continues. It's the foundation of every meaningful agent built on Claude.

It's also a token-cost trap if you let it be. Four patterns make the difference between "Claude with tools is amazing" and "our API bill tripled."

Pattern 1: Cache the tool definitions

Every turn of an agent loop sends the full tool list. If you have 12 tools with detailed descriptions and parameter schemas, that's 5–10K tokens per turn. A 20-step loop sends them 20 times.

Set a cache breakpoint on the system prompt. Tool definitions come before the system prompt in the request, so everything up to and including that breakpoint — tools included — gets cached:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-1",
    max_tokens=2048,
    tools=tools,                       # 12 tools, ~8K tokens
    system=[
        {
            "type": "text",
            "text": SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=conversation_history,
)

The system prompt + tool definitions get cached. Subsequent turns within the 5-minute TTL pay 10% of input cost for that prefix. On a 20-step loop, that's a ~5x cost reduction on tool overhead alone.
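A back-of-envelope check makes the math concrete. This sketch assumes illustrative prices ($15 per million input tokens, a 1.25x premium on cache writes, 0.10x on cache reads); `loop_input_cost` is a hypothetical helper, not part of the SDK, and the exact multiple varies with prefix size and turn count:

```python
def loop_input_cost(prefix_tokens, turns, price_per_mtok,
                    cached=False, write_mult=1.25, read_mult=0.10):
    """USD input cost of re-sending a static prefix on every turn of a loop."""
    per_tok = price_per_mtok / 1_000_000
    if not cached:
        return prefix_tokens * turns * per_tok
    # first turn writes the cache at a premium; later turns read it cheaply
    first = prefix_tokens * write_mult * per_tok
    rest = prefix_tokens * read_mult * per_tok * (turns - 1)
    return first + rest

uncached = loop_input_cost(8_000, 20, 15.0)             # $2.40
cached = loop_input_cost(8_000, 20, 15.0, cached=True)  # ~$0.38
```

On these assumed prices, the cached loop pays roughly a sixth of the uncached one for the same prefix.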

Read the prompt caching deep-dive for the full picture.

Pattern 2: Keep tool descriptions tight

Tool descriptions are tokens that ship every turn (cached or not). Most teams write them like API docs — three paragraphs explaining edge cases.

Cut them down. A good tool description is two sentences: what it does, when to use it. Edge cases live in the parameter schema's description fields, not the top-level description.

Bad:

"description": "This function searches the company's
internal knowledge base, which contains documents from
the engineering wiki, the product spec library, and the
sales playbook. Use it for any question that might be
answered by internal documentation, especially when the
user mentions specific products, features, or processes.
Note: it does not include external customer-facing docs."

Good:

"description": "Search internal docs (wiki, specs, sales
playbook). Use for product/feature/process questions."

Same information, roughly 70% fewer tokens — and it ships every turn.
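In practice that means pushing the edge cases down into `input_schema`. A sketch of the full tool definition (the tool name and fields are illustrative):

```python
search_docs = {
    "name": "search_docs",
    # top-level description: what it does, when to use it
    "description": "Search internal docs (wiki, specs, sales playbook). "
                   "Use for product/feature/process questions.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                # edge cases live here, not in the top-level description
                "description": "Search keywords. External customer-facing "
                               "docs are not indexed.",
            },
            "source": {
                "type": "string",
                "enum": ["wiki", "specs", "sales"],
                "description": "Optional filter; omit to search all sources.",
            },
        },
        "required": ["query"],
    },
}
```

Claude still sees the caveat about external docs — but only when it's inspecting the `query` parameter, where the caveat is actionable.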

Pattern 3: Return small results, not whole payloads

When a tool returns 50KB of JSON, Claude has to read all 50KB to find the one field that mattered. Then it ships that 50KB in the next turn's context. Then the next. Token cost compounds.

Trim aggressively in your tool implementation. If Claude asks for "the user's last 10 orders," don't return every column of the orders table — return order_id, date, total, status. If it needs more, it can call a follow-up tool.

A useful pattern: return a summary object with a details_url Claude can fetch on demand. The default is small; the deep dive is opt-in.
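A sketch of that trimming step inside a tool implementation — `summarize_orders` and the `details_url` endpoint are hypothetical, stand-ins for whatever your backend exposes:

```python
def summarize_orders(rows, limit=10):
    """Return only the fields Claude needs, plus an opt-in deep-dive link."""
    slim = [
        {"order_id": r["order_id"], "date": r["date"],
         "total": r["total"], "status": r["status"]}
        for r in rows[:limit]
    ]
    return {
        "orders": slim,
        "count": len(slim),
        # hypothetical endpoint Claude can fetch if it needs full rows
        "details_url": "/api/orders?full=1",
    }
```

Whatever other columns the raw rows carry — addresses, SKU lists, internal flags — never enter the context window unless Claude explicitly follows the link.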

Pattern 4: Use parallel tool calls when independent

Claude can issue multiple tool calls in a single turn. If three lookups are independent (no result feeds the next), do them in parallel:

# Claude returns three tool_use blocks in one response
[
    {"type": "tool_use", "id": "1", "name": "get_user", ...},
    {"type": "tool_use", "id": "2", "name": "get_billing", ...},
    {"type": "tool_use", "id": "3", "name": "get_subscription", ...}
]

You execute all three concurrently in your code and return all three results in the next user message. One round-trip instead of three — which cuts both latency and the cost of re-sending the growing context for each intermediate turn.

Encourage this in the system prompt: "When multiple tool calls are independent, issue them in parallel. Only chain when one result is needed for the next."
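On your side of the loop, the fan-out can be as simple as a thread pool. A sketch, with the handlers as stand-ins for your real tool implementations:

```python
from concurrent.futures import ThreadPoolExecutor

HANDLERS = {  # stand-ins for real tool implementations
    "get_user": lambda args: {"name": "Ada"},
    "get_billing": lambda args: {"plan": "pro"},
    "get_subscription": lambda args: {"status": "active"},
}

def run_tool_calls(tool_use_blocks):
    """Execute independent tool_use blocks concurrently and build the
    tool_result content blocks for the next user message."""
    def run_one(block):
        result = HANDLERS[block["name"]](block.get("input", {}))
        return {"type": "tool_result",
                "tool_use_id": block["id"],
                "content": str(result)}
    with ThreadPoolExecutor() as pool:
        # pool.map preserves order, so results line up with the requests
        return list(pool.map(run_one, tool_use_blocks))
```

Threads are enough here because tool calls are typically I/O-bound; swap in `asyncio` if your handlers are already async.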

What not to do

  • Don't define every conceivable tool just in case. Each unused tool is dead weight. Start with 3–5; add more when you see a clear gap.
  • Don't pass auth in the prompt. Tool definitions are visible to Claude, and Claude is not a secrets vault. Inject auth in your tool executor, not in the description.
  • Don't make every step a tool call. If the answer is "compute X from these inputs," let Claude reason directly instead of writing a compute_x tool. Tools are for things Claude can't do without external state.
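The auth point deserves a sketch. Here the HTTP call is stubbed out (`_call_billing_api` is hypothetical); what matters is that credentials enter in the executor, never in anything Claude reads:

```python
import os

def _call_billing_api(args, headers):
    # stand-in for a real HTTP request; only the header plumbing matters here
    return {"ok": headers["Authorization"].startswith("Bearer ")}

def execute_tool(name, args, secrets=None):
    """Dispatch a tool call, injecting credentials at execution time.
    The tool definitions Claude sees contain no auth material."""
    secrets = dict(os.environ) if secrets is None else secrets
    headers = {"Authorization": f"Bearer {secrets.get('BILLING_TOKEN', '')}"}
    if name == "get_billing":
        return _call_billing_api(args, headers)
    raise ValueError(f"unknown tool: {name}")
```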
