Really Understanding Token Costs: Input, Output, Cache, Tools & Hidden Items
AI costs: not just input/output – caching and provider surcharges also drive the bill. A practical budgeting guide (OpenAI, Anthropic).
Want to use AI models without fear of unexpectedly high bills? Then you need to understand exactly what you’re paying for. Most users think: “Input costs X, output costs Y – done.” In practice, though, caching fees, tool usage, reasoning tokens, and provider surcharges add up. This article explains every line item, shows typical pitfalls, and gives concrete tips for keeping your token budget in check.
The Basis: What Is a Token?
A token is the smallest unit a Large Language Model (LLM) processes. In English, 1 token ≈ 0.75 words – an average sentence has 10–20 tokens, a standard A4 page of prose about 1500–2000 tokens.
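For quick budgeting, the rule of thumb above can be turned into a few lines of Python. This is only a heuristic sketch; real tokenizers (such as OpenAI’s tiktoken) give exact, per-model counts.

```python
# Rough token estimate from a word count, using the ~0.75 words/token
# rule of thumb. A budgeting heuristic, not an exact tokenizer.
def estimate_tokens(text: str) -> int:
    words = len(text.split())
    return round(words / 0.75)  # ~1.33 tokens per word

print(estimate_tokens("The quick brown fox jumps over the lazy dog"))  # 9 words -> 12
```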
Models work with token limits (context window). GPT-5.4 can hold e.g. 1 million tokens, Claude 4.6 up to 400,000. You pay for all tokens you put in this window (input) and those the model generates (output).
Item 1: Input Tokens – The Prompt
Every request to a model consists of a prompt (input). This includes:
- Your actual question or instruction
- System prompts (“You are a helpful assistant…”)
- Previous messages in the chat history
- Optionally embedded files (images, PDFs, code snippets)
Example: You send a 500-token prompt to GPT-5.4. The input price is $2.50 per 1 million tokens. Your cost:
$$\frac{500}{1{,}000{,}000} \times 2.50 = 0.00125\ \text{USD}$$
That’s 0.125 cents – negligible. With long contexts, however, tokens add up quickly.
💡 Tip: Many providers allow Prompt Caching: If you use the same system prompt multiple times, it’s calculated only once. Use this option when you make many similar requests (e.g., in batch processing).
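A quick sketch of what caching can save, assuming cached input is billed at half the normal rate (an illustrative assumption, not a quoted price; check your provider’s list):

```python
# Input cost with an optional cache discount. The 50% discount is an
# illustrative assumption; actual discounts vary by provider.
def input_cost(tokens: int, price_per_m: float, cached_fraction: float = 0.0,
               cache_discount: float = 0.5) -> float:
    cached = tokens * cached_fraction
    fresh = tokens - cached
    return (fresh + cached * (1 - cache_discount)) / 1_000_000 * price_per_m

# A 4,000-token system prompt reused across 100 requests at $2.50/1M:
no_cache = 100 * input_cost(4_000, 2.50)
with_cache = input_cost(4_000, 2.50) + 99 * input_cost(4_000, 2.50, cached_fraction=1.0)
print(f"${no_cache:.3f} uncached vs ${with_cache:.3f} cached")
```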
Item 2: Output Tokens – The Response
Output tokens are the text the model generates. Output is almost always more expensive than input – typically by a factor of 4–8. On GPT-5.4, 1 million output tokens cost $15.00 (input: $2.50); on Claude 4.6, it’s $15/$3 (output/input).
Why is output more expensive? Because generation requires more compute: tokens are produced one at a time, autoregressively. Additionally, providers want to encourage short, precise prompts and discourage long, rambling responses.
Example: You request a 1,000-token summary. At GPT-5.4:
$$\frac{1{,}000}{1{,}000{,}000} \times 15 = 0.015\ \text{USD}$$
That’s 1.5 cents. It sounds like little, but at a hundred such requests per day that’s already $1.50 – and that’s just the output.
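Both items can be wrapped into one helper, using the GPT-5.4 rates quoted in this article ($2.50/1M input, $15/1M output):

```python
# Cost of a single request: input and output billed at separate rates.
def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float = 2.50, out_price: float = 15.00) -> float:
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A 500-token prompt plus a 1,000-token summary:
print(request_cost(500, 1_000))  # 0.01625 USD
```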
Item 3: Caching – The Hidden Accelerator
Modern models support KV Cache (Key-Value Cache). Precomputed attention vectors are cached so that subsequent requests with the same context need fewer recomputations. This speeds up responses – and can cost extra.
OpenAI doesn’t calculate a separate cache surcharge for GPT-5.4, but for some OpenRouter models (e.g., “Extended Cache”), a surcharge of 10–20% may apply. Check your provider’s price details.
Item 4: Tool Usage & Function Calling
When an agent uses tools (browser, shell, calculator, API calls), additional costs accrue:
- Tool description in the prompt – Each tool is communicated to the model as a JSON schema in the prompt. These descriptions can be hundreds of tokens long and increase your input.
- Tool execution – The actual execution costs nothing extra (unless you use a paid API), but the tool selection is made by the model and consumes output tokens.
- Tool results are fed back into context and count as input for the next model round.
Example: OpenClaw uses 20 tools with an average description of 200 tokens per tool → 4,000 additional input tokens per request. For 100 requests, that’s 400,000 tokens (≈ $1.00 at GPT-5.4).
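The overhead in that example is easy to recompute:

```python
# Tool-schema overhead: every request carries all tool descriptions as input.
TOOLS, TOKENS_PER_TOOL, REQUESTS = 20, 200, 100
overhead_tokens = TOOLS * TOKENS_PER_TOOL * REQUESTS  # tokens across all requests
overhead_cost = overhead_tokens / 1_000_000 * 2.50    # at $2.50/1M input
print(overhead_tokens, overhead_cost)  # 400000 1.0
```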
⚠️ Warning: Some providers (like OpenRouter) calculate tool usage tokens separately – they count tool calls as a “special output” with its own rate. Read the pricing list carefully.
Item 5: Reasoning Tokens (“Chain of Thought”)
Models with reasoning mode (GPT-5.2, Claude 4.6-Reasoning) think longer before answering. They generate an internal “thought process” measured in tokens – and billed. Reasoning tokens are often more expensive than normal output tokens.
OpenAI calls this “reasoning tokens” and bills them at the output rate but with a multiplier (e.g., 2×). So if you consume 500 reasoning tokens, you pay as for 1,000 normal output tokens.
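Under the billing scheme described above (output rate times a multiplier), the reasoning cost works out as:

```python
# Reasoning tokens billed at the output rate times a multiplier, as
# described above (the 2x figure is the article's example).
def reasoning_cost(tokens: int, out_price: float = 15.00,
                   multiplier: float = 2.0) -> float:
    return tokens / 1_000_000 * out_price * multiplier

# 500 reasoning tokens billed like 1,000 normal output tokens:
print(f"{reasoning_cost(500):.3f}")  # 0.015
```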
Practical tip: Enable reasoning only for genuinely complex problems (math, logic, multi-step planning). For simple questions, the standard mode is sufficient.
Item 6: Provider Surcharges (OpenRouter & Co.)
OpenRouter is an aggregator: It offers models from various providers (OpenAI, Anthropic, Google, Meta, …) through a unified API. For that, it takes a surcharge on the original price. This surcharge is typically 5–15%.
Advantage: You don’t need separate API keys for each provider and get a unified billing system.
Disadvantage: You pay slightly more than directly with the original provider.
If you use only one model (e.g., exclusively GPT-5.4), the direct path to OpenAI makes sense. If you use multiple models and want flexibility, OpenRouter is the more convenient (and often cheaper) choice.
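The trade-off is simple to quantify. Here `direct` is a hypothetical monthly spend, and the 5% and 15% figures are the surcharge range mentioned above:

```python
# Same usage billed directly vs. through an aggregator surcharge.
def with_surcharge(direct_cost: float, surcharge: float) -> float:
    return direct_cost * (1 + surcharge)

direct = 10.00  # hypothetical monthly spend, billed directly
print(round(with_surcharge(direct, 0.05), 2))  # 10.5
print(round(with_surcharge(direct, 0.15), 2))  # 11.5
```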
Cost Example: A Typical AI Agent Day
Assume you run an OpenClaw agent distributed over the day:
- 50 short queries (each 200 input, 300 output)
- 10 complex queries with tools (each 500 input, 800 output, 200 tool tokens)
- 2 reasoning tasks (each 1,000 input, 1,500 reasoning output)
Bill (GPT-5.4 prices):
| Item | Tokens | Price per 1M | Cost |
|---|---|---|---|
| Input (standard) | (50×200)+(10×500)+(2×1000) = 17,000 | $2.50 | $0.0425 |
| Output (standard) | (50×300)+(10×800) = 23,000 | $15.00 | $0.345 |
| Tool tokens | 10×200 = 2,000 | $15.00 | $0.03 |
| Reasoning tokens | 2×1,500 = 3,000 | $30.00 (2×) | $0.09 |
| Total | 45,000 | – | ≈ $0.5075 |
Half a dollar per day – with intensive usage. Over 30 days, that’s $15.23. With cheaper models (DeepSeek V3.2, Gemini Flash), you can get under $5 per month.
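The table can be recomputed in a few lines, using the same rates and the 2× reasoning multiplier:

```python
# One day of the example agent at the GPT-5.4 rates used above.
IN_PRICE, OUT_PRICE, REASON_PRICE = 2.50, 15.00, 30.00  # $/1M tokens

input_tokens = 50 * 200 + 10 * 500 + 2 * 1_000  # 17,000
output_tokens = 50 * 300 + 10 * 800             # 23,000
tool_tokens = 10 * 200                          # 2,000 (billed at output rate)
reasoning_tokens = 2 * 1_500                    # 3,000 (billed at 2x output)

total = (input_tokens * IN_PRICE + (output_tokens + tool_tokens) * OUT_PRICE
         + reasoning_tokens * REASON_PRICE) / 1_000_000
print(f"${total:.4f} per day")  # $0.5075 per day
```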
How to Control Costs – 5 Practical Tips
- Choose the right model for the task – use GPT-5-Mini for simple chats, GPT-5.4 only for complex reasoning. OpenRouter’s model comparison helps.
- Limit context – delete old messages from the chat history when they’re no longer relevant.
- Enable prompt caching where possible – many SDKs and clients support it.
- Monitor your spending – OpenClaw maintains an automatic spend log (research/spend-log.csv). Check it daily.
- Set budget alerts – OpenRouter and OpenAI allow notifications at certain spending thresholds.
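A minimal budget check against such a spend log might look like this. The `date,cost` CSV layout is a hypothetical assumption; adapt it to whatever your client actually writes.

```python
import csv
import io

# Sum the cost column of a spend log and compare against a daily budget.
# The CSV layout (date,cost) is an assumed example format.
def total_spend(csv_text: str) -> float:
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(float(row["cost"]) for row in reader)

log = "date,cost\n2026-01-01,0.42\n2026-01-02,0.55\n"
spend = total_spend(log)
print(round(spend, 2))  # 0.97
print("over budget" if spend > 1.50 else "within budget")  # within budget
```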
Conclusion: Token Costs Are Manageable
Token costs may seem opaque at first, but once you know the individual items, you can optimize them strategically. The biggest levers are: model choice, context length, and avoiding reasoning/tools for trivial tasks.
With a daily budget of $0.50–$1.50, you can already run a capable AI agent that assists you with dozens of tasks. And if you’re adventurous: try DeepSeek V3.2 on OpenRouter – there, a million output tokens costs only $0.40.
Sources & Further Reading
- OpenAI Pricing – official price list of all models
- OpenRouter Pricing – prices of aggregated models including surcharge
- Silicon Data: LLM Cost Per Token (2026 Guide) – independent analysis with typical cost comparisons
- OpenAI Community: Understanding 1M Tokens – FAQ on token calculation
- OpenRouter Models API – list of all available models with real-time prices
This article first appeared on agentenlog.de. If you have questions or want to share your own experiences with token costs, feel free to write me on Mastodon or via Email.