The Cost Curve of Unchecked LLM Context

LLM use can get expensive if the number of per‑call tokens is unchecked.

There is a contradiction in LLM‑assisted engineering: you must provide enough context for quality, yet the workflows that supply this context tend to inflate token usage beyond what the model can meaningfully use.

Below is a concrete, realistic example for one developer, showing exactly how costs can balloon when normal workflow patterns accumulate more context than the model can effectively handle.

This article is about unchecked token growth in engineering workflows and the cost multipliers that follow.

Unchecked growth could cost you close to a thousand dollars per engineer a month

Imagine an engineer building an agent to analyse logs. The system has two major subsystems: a log processor with a UI, and a remote runtime LLM system that works on the log contents surfaced by the log processor and reports back to the UI what is in the log and what it means.

During development, the engineer is iterating, debugging, and refining the solution. To do this, the engineer is using AI support within their engineering workflow.

To support the engineer's development of the solution, the LLM being used in the engineering workflow is:

  • analysing log fragments
  • suggesting new extraction queries
  • reflecting on errors
  • retrying after failures
  • running multi-step loops

Initial token cost

Initially, the token cost for the above might be:

  • system prompt: 300 tokens
  • user input: 200 tokens
  • expected output: 300 tokens

This is a total of about 800 tokens.

| Model tier | Tokens used | Price per million tokens | Total cost per call |
|---|---|---|---|
| Cheap | 800 | $0.20 | $0.00016 |
| Medium | 800 | $3.50 | $0.0028 |
| High‑end | 800 | $10.00 | $0.008 |
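The per-call figures above follow from a single multiplication. The sketch below reproduces them under one simplifying assumption that the table also makes: input and output tokens are billed at the same price per million. Many providers price the two differently, so treat the function name `cost_per_call` and the blended price as illustrative only.

```python
# Prices per million tokens, taken from the table above.
PRICE_PER_MILLION = {"cheap": 0.20, "medium": 3.50, "high-end": 10.00}


def cost_per_call(tokens: int, price_per_million: float) -> float:
    """Return the dollar cost of one call, assuming one blended token price."""
    return tokens * price_per_million / 1_000_000


# Baseline call: system prompt + user input + expected output.
baseline_tokens = 300 + 200 + 300

for tier, price in PRICE_PER_MILLION.items():
    print(f"{tier}: ${cost_per_call(baseline_tokens, price):.5f}")
```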

All of this is cheap, predictable, and stable.


Unchecked token growth during engineering

Without checks, per-call token usage tends to grow through:

  • system prompts that keep expanding
  • chat history that accumulates
  • a RAG component that retrieves too much context
  • retries that resend the entire prompt
  • additional tools in the engineering environment that generate more tokens
  • output schemas that become verbose
  • debugging prints that end up in the context
  • logs pasted directly into prompts
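One way to notice this growth early is to measure how many tokens each part of the prompt contributes before a call is sent. The sketch below is a minimal illustration, not a real tokenizer: `estimate_tokens` uses a rough rule of about four characters per token, which is an assumption, and the component names and helper functions are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4) if text else 0


def context_report(components: dict[str, str]) -> dict[str, int]:
    """Return the estimated token count for each named prompt component."""
    return {name: estimate_tokens(text) for name, text in components.items()}


def largest_component(report: dict[str, int]) -> str:
    """Return the name of the component that contributes the most tokens."""
    return max(report, key=report.get)


# Placeholder strings stand in for real prompt parts.
report = context_report({
    "system_prompt": "x" * 1_200,
    "chat_history": "y" * 16_000,
    "pasted_logs": "z" * 8_000,
})
print(report, "largest:", largest_component(report))
```

In practice the estimate would be replaced by the tokenizer that matches the model in use; the point of the report is only to show which component is growing.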

Costs balloon

Assume that an engineer makes 300 LLM calls per 8-hour day to develop the above log agent.

Over time, unchecked token usage can grow each call to:

  • system prompt: 2,500 tokens
  • full chat history, kept for LLM context and continuity: 4,000 tokens
  • RAG over‑retrieval of six log files: 8,000 tokens
  • output schema: 1,000 tokens
  • safety boilerplate in the prompt: 500 tokens
  • engineering LLM output, as before: 300 tokens

This is a total of 16,300 tokens per LLM call.
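The total can be checked by summing the components, which also shows where the growth is concentrated. The sketch below uses only the figures from the list above; the dictionary keys are illustrative names.

```python
# Per-call token breakdown after unchecked growth (figures from the list above).
grown_context = {
    "system_prompt": 2_500,
    "chat_history": 4_000,
    "rag_retrieval": 8_000,
    "output_schema": 1_000,
    "safety_boilerplate": 500,
    "model_output": 300,
}

total = sum(grown_context.values())

# Print each component's share of the call, largest first.
for name, tokens in sorted(grown_context.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20}{tokens:>7,}  {tokens / total:6.1%}")
print(f"{'total':<20}{total:>7,}")
```

In this example, retrieval alone accounts for roughly half of every call, so it would be the first place to look for savings.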

For the cheap, medium, and high-end LLM engineering models, this is the cost:

| Tier | Cost per 16,300‑token call | Daily cost (300 calls) | Monthly cost (20 days) |
|---|---|---|---|
| Cheap | $0.00326 | $0.978 | $19.56 |
| Medium | $0.05705 | $17.115 | $342.30 |
| High‑end | $0.163 | $48.90 | $978.00 |

The cost per call is low. The monthly cost soon adds up.

The cost when token usage is checked:

| Tier | Cost per 800‑token call | Daily cost (300 calls) | Monthly cost (20 days) |
|---|---|---|---|
| Cheap | $0.00016 | $0.048 | $0.96 |
| Medium | $0.0028 | $0.84 | $16.80 |
| High‑end | $0.008 | $2.40 | $48.00 |

With roughly twenty times the tokens per call (16,300 instead of 800), the high-end model cost balloons from $48 to $978 per month.
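The monthly figures are the per-call cost multiplied by 300 calls a day and 20 working days. A minimal sketch of that projection for the high-end tier follows; as in the tables, it assumes a single blended price for input and output tokens, and the function name `monthly_cost` is illustrative.

```python
CALLS_PER_DAY = 300
WORKING_DAYS_PER_MONTH = 20


def monthly_cost(tokens_per_call: int, price_per_million: float) -> float:
    """Project the monthly dollar cost for one engineer at a fixed call size."""
    per_call = tokens_per_call * price_per_million / 1_000_000
    return per_call * CALLS_PER_DAY * WORKING_DAYS_PER_MONTH


# High-end tier at $10 per million tokens.
checked = monthly_cost(800, 10.00)
unchecked = monthly_cost(16_300, 10.00)
print(f"checked: ${checked:.2f}, unchecked: ${unchecked:.2f}, "
      f"ratio: {unchecked / checked:.1f}x")
```

Because the call count and price are held constant, the cost ratio equals the token ratio: the bill scales linearly with context size.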

As a chart, the trend becomes quite clear.

[Chart: cost trend as per-call token usage grows]

Are 300 LLM calls per day realistic?

Yes, depending on the kind of work being done.

Code generation

With an interactive coding assistant, every on-screen code autocompletion, every refactor of the implementation, and every explanation of what a part of the solution does is a call to the LLM. An hour of active coding could generate 40 to 80 LLM calls.

Agentic assistance

Agentic workflows typically operate with a "plan, act, observe, revise" pipeline. Suppose the engineer instructs an agent to "add logging so that a division by zero error is logged and all tests pass". The LLM first generates a plan: read the file containing the division code, inspect the tests, modify the function to add logging in the right place, run the tests, and revise the change if the tests fail. The plan is then carried out, the outcome is observed, and any revisions are made. This pipeline may involve 3 to 10 calls.
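The call count in such a pipeline comes from the loop itself: one call to plan, one to act, and one more for every revision. The sketch below is a simplified illustration with a stubbed model, not a real agent framework; `run_agent_task`, `stub_llm`, and the prompt strings are all hypothetical.

```python
from typing import Callable


def run_agent_task(
    llm: Callable[[str], str],
    run_tests: Callable[[str], bool],
    task: str,
    max_revisions: int = 3,
) -> tuple[str, int]:
    """Plan, act, observe, revise. Returns the final change and the LLM call count."""
    calls = 0
    plan = llm(f"Plan the steps for: {task}")
    calls += 1
    change = llm(f"Apply this plan:\n{plan}")
    calls += 1
    for _ in range(max_revisions):
        if run_tests(change):  # observe: running tests needs no LLM call
            break
        change = llm(f"Tests failed. Revise:\n{change}")
        calls += 1
    return change, calls


def stub_llm(prompt: str) -> str:
    # Stand-in for a real model call: returns a marker so the loop can be traced.
    return "revised" if prompt.startswith("Tests failed") else "draft"


change, calls = run_agent_task(stub_llm, lambda c: c == "revised", "add logging")
print(change, calls)
```

Here the tests pass after one revision, giving three calls. Every call in a real loop also carries its own context, so the per-call token count matters at each step.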

Testing

When running a test suite, if the tests use an LLM, it is easy to perform 100 calls per test suite run.

Teams use LLMs within test suites to:

  • evaluate natural‑language behaviour
  • judge style, tone, or reasoning
  • compare outputs that are not byte‑identical
  • classify correctness when rules are fuzzy
  • score answers in educational or assessment systems
  • validate agent behaviour
  • check explanations or rationales

Documentation

When the engineer needs to document or review code, each comment, rewrite or suggestion is an LLM call.

Overall, in these environments, 300 calls per day is realistic. Depending on your organisation, it may be on the low side.
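A back-of-the-envelope estimate shows how these activities add up. The per-unit ranges below come from the sections above; the workload mix (three hours of active coding, ten agent tasks, one LLM-backed test run) is a hypothetical day chosen for illustration, and documentation calls are left out because no figure is given for them.

```python
# Calls per unit of work, as (low, high) ranges quoted in the sections above.
CALLS_PER_UNIT = {
    "coding_hour": (40, 80),
    "agent_task": (3, 10),
    "test_suite_run": (100, 100),
}


def daily_calls(workload: dict[str, int]) -> tuple[int, int]:
    """Return the (low, high) estimate of LLM calls for one day's workload."""
    low = sum(CALLS_PER_UNIT[kind][0] * count for kind, count in workload.items())
    high = sum(CALLS_PER_UNIT[kind][1] * count for kind, count in workload.items())
    return low, high


# Hypothetical day: the counts here are assumptions, not measurements.
example_day = {"coding_hour": 3, "agent_task": 10, "test_suite_run": 1}
low, high = daily_calls(example_day)
print(f"{low} to {high} calls")
```

For this assumed mix the estimate spans 250 to 440 calls, which brackets the 300 used in the cost tables.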

Why so many calls?

LLMs are stateless and probabilistic. They have limited context, no persistent memory, and no internal world‑model, so they do not understand your code or the system that will execute it.

This design means that complex tasks, such as adding logging to the right place in code, must be decomposed into many short, repeated calls.

Multiple calls are a necessity to achieve quality collaboration from an LLM.

Not only must there be multiple calls, but each call also needs a minimum amount of information. Because the LLM lacks a world-model, the quality of results without this context would be significantly lower: no one wants an LLM hallucinating code because of its probabilistic nature.

However, passing too much information decreases the quality of LLM output.

This is because of how LLMs are built. An LLM will compress, blur, and mis‑prioritise information when the prompt becomes too large or too dense.

A prompt is too dense when it contains more information than the model can meaningfully separate, prioritise, or reason over. Prompt quality is not only about prompt length.

If the prompt contains too many facts, instructions, examples, or files, the LLM cannot assign stable importance weights, so it treats everything as equally relevant. The result will be vague, averaged, or generic output.

If an LLM is presented with multiple coding styles, conventions, or code patterns and they appear close together, the model cannot decide which pattern to follow, so it blends them. The result is inconsistent naming, mixed styles, or contradictory behaviour.

For prompts, less is more.

With an LLM, too little context leads to poor results. With enough context, results will be good. With too much context, you will have poor results and high cost.
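One way to stay in the "enough context" zone is to give each call a token budget and fill it with the most important material first. The sketch below is a minimal greedy selection under stated assumptions: the `ContextItem` type, the priority ranking, and the 2,000-token budget are all hypothetical, and a real workflow would decide relevance with more care.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    name: str
    tokens: int
    priority: int  # lower number = more important (hypothetical ranking)


def select_context(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Greedily keep the most important items that fit within the token budget."""
    chosen: list[ContextItem] = []
    used = 0
    for item in sorted(items, key=lambda i: i.priority):
        if used + item.tokens <= budget:
            chosen.append(item)
            used += item.tokens
    return chosen


candidates = [
    ContextItem("system_prompt", 300, 0),
    ContextItem("user_input", 200, 0),
    ContextItem("relevant_log", 1_200, 1),
    ContextItem("chat_history", 4_000, 2),
    ContextItem("other_logs", 6_800, 3),
]
kept = select_context(candidates, budget=2_000)
print([item.name for item in kept])
```

With this budget, the full chat history and the extra log files are dropped, and the call carries 1,700 tokens instead of 12,500.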

Conclusion

Even though $10 per million tokens does not sound like much, the rate of usage is key. A high rate of usage may simply come from the type of work your engineers are performing.

Good results require a well-sized number of tokens per call. However, engineering tooling and workflow pressures can push the number of tokens per call well beyond that.

Unchecked token growth is both expensive and counterproductive.

Organisations must size token usage appropriately, balancing cost and quality.

A system without a world‑model forces engineers to restate context, but restating too much context degrades quality and inflates cost. The job of engineering leadership is to enforce context discipline and workflow design to keep teams on the efficient part of the curve.

The goal is not to minimise tokens but to control context so that the model receives only what it can meaningfully use.
