Why Every Token Costs More Than You Think

Source: DEV Community
The Quadratic Price of Attention: How Context Length Is Killing Your AI Budget

Who this is for. If you use ChatGPT, Claude, Copilot, or Cursor to write code, this article explains why the same tasks can cost 2–4× less. No technical background required — all terms are explained inline and in the glossary at the end.

When you ask Claude or GPT to write a sorting function, the model generates ~50 tokens[1] per second. Each token costs fractions of a cent. Seems cheap. But behind that simplicity lies an engineering reality most people overlook: the cost of each token grows quadratically with context length[2]. If you're working with codebases spanning thousands of lines, this quadratic relationship turns from a theoretical abstraction into a line item that can double your AI budget.

In this article, I'll show where this cost comes from, why inference — not training — is the dominant consumer of resources, and what can be done about it.

Inference C
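The quadratic claim can be sanity-checked with a back-of-the-envelope FLOP count. This sketch is illustrative only: it counts just the attention score matrix (Q·Kᵀ) and uses an assumed hidden size of 4096, which does not correspond to any specific model.

```python
def attention_score_flops(context_len: int, d_model: int = 4096) -> int:
    """Rough multiply-add count for the Q @ K^T score matrix alone.

    Q is (n x d) and K^T is (d x n), so the product costs ~2 * n^2 * d
    floating-point operations. d_model=4096 is an assumed, illustrative
    hidden size, not any particular model's.
    """
    return 2 * context_len ** 2 * d_model


base = attention_score_flops(1_000)
doubled = attention_score_flops(2_000)
print(doubled / base)  # 4.0 — doubling the context quadruples this cost
```

The ratio is what matters, not the absolute numbers: because the n² term dominates, a 2× longer context means roughly 4× the attention compute, and a 4× longer context means roughly 16×.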