Inference
Also known as: model inference, LLM inference, inference call
The runtime process of running an input through a trained large language model to produce an output. Inference is the operating phase (as opposed to training); each enterprise agent invocation is one or more inference calls. Inference cost, measured per token of input and output, is the foundational cost line in any agent total-cost-of-ownership (TCO) model.
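A minimal sketch of that per-token arithmetic is below. The per-token prices are hypothetical placeholders chosen for illustration, not any vendor's published rates.

```python
# Illustrative sketch of per-token inference cost for a single call.
# The prices are assumed values, not real vendor rates.

PRICE_PER_INPUT_TOKEN = 3.00 / 1_000_000    # assumed $3 per million input tokens
PRICE_PER_OUTPUT_TOKEN = 15.00 / 1_000_000  # assumed $15 per million output tokens

def inference_call_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one inference call at the assumed per-token prices."""
    return (input_tokens * PRICE_PER_INPUT_TOKEN
            + output_tokens * PRICE_PER_OUTPUT_TOKEN)

# An agent invocation that sends a 2,000-token prompt and gets 500 tokens back:
print(f"${inference_call_cost(2_000, 500):.4f}")  # -> $0.0135
```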
Inference cost has dropped roughly an order of magnitude per year since 2022 (the LLMflation curve), and it is no longer the dominant cost line in production enterprise agent stacks. In 2026 the total operational cost typically breaks down as:

- model inference (the smallest line)
- orchestration runtime
- integration layer
- observability + evaluation
- human oversight (the largest line during the first 6-12 months of production)

Procurement spreadsheets that price inference and call it the cost of the agent are modelling the demo, not the production deployment.
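To make the relative shares concrete, here is a hedged sketch of a fuller monthly TCO model. The line items mirror the list above, but every dollar figure is invented purely for illustration, not a benchmark or reported spend.

```python
# Hypothetical monthly TCO sketch for one production agent stack.
# All figures are invented to illustrate relative shares only.

monthly_cost_lines = {
    "model inference":            1_800,   # per-token spend across all calls
    "orchestration runtime":      3_500,   # agent framework, queues, retries
    "integration layer":          4_200,   # connectors, auth, data plumbing
    "observability + evaluation": 5_000,   # tracing, eval harnesses, dashboards
    "human oversight":           12_000,   # review, escalation, early-stage QA
}

total = sum(monthly_cost_lines.values())
for line, cost in monthly_cost_lines.items():
    print(f"{line:<28} ${cost:>7,}  ({cost / total:.0%})")
print(f"{'total':<28} ${total:>7,}")
```

In this sketch inference is roughly 7% of the monthly total and human oversight roughly 45%, which is the pattern the paragraph above describes; the exact split will differ by deployment.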
Primary sources
- Andreessen Horowitz. LLMflation — the falling cost of LLM inference
- Anthropic. Pricing — per-token inference cost