Context window
Also known as: context length, context size, input window
The maximum number of input tokens (system prompt, user prompt, retrieved context, conversation history, tool-call results) a large language model can process in a single inference call. Frontier models in 2026 ship context windows ranging from 128K tokens (GPT-4o) to 2M tokens (Gemini 2.5 Pro for select customers); 200K tokens is the typical Claude Sonnet/Opus default.
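A minimal sketch of how the components listed above all compete for one shared budget. The chars-per-token heuristic and the part contents are illustrative assumptions, not a real tokenizer:

```python
# Everything in a call counts against a single context window.
# Token counts use a rough chars/4 heuristic (assumption), not a real tokenizer.

CONTEXT_WINDOW = 200_000  # e.g. a typical Claude default

def est_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token for English prose."""
    return max(1, len(text) // 4)

call_parts = {
    "system_prompt": "You are a support agent for Acme Corp...",
    "conversation_history": "user: ...\nassistant: ...\n" * 50,
    "retrieved_context": "relevant doc chunk " * 2000,
    "tool_results": '{"status": "ok", "rows": 12}',
    "user_prompt": "Summarise the open ticket.",
}

total = sum(est_tokens(t) for t in call_parts.values())
print(f"{total} of {CONTEXT_WINDOW} tokens used")
assert total <= CONTEXT_WINDOW, "call would exceed the context window"
```

In production the estimate would come from the model's own tokenizer, but the accounting is the same: the sum of all parts must fit the window.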
Context window is the most over-specified criterion in 2026 enterprise agent procurement. Most production agents use less than 50K tokens of context per call; the marginal value of 1M+ context drops sharply once retrieval and compression are properly engineered. The exception is long-document workflows (legal, scientific, audit) where the full document must be in the model's view at once. Enterprises that buy on context-window headline numbers without modelling actual workload tokens routinely overspend.