The context window is the maximum number of tokens a large language model can process in a single request, counting both the prompt and the generated output. It is determined by the model's architecture and training. Everything the model can "see" for a query (instructions, examples, context, conversation history, reference documents) must fit within this token budget, which makes the limit one of the most consequential constraints in designing LLM applications. In effect, it is the size of the model's working memory for any given request.
The token math:
1 token ≈ 0.75 English words (rough approximation). 1 token ≈ 4 characters of English text.
So a 100,000-token context window holds roughly 75,000 words, or about 300 pages of a typical book at around 250 words per page.
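The arithmetic above can be sketched in a few lines of Python. This is only a rough estimator built on the two rules of thumb, not a real tokenizer: actual counts depend on the model's tokenizer and on the language of the text. The function names are hypothetical, and the 250 words-per-page figure is an assumption inferred from the 75,000 words to 300 pages ratio.

```python
# Rules of thumb from the text; real tokenizers vary by model and language.
WORDS_PER_TOKEN = 0.75
CHARS_PER_TOKEN = 4
WORDS_PER_PAGE = 250  # assumption: implied by 75,000 words ~ 300 pages


def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate tokens from an English word count (1 token ~ 0.75 words)."""
    return round(word_count / WORDS_PER_TOKEN)


def estimate_tokens_from_text(text: str) -> int:
    """Estimate tokens from character length (1 token ~ 4 characters)."""
    return round(len(text) / CHARS_PER_TOKEN)


def window_capacity(context_tokens: int) -> tuple[int, int]:
    """Return (approximate words, approximate pages) a window can hold."""
    words = round(context_tokens * WORDS_PER_TOKEN)
    pages = round(words / WORDS_PER_PAGE)
    return words, pages


def fits_in_window(prompt_tokens: int, max_output_tokens: int,
                   context_tokens: int) -> bool:
    """Prompt and output share one budget, so their sum must fit."""
    return prompt_tokens + max_output_tokens <= context_tokens


if __name__ == "__main__":
    words, pages = window_capacity(100_000)
    print(f"100,000 tokens ~ {words:,} words ~ {pages} pages")
    print(fits_in_window(96_000, 4_000, 100_000))  # exactly at the limit
    print(fits_in_window(98_000, 4_000, 100_000))  # 2,000 tokens over
```

Because the prompt and the output draw on the same budget, the check reserves room for the response up front instead of measuring the prompt alone. For anything that must not overflow, count with the model's actual tokenizer and treat these estimates as a first pass.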
How context windows have grown...