Token

Why language models count and process text in tokens rather than in whole words or pages.

A token is a unit of text a model processes as one step in its internal sequence. Tokens are often smaller than full words. A short common word may be one token, while a long word, number, or code string may become several. This is why model limits and pricing are usually measured in tokens instead of words.
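To make the splitting concrete, here is a toy greedy longest-match tokenizer over a small hypothetical vocabulary. Real tokenizers (such as byte-pair encoding) learn their vocabularies from data, so the vocabulary and the exact splits below are illustrative only, not any model's actual output:

```python
# Toy greedy longest-match tokenizer. The vocabulary is hypothetical;
# real model vocabularies contain tens of thousands of learned entries.
VOCAB = {"token", "ization", "the", "cat", " "}

def tokenize(text: str) -> list[str]:
    """Split text greedily into the longest vocabulary entries,
    falling back to single characters for unknown spans."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible match starting at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character becomes its own token
            i += 1
    return tokens

print(tokenize("the cat"))        # -> ['the', ' ', 'cat']
print(tokenize("tokenization"))   # -> ['token', 'ization']
```

The second call shows the pattern described above: a short common word stays whole, while a longer word is assembled from subword pieces.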

Why Tokens Matter

Every message, instruction, retrieved document, and generated answer consumes tokens. That means tokens directly affect cost, latency, and how much can fit into the context window. A prompt that looks modest to a person may still be token-heavy if it includes dense formatting, tables, or code.
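A rough budget check makes this concrete. A commonly cited rule of thumb for English prose is about four characters per token; the real count depends on the model's tokenizer, and dense formatting, tables, or code usually tokenize less efficiently. The context-window size below is a hypothetical example, not any particular model's limit:

```python
# Back-of-envelope token budgeting. The 4-characters-per-token figure is
# a rough heuristic for English text, not an exact conversion.
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    return round(len(text) / chars_per_token)

prompt = "Summarize the attached report in three bullet points."
context_window = 8192  # hypothetical limit, measured in tokens

used = estimate_tokens(prompt)
print(f"~{used} tokens used, ~{context_window - used} tokens remaining")
```

For anything where the count actually matters, such as billing or fitting a document into the context window, the model's own tokenizer should be used instead of a heuristic.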

Tokens also matter because generation happens token by token. A model does not create an entire paragraph all at once. It repeatedly predicts the next token, which is one reason output can sound fluid while still drifting into mistakes or unexpected continuations.
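The loop described above can be sketched as follows. The "model" here is a hypothetical stand-in that returns a canned continuation; the point is the shape of the loop, in which each predicted token is appended to the context before the next prediction:

```python
# Minimal sketch of autoregressive generation: one model call per output
# token, with each new token fed back in as context for the next call.
CANNED = ["The", " sky", " is", " blue", "."]  # stand-in continuation

def next_token(context: list[str]) -> str:
    """Hypothetical stand-in for a real model's next-token prediction."""
    return CANNED[len(context)] if len(context) < len(CANNED) else "<eos>"

def generate(max_tokens: int = 10) -> str:
    context: list[str] = []
    for _ in range(max_tokens):
        token = next_token(context)  # one prediction step
        if token == "<eos>":         # end-of-sequence: stop generating
            break
        context.append(token)        # token becomes part of the context
    return "".join(context)

print(generate())  # -> "The sky is blue."
```

Because every step conditions only on the tokens produced so far, an early off-course token can steer all the tokens after it, which is one way fluent-sounding output drifts into mistakes.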

Why Readers Should Care

Understanding tokens helps explain why some prompts are expensive, why long documents are hard to manage, and why two phrasings with roughly the same meaning can split into different token counts and so differ in cost and behavior. It is a small concept with large practical consequences.

For anyone using or building language systems, tokens are part of the system's basic operating budget. Once you understand that, many model behaviors become easier to interpret.

Related concepts: Tokenization, Context Window, Large Language Model (LLM), and Prompt.