How LLMs Actually Work
The complete picture — from text to intelligence
The Core Loop: Predict, Sample, Repeat
At its heart, a Large Language Model does one thing: given a sequence of tokens, predict a probability distribution over the next token. Then sample from that distribution. Then append the chosen token to the sequence. Repeat.
When you type "What is the capital of France?" into ChatGPT, the model doesn't "look up" the answer. It processes your tokens through its transformer layers and produces a probability distribution where "Paris" has a high probability as the next token. It generates "Paris", adds it to the sequence, then predicts the next token after "Paris" (perhaps a period), and so on until it generates a stop token.
This simple loop — predict, sample, append, repeat — is called autoregressive generation.
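The loop above can be sketched in a few lines of Python. This is a toy illustration, not a real model: `predict_next_token_probs` is a hypothetical stand-in that hard-codes the distributions a transformer would actually compute, and the vocabulary, token names, and `<eos>` stop token are all invented for the example. Only the surrounding loop structure — predict, sample, append, repeat until a stop token — reflects autoregressive generation as described.

```python
import random

# Toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = ["Paris", "London", ".", "<eos>"]

def predict_next_token_probs(tokens):
    # Stand-in for the model's forward pass. A real LLM maps the whole
    # token sequence through its transformer layers to a distribution
    # over the vocabulary; this stub just keys off the last token.
    if not tokens or tokens[-1] == "?":
        return {"Paris": 0.90, "London": 0.05, ".": 0.04, "<eos>": 0.01}
    if tokens[-1] == "Paris":
        return {".": 0.85, "<eos>": 0.10, "Paris": 0.03, "London": 0.02}
    return {"<eos>": 0.95, ".": 0.03, "Paris": 0.01, "London": 0.01}

def generate(prompt_tokens, max_new_tokens=10, seed=0):
    rng = random.Random(seed)
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = predict_next_token_probs(tokens)        # 1. predict
        choices, weights = zip(*probs.items())
        next_token = rng.choices(choices, weights=weights)[0]  # 2. sample
        if next_token == "<eos>":                       # stop token ends generation
            break
        tokens.append(next_token)                       # 3. append, then repeat
    return tokens

print(generate(["What", "is", "the", "capital", "of", "France", "?"]))
```

With this particular stub and seed, the loop appends "Paris", then ".", then samples the stop token and halts — the same predict, sample, append cycle a real model runs, just with learned probabilities instead of hard-coded ones.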