How LLMs Actually Work
The complete picture — from text to intelligence
The Core Loop: Predict, Sample, Repeat
At its heart, a Large Language Model does one thing: given a sequence of tokens, predict a probability distribution over the next token. Then sample from that distribution. Then append the chosen token to the sequence. Repeat.
When you type "What is the capital of France?" into ChatGPT, the model doesn't "look up" the answer. It processes your tokens through its transformer layers and produces a probability distribution where "Paris" has a high probability as the next token. It generates "Paris", adds it to the sequence, then predicts the next token after "Paris" (perhaps a period), and so on until it generates a stop token.
This simple loop — predict, sample, append, repeat — is called autoregressive generation.
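The loop above can be sketched in a few lines of Python. This is a toy illustration, not a real model: `predict_next_token_probs` is a hypothetical stand-in that hard-codes the distributions a transformer would actually compute, and the vocabulary, token names, and `<eos>` stop token are all invented for the example. Only the surrounding loop structure — predict, sample, append, repeat until a stop token — reflects autoregressive generation as described.

```python
import random

# Toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = ["Paris", "London", ".", "<eos>"]

def predict_next_token_probs(tokens):
    # Stand-in for the model's forward pass. A real LLM maps the whole
    # token sequence through its transformer layers to a distribution
    # over the vocabulary; this stub just keys off the last token.
    if not tokens or tokens[-1] == "?":
        return {"Paris": 0.90, "London": 0.05, ".": 0.04, "<eos>": 0.01}
    if tokens[-1] == "Paris":
        return {".": 0.85, "<eos>": 0.10, "Paris": 0.03, "London": 0.02}
    return {"<eos>": 0.95, ".": 0.03, "Paris": 0.01, "London": 0.01}

def generate(prompt_tokens, max_new_tokens=10, seed=0):
    rng = random.Random(seed)
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = predict_next_token_probs(tokens)        # 1. predict
        choices, weights = zip(*probs.items())
        next_token = rng.choices(choices, weights=weights)[0]  # 2. sample
        if next_token == "<eos>":                       # stop token ends generation
            break
        tokens.append(next_token)                       # 3. append, then repeat
    return tokens

print(generate(["What", "is", "the", "capital", "of", "France", "?"]))
```

With this particular stub and seed, the loop appends "Paris", then ".", then samples the stop token and halts — the same predict, sample, append cycle a real model runs, just with learned probabilities instead of hard-coded ones.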