Follow a single prompt as it travels from your message all the way to a generated answer — every stage of comprehension and generation in between. The journey splits into two connected engines: Comprehend (turning your words into meaning and grounded context) and Generate (turning that meaning into a reasoned, written answer). Same shape as search — retrieve, then produce — with one genuinely new step.
Your message is never seen alone. It is stitched together with the system instructions and the chat history into one context window — the entire, finite input the model gets. Nothing is looked up in a database of stored answers.
The standing rules: who the model is and what it may do.
set by · the appEverything said so far in this conversation, in order.
scope · the sessionThe new prompt, the thing you just typed.
scope · this turnA model never sees letters. The text is split into tokens — subword pieces from a fixed vocabulary. Common words stay whole; rarer ones split into reusable pieces, and each token becomes an integer ID.
Each token ID is looked up and becomes a vector — a list of numbers — so meaning turns into geometry: similar ideas land near each other (king near queen). The very same vector space that powers semantic search.
The prompt's vector is matched against a vector database to pull the most relevant documents — your docs, your code, your knowledge base — and inject them into the context. This is search's retrieval step, reborn inside the model's input.
The nearest passages are injected into the context window, so the answer is anchored to real sources, not just memory.
With no knowledge base attached, the model answers from its trained parameters alone — fast, but ungrounded.
A word means nothing alone. The model weighs every token against every other, all at once, to work out what refers to what and what matters. This is the 2017 transformer doing its work — relevance scoring, the search engineer's craft, turned inward on the sentence.
Now the model writes. It produces a probability over every token in its vocabulary, picks one, appends it, and runs the whole thing again — building the answer one piece at a time. A very large, very capable autocomplete. This is the stage search never had: a search engine ranks pages that already exist; this loop generates text that did not.
When the task needs more than text, the model calls tools — query a database, hit an API, edit a file. This is the line between a chatbot and an agent.
The model emits a structured tool call, runs it, and feeds the result back into the context — then keeps generating.
If the answer is just language, no tool is called — generation continues straight to the reply.
For multi-step work, a planner → worker → reviewer loop iterates, and a human approves anything consequential before it ships. Then the finished answer streams to your screen, token by token.
A model is not one-and-done either. Within a task it can loop until the work is right; across millions of tasks, your reactions quietly shape the next version. This is the engine behind plan → act → review and the preference → training ladder.
Agents break the task down, do a step, check the result, and loop until it holds.
A person approves anything consequential before it ships — the loop's safety valve.
The answer arrives live, token by token, instead of all at once.
Your thumbs-up / edit / retry feed the training data, so the next model gets better.