From Prompt Engineering to Loop Engineering

The way people work with AI agents has changed fast, but the change is not random. It follows a simple pattern: each new generation moves one layer deeper into the system.

First we optimized the prompt. Then we optimized the context. Then we optimized the harness around the model. Now the real frontier is the loop itself.

That shift matters because it changes what counts as skill. The best operator is no longer just the person who writes clever instructions. It is the person who can design a reliable run loop, keep it on task, and decide when to stop it.

Stage 1: Prompt engineering

Prompt engineering was the first wave because it was the only control point available.

When people first started using large language models seriously, the core question was: how do I phrase this so the model gives me what I want?

That made sense. The model was the whole product. There was no tool use, no long-running state, and usually no external memory. So the human had to do all the steering in the prompt itself.

The emphasis was on:

wording
examples
role setting
output format
constraint phrasing

These still matter, but they are no longer the main bottleneck. Good prompts improve behavior, but they do not solve weak context selection, bad tool design, or poor iteration discipline.

Prompt engineering is the entry level of agent work. It teaches you to be precise. It does not yet teach you to build a system.

Stage 2: Context engineering

Once models got better at following instructions, people discovered that the bigger failure was often not the wording of the prompt. It was the quality of the context around it.

This is where context engineering became the more useful lens.

Anthropic has described context engineering as the discipline of curating what the model sees so it can stay grounded across a task. That framing matches the reality of modern agent work: the model is only as good as the information, constraints, and history you feed into the loop. (Source: Anthropic's context engineering essay.)

Context engineering changed the operator’s job in three ways:

You stopped asking, “What should I say?” and started asking, “What should the model know?”
You stopped stuffing everything into one prompt and started selecting only the relevant evidence.
You treated context as a scarce budget, not a dumping ground.

That is a big conceptual upgrade. It pushes the human from copywriter to editor.

The practical skills now look like:

choosing the right source files, docs, and examples
trimming noise before it reaches the model
preserving decision history
sequencing context across turns
keeping a running task state without overload

This is also why long-context models matter. A larger window is not just a benchmark feature. It lowers the cost of including more material, which changes the economics of context design. But even with a huge context window, the main job is still selection, not accumulation.
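As a rough illustration of treating context as a budget, here is a minimal sketch. It assumes each candidate snippet already carries a relevance score from some upstream step, and it uses a word count as a crude stand-in for a real tokenizer. The names `Snippet`, `estimate_tokens`, and `select_context` are hypothetical, not part of any agent product.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    """A hypothetical unit of candidate context (file excerpt, doc, example)."""
    name: str
    text: str
    relevance: float  # assumed to come from an upstream scoring step


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def select_context(snippets: list[Snippet], budget: int) -> list[Snippet]:
    """Greedily keep the most relevant snippets that still fit in the budget."""
    chosen: list[Snippet] = []
    used = 0
    for snippet in sorted(snippets, key=lambda s: s.relevance, reverse=True):
        cost = estimate_tokens(snippet.text)
        if used + cost <= budget:
            chosen.append(snippet)
            used += cost
    return chosen


candidates = [
    Snippet("failing_test.log", "assert total == 3 failed in test_cart", 0.9),
    Snippet("cart.py", "def total(items): return sum(i.price for i in items)", 0.8),
    Snippet("changelog.md", "many unrelated release notes " * 50, 0.2),
]
picked = select_context(candidates, budget=40)
print([s.name for s in picked])
```

The low-relevance changelog is dropped because it does not fit the budget, even though a larger window could technically hold it. Greedy selection is only one possible policy; the point is that something has to choose.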

Stage 3: Harness engineering

As soon as models started using tools, the next bottleneck became the wrapper around the model.

A model does not magically become an agent when you add tool calls. It becomes an agent when you build a harness that can run the model repeatedly, inspect outputs, feed back results, and keep the task moving.

That harness typically handles things like:

tool invocation
message ordering
retries and fallbacks
state persistence
file and repository access
test execution
safety checks
logging and traces

This stage matters because a lot of agent performance is actually system performance.

A weak harness can make a strong model look bad. A good harness can make an average model look much better than expected.

This is the point where the operator starts behaving more like a systems engineer. You are no longer only writing prompts or choosing context. You are deciding how the model is allowed to act, what tools it can touch, and what evidence it must see before it continues.
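To make a few of those harness responsibilities concrete, here is a minimal sketch covering tool allowlisting, retries, and tracing. Everything in it is an illustrative assumption: the `TOOLS` registry, the `call_tool` helper, and the `flaky` tool are hypothetical and do not reflect how Claude Code, Codex, or any other product is implemented.

```python
import time
from typing import Callable, Optional

# Hypothetical tool registry: the harness, not the model, decides what is callable.
TOOLS: dict[str, Callable[[str], str]] = {
    "echo": lambda arg: arg,
    "upper": lambda arg: arg.upper(),
}

trace: list[dict] = []  # simple in-memory log of every attempt


def call_tool(name: str, arg: str, retries: int = 2, delay: float = 0.0) -> str:
    """Invoke an allowed tool, retrying on failure and recording a trace."""
    if name not in TOOLS:
        trace.append({"tool": name, "status": "rejected"})
        raise PermissionError(f"tool not allowed: {name}")
    last_error: Optional[Exception] = None
    for attempt in range(1, retries + 2):
        try:
            result = TOOLS[name](arg)
            trace.append({"tool": name, "attempt": attempt, "status": "ok"})
            return result
        except Exception as exc:
            last_error = exc
            trace.append({"tool": name, "attempt": attempt, "status": "error"})
            if attempt <= retries:
                time.sleep(delay)  # back off before the next attempt
    raise RuntimeError(f"{name} failed after {retries + 1} attempts") from last_error


# A tool that fails once, then succeeds, to exercise the retry path.
attempts = {"n": 0}


def flaky(arg: str) -> str:
    attempts["n"] += 1
    if attempts["n"] < 2:
        raise TimeoutError("transient failure")
    return f"done: {arg}"


TOOLS["flaky"] = flaky
result = call_tool("flaky", "run tests")
print(result)
print(trace)
```

The model never sees the first failure unless the harness chooses to show it. That is the sense in which a lot of agent performance is really system performance.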

Anthropic’s Claude Code best practices and OpenAI’s Codex workflows both lean heavily in this direction: the value is not just in one model response, but in the repeatable loop of inspect, act, verify, and continue. (Sources: Claude Code best practices; OpenAI Codex long-running tasks.)

Stage 4: Loop engineering

Loop engineering is the natural next step because most real agent work is iterative.

The model does not solve the task once. It walks the task.

That means the important skill is not only how to start the run, but how to keep the run healthy over multiple turns.

A good loop usually has four parts:

Observe: gather the current state, errors, and outputs.
Decide: choose the next action with the smallest useful step.
Act: run the tool, edit the file, or make the change.
Verify: check whether the result actually moved the task forward.

That sounds obvious, but it is where most agent use breaks down. People either let the model drift, or they interrupt it too early, or they keep it running without a stopping rule.

Loop engineering is about controlling that cycle.
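The observe, decide, act, verify cycle can be sketched in a few lines. This is a toy under stated assumptions: the "task" is just moving a counter to a target, and `State`, `run_loop`, and the four step functions are hypothetical names chosen for illustration. In a real agent, the act step would be a tool call or file edit and the verify step would be a test run.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """Toy task state: a counter that must reach a target value."""
    value: int
    target: int
    history: list[str] = field(default_factory=list)


def observe(state: State) -> int:
    # Observe: measure how far the current state is from the goal.
    return state.target - state.value


def decide(gap: int) -> int:
    # Decide: pick the smallest useful step toward the goal.
    return 1 if gap > 0 else -1


def act(state: State, step: int) -> None:
    # Act: apply the change (stand-in for a tool call or file edit).
    state.value += step
    state.history.append(f"step {step:+d} -> {state.value}")


def verify(state: State, previous_gap: int) -> bool:
    # Verify: confirm the action actually reduced the distance to the goal.
    return abs(observe(state)) < abs(previous_gap)


def run_loop(state: State, max_iterations: int = 10) -> str:
    """Run the cycle until the goal is met, progress stalls, or the budget ends."""
    for _ in range(max_iterations):
        gap = observe(state)
        if gap == 0:
            return "done"
        act(state, decide(gap))
        if not verify(state, gap):
            return "stalled"
    return "done" if observe(state) == 0 else "budget_exhausted"


state = State(value=0, target=3)
outcome = run_loop(state)
print(outcome, state.history)
```

Note that the loop has three distinct exits. Only one of them means the task succeeded.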

The real questions become:

What evidence should the model inspect before acting again?
When should it re-read the task instead of assuming it remembers?
What is the smallest safe next action?
What test or check proves the last step worked?
When is the loop done?

That final question is underrated. Many agent failures are not hallucination failures. They are termination failures.
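One way to make the stopping decision explicit is a single function that returns a reason to stop, or nothing. The sketch below is an assumption-laden illustration: `stop_reason`, the `patience` parameter, and the idea of a numeric progress score per iteration are all hypothetical choices, not a standard API.

```python
from typing import Optional


def stop_reason(
    checks_passed: bool,
    iteration: int,
    max_iterations: int,
    scores: list[float],
    patience: int = 3,
) -> Optional[str]:
    """Return why the loop should stop, or None to keep going."""
    if checks_passed:
        return "success"
    if iteration >= max_iterations:
        return "iteration_budget_exhausted"
    # No-progress rule: the last `patience` scores never beat the earlier best.
    if len(scores) > patience:
        best_before = max(scores[:-patience])
        if max(scores[-patience:]) <= best_before:
            return "no_progress"
    return None


print(stop_reason(False, 5, 10, [0.5, 0.5, 0.4, 0.5]))
print(stop_reason(False, 2, 10, [0.2, 0.4]))
```

Naming the reason matters as much as stopping. A run that ends with `no_progress` calls for a different human response than one that ends with `success`.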

Why loop engineering makes sense

Loop engineering is a sensible framing because software work is already iterative.

Even in a normal engineering workflow, you do not write the final answer in one shot. You draft, inspect, correct, test, and repeat. Agents just make that cycle explicit and machine-assisted.

Once that becomes obvious, the skill frontier shifts again:

from prompt text to task design
from task design to context curation
from context curation to harness reliability
from harness reliability to loop control

That progression is not a fad. It is a sign that the center of gravity is moving from language to operations.

My own view is that “loop engineering” is the first term that really describes what strong agent users do in practice. They do not just ask better questions. They build better operating conditions.

What good loop engineering looks like

If you want to use Claude Code, Codex, or any similar agent product well, think in terms of constraints and checkpoints, not vibes.

A strong loop usually has these properties:

a clear goal with a narrow scope
a compact but relevant context set
explicit stopping criteria
a cheap verification step
a bias toward small actions over big guesses
a habit of re-reading the task after meaningful changes
logs or traces you can inspect when something goes wrong

This is also why the best agents feel boring when they work well. They are not improvising constantly. They are executing a controlled cycle with enough autonomy to be useful and enough friction to stay honest.

How to use Claude Code and Codex correctly

The most common mistake is treating agent tools like faster chatbots.

That is the wrong mental model.

A better model is: you are hiring a junior operator and giving it a limited work envelope.

That means you should:

define the objective in one sentence
provide only the files, docs, or examples that matter
ask for one concrete outcome at a time
require validation before claiming success
let the agent iterate inside the loop instead of manually micromanaging every step

For Claude Code-style workflows, this usually means leaning into repo-local context, edits, and verification, instead of pasting huge prompts.

For Codex-style workflows, the same principle applies even more strongly: use the agent to inspect the repository, propose a plan, make the change, and verify it. The value is in the closed loop, not in a single polished response.

If you want the short version, use this rule:

Give the model enough context to act, enough tools to verify, and enough boundaries to stay honest.
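The "limited work envelope" idea can be written down as a small data structure. This is only a sketch under assumptions: `TaskEnvelope` and its fields are hypothetical names, and the `validate` callable stands in for whatever real check (a test suite, a linter, a build) the task requires.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TaskEnvelope:
    """A hypothetical description of what one agent run is allowed to do."""
    objective: str                  # one-sentence goal
    allowed_files: tuple[str, ...]  # only the files that matter
    validate: Callable[[], bool]    # check that must pass before claiming success

    def may_edit(self, path: str) -> bool:
        # Boundaries: edits outside the envelope are refused.
        return path in self.allowed_files

    def report(self) -> str:
        # Success is only claimed if validation actually passes.
        return "success" if self.validate() else "not verified"


envelope = TaskEnvelope(
    objective="Make the cart total test pass.",
    allowed_files=("cart.py", "test_cart.py"),
    validate=lambda: sum([1, 2]) == 3,  # stand-in for running the real test suite
)
print(envelope.may_edit("deploy.sh"))
print(envelope.report())
```

The three fields map onto the rule above: the objective and files are the context, the validation step is the tool for verifying, and the allowlist is the boundary.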

My speculation: the next stage is loop design as a product skill

Here is the part I think will matter next.

In the near future, the best AI users will not be the people with the fanciest prompts. They will be the people who can design loops that are cheap, observable, and hard to derail.

That will include:

deciding which steps should be automated
deciding which steps need human approval
defining retry behavior
deciding when to branch versus continue
deciding how much state to carry forward
deciding what evidence is sufficient for a stop

In other words, the skill will look less like writing and more like building a control system.
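A small piece of that control system, the split between automated steps and steps that need human approval, might look like the following. The action names, the `POLICY` table, and the `execute` helper are hypothetical and chosen only for illustration. The one deliberate design choice is that unknown actions default to the stricter gate.

```python
from enum import Enum
from typing import Callable


class Gate(Enum):
    AUTO = "auto"
    HUMAN_APPROVAL = "human_approval"


# Hypothetical policy table: which actions may run without a human in the loop.
POLICY: dict[str, Gate] = {
    "read_file": Gate.AUTO,
    "run_tests": Gate.AUTO,
    "edit_file": Gate.AUTO,
    "git_push": Gate.HUMAN_APPROVAL,
    "delete_branch": Gate.HUMAN_APPROVAL,
}


def gate_for(action: str) -> Gate:
    # Unknown actions fall back to the stricter gate.
    return POLICY.get(action, Gate.HUMAN_APPROVAL)


def execute(action: str, approve: Callable[[str], bool]) -> str:
    """Run the action only if policy allows it or a human approves it."""
    if gate_for(action) is Gate.HUMAN_APPROVAL and not approve(action):
        return "blocked"
    return "executed"


def deny_all(action: str) -> bool:
    # Stand-in for a real approval prompt; here every request is refused.
    return False


print(execute("run_tests", deny_all))
print(execute("git_push", deny_all))
```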

That does not make prompts irrelevant. It just puts them back in their proper place.

Prompts are the starting interface. Context is the working set. The harness is the machine around the model. The loop is the real product.

And once you see it that way, the evolution from prompt engineering to loop engineering looks less like a buzzword ladder and more like a genuine maturation of the field.

A short reading list

If you want the source material behind this essay, start here: