How to build an AI coding workflow that scales code review
AI made writing code faster. It didn't make code review faster. Here's how to build an AI coding workflow that scales both, from planning to review.
We host live workshops where we help engineers level up their AI coding workflows and show up stronger in interviews. You can find an upcoming workshop here.
AI has made writing code dramatically faster. What it hasn't done is make code review faster, and for most engineers, that gap is where the productivity gains are quietly disappearing.
The bottleneck in an AI-assisted engineering workflow isn't implementation. It's everything that comes after it. The engineers getting the most out of AI have figured this out and restructured accordingly.
Step 1: Spend real time on the plan before any code is written
When AI can generate an implementation in minutes, the planning phase can feel like overhead. It isn't. The quality of the plan determines almost everything about how the implementation goes, including how much cleanup it requires, how the code review flows, and whether you end up in the kind of back-and-forth debugging that erases whatever time you saved.
Engineers running high-output AI workflows treat complex tasks very differently from simple ones. Simple tasks with a clear input and output get one-shot prompts. Complex tasks (anything that touches multiple parts of a codebase, carries risk if something goes wrong, or requires coordinating across services) get a deliberate planning phase. In practice, that means 15 to 20 minutes of focused investment before implementation begins.
That time is spent:
- Reading the plan the AI generates and thinking it through carefully
- Steering the plan if an assumption looks wrong or a gap is obvious
- Deciding when the plan is actually ready
The goal is to reach a plan that is specific and detailed enough that the implementation can run with minimal intervention. Anything that isn't resolved in the plan will resurface later, at higher cost.
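To make that concrete, here is a minimal sketch of what a structured planning prompt might look like. The template fields and the `call_model` parameter are illustrative assumptions, not a prescribed format or any provider's real API:

```python
# Sketch of a structured planning prompt. `call_model` is a placeholder for
# whatever function sends a prompt to your model or coding agent.

PLAN_TEMPLATE = """You are planning a code change. Do not write code yet.

Task: {task}
Relevant files/services: {context}

Produce a plan with:
1. The exact files and functions you expect to touch.
2. The order of changes, and why that order.
3. Edge cases and failure modes, and how each is handled.
4. What you will NOT change, to bound the blast radius.
5. Open questions that must be answered before implementation.
"""

def build_planning_prompt(task: str, context: str) -> str:
    """Fill in the planning template for a specific task."""
    return PLAN_TEMPLATE.format(task=task, context=context)

def request_plan(task: str, context: str, call_model) -> str:
    """Ask the model for a plan only; implementation comes later."""
    return call_model(build_planning_prompt(task, context))

# Usage (hypothetical):
# plan = request_plan("Add retry logic to the payment webhook handler",
#                     "services/payments, shared HTTP client", call_model)
```

The template itself matters less than the habit: forcing the plan into explicit sections makes it much easier to spot the assumption that's wrong or the edge case that's missing.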
Step 2: Use adversarial AI to pressure-test the plan
Planning in isolation (one model generating a plan that you then approve) is less effective than it could be. Different models have different blind spots. A plan that looks solid to one model will sometimes have obvious weaknesses when evaluated by another.
Adversarial review is one of the most effective techniques available for catching these gaps before implementation begins. The setup is straightforward:
- Model A generates the plan
- Model B critiques it, looking for gaps, edge cases, and failure modes
- The engineer evaluates the exchange and steers from there
This can also run as a back-and-forth loop, where each model responds to the other's critiques. The engineer's role shifts from doing the first-pass evaluation manually to judging the output of that exchange.
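As an illustration, the loop can be as simple as the sketch below. The `planner` and `critic` callables are stand-ins for whatever client code talks to your two models, and the round count is an arbitrary choice rather than a recommendation:

```python
from typing import Callable

def adversarial_review(
    task: str,
    planner: Callable[[str], str],  # e.g. a thin wrapper around Model A
    critic: Callable[[str], str],   # e.g. a thin wrapper around Model B
    rounds: int = 2,
) -> tuple[str, list[str]]:
    """Run a plan/critique loop and return the final plan plus all critiques.

    `planner` and `critic` are placeholders for your own model-calling code;
    nothing here is a real provider API.
    """
    plan = planner(f"Write an implementation plan for:\n{task}")
    critiques: list[str] = []

    for _ in range(rounds):
        critique = critic(
            "Critique this plan. List gaps, unhandled edge cases, "
            f"and failure modes:\n{plan}"
        )
        critiques.append(critique)
        plan = planner(
            "Revise the plan to address these critiques. Keep what already "
            f"works.\n\nPlan:\n{plan}\n\nCritiques:\n{critique}"
        )

    # The engineer reviews `plan` and `critiques` rather than doing the
    # first-pass evaluation manually.
    return plan, critiques
```

The specific code isn't the point. The point is that the critique happens before any implementation exists, while changing the plan is still free.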
Running this process at the planning stage yields the highest leverage. Edge cases caught here cost nothing to address. Edge cases caught in code review cost significantly more.
Step 3: Let the implementation run
Once the plan is solid, get out of the way.
This is harder than it sounds. AI coding agents with a planning mode built in are designed to self-reflect and iterate through a full implementation. Interrupting that process at each step by checking in, adjusting output, and redirecting undermines the behavior the tool is optimized for. It also consumes the most limited resource in an AI workflow: human attention.
Engineers who frequently step in to correct mid-implementation issues are often compensating for a plan that wasn't specific enough. The intervention feels productive in the moment. What it usually signals is that more time should have been spent on step one.
The goal is to reach a place where you trust the system to run the full plan, then review the output when it's done. That's not the same as not reviewing. It's reviewing at the right moment, after the AI has done what it's built to do.
Step 4: Build the infrastructure that makes trust possible
The reason most engineers feel they have to watch every step is that they don't trust the output enough to do otherwise. That's usually a reasonable instinct, but the fix isn't more supervision. It's improving the conditions in which the AI is working.
AI agents produce more consistent output when:
- The codebase architecture is clean and navigable
- Documentation is current and accurate
- Patterns are explicit rather than implied
When agents have to infer context from ambiguous code or outdated docs, the output reflects that ambiguity. The review burden that follows is a downstream symptom of an upstream problem.
One approach worth investing in: living documentation maintained by the agents themselves. When an agent starts a task, it reads the relevant documentation. When it finishes, it updates the documentation to reflect the changes. Context stays current without manual upkeep, and every future implementation benefits from it.
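As a rough sketch of the idea (with a made-up documentation path and a placeholder agent interface, not any particular tool's API), the pattern is a thin wrapper that feeds the relevant doc to the agent before the task and asks for an updated version afterward:

```python
from pathlib import Path

def run_with_living_docs(task: str, doc_path: str, agent) -> str:
    """Wrap an agent run so documentation is read before and updated after.

    `agent` is a placeholder with a hypothetical `run(prompt) -> str` method,
    and "docs/architecture.md" below is an illustrative path, not a convention.
    Assumes the agent keeps the session context of the task it just completed.
    """
    doc_file = Path(doc_path)
    docs = doc_file.read_text() if doc_file.exists() else ""

    # 1. Give the agent current context before it starts.
    result = agent.run(
        f"Context from project documentation:\n{docs}\n\nTask:\n{task}"
    )

    # 2. Ask the agent to bring the documentation up to date with its changes.
    updated_docs = agent.run(
        "Update this documentation so it accurately reflects the changes you "
        f"just made. Return the full revised document.\n\n{docs}"
    )
    doc_file.write_text(updated_docs)

    return result

# Usage (hypothetical agent object):
# summary = run_with_living_docs("Add rate limiting to the API gateway",
#                                "docs/architecture.md", my_agent)
```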
This applies to legacy codebases, too, though the investment is harder there. Engineers working in complex, undocumented repos often find they're compensating with manual oversight for problems that better documentation and architecture would partially solve.
Step 5: Know when to rebuild instead of patch
The most important signal in an AI-assisted workflow is one most engineers have experienced but don't always act on correctly.
It looks like this: a code review starts, a bug surfaces, it gets fixed, another appears, and the cycle continues. The instinct is to push through. The better move is usually to stop.
When code review drags and issues keep surfacing, it almost always means the plan had gaps, not that the implementation tool failed. The AI executed what it was given. The spec wasn't complete enough. Those gaps are now being discovered one at a time, in the most expensive way possible.
The right response is to stop patching and rebuild the plan. Continuing to fix individual issues in a fundamentally underspecified implementation produces more problems, not fewer. Engineers who've learned to treat this cycle as a signal and cut losses early consistently come out ahead.
There's also a compounding benefit to rebuilding. Improving the spec after a bad review prevents the same category of issue from reappearing in future work. The investment isn't wasted; it feeds forward.
What this looks like end to end
Putting it together, the workflow looks like this (a rough code sketch of the full loop follows the list):
- Before implementation: Spend real time on the plan. Use adversarial AI to pressure-test it. Approve it when it's genuinely ready, not when it merely looks good enough.
- During implementation: Let the AI run. Stay available to course-correct if something goes clearly off track, but default to trusting the plan you built.
- After implementation: Review the output. If the plan was solid, this moves quickly. If it's dragging, that's the signal to go back to step one.
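Putting those pieces into one loop, a rough sketch might look like this. Every callable is a placeholder (for example, `plan_fn` could be the adversarial review sketch from Step 2), and the retry limit is arbitrary; the structure is the point, not the code:

```python
def run_workflow(task: str, plan_fn, implement, review, max_rounds: int = 3):
    """End-to-end sketch of the workflow described above.

    All arguments are placeholder callables: `plan_fn` produces a
    pressure-tested plan, `implement` runs the coding agent against a plan,
    and `review` returns a list of issues found in the output. None of these
    are real APIs, and `max_rounds` is an arbitrary choice.
    """
    for _ in range(max_rounds):
        # Before implementation: invest in the plan and approve it deliberately.
        plan = plan_fn(task)

        # During implementation: let the agent run the full plan.
        output = implement(plan)

        # After implementation: review the output.
        issues = review(output)
        if not issues:
            return output

        # A dragging review signals gaps in the plan. Fold the discovered
        # gaps back into the task and rebuild the plan instead of patching.
        task = task + "\n\nGaps found in the last attempt:\n" + "\n".join(issues)

    raise RuntimeError("Still finding gaps after several rebuilds; rethink the task.")
```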
The throughline is where human attention goes. Concentrated at the planning stage, it prevents problems. Distributed across every implementation step, it slows everything down without proportionate benefit.
What's coming
Code review is where the most active AI development is happening right now. Automated agents that run reviews in parallel with implementation are already emerging. The tools for adversarial model review are improving. Infrastructure for AI-maintained documentation is becoming more practical to implement at scale.
The engineers who benefit most from these developments won't be the ones who adopt the newest tools fastest. They'll be the ones who have already developed the underlying instincts: when a plan is ready, when the output can be trusted, when to rebuild rather than patch. Those judgments don't come from the tooling. They come from building and running the workflow enough times to know what good looks like.
Engineers who develop those instincts now will be better positioned to use these tools well as they mature.
Dig deeper with Formation
If you're exploring how to use AI more effectively in your work, you're not alone. Join us for a live workshop where we dig into how to make the most out of your workflow.