The Paradox of the Fast Engineer

Mon, 18 May 2026 00:00:00 +0000

TL;DR

The judgment that lets an engineer override a model is built in the slow work the model now offers to do for them. Accept enough of that help on the work that would have built the judgment, and the agent’s speed arrives without the quality, security, scalability, maintainability, or operational sense that the slow work used to deposit alongside the code.

Three months after shipping, customers start complaining that menu items are missing from the navigation. The query that builds the menu does three LEFT JOINs against a self-referencing categories table. The agent produced that shape when the engineer described the requirement; review passed because the test fixtures were three levels deep. Production grew to seven. The depth ceiling was baked into the query the day it shipped, and the query started silently dropping subcategories the moment the tree outgrew it. The engineer who accepted the output had never reached for a recursive CTE, because nobody on the team had ever shown them one.
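Here is a minimal sketch of that failure, using Python’s built-in sqlite3 module. The table and column names (categories, id, parent_id, name) are hypothetical. The tree is simplified to a single chain so the depth is easy to read off. The query has the fixed-depth shape: a root row plus three LEFT JOINs, so it can never see past the fourth level.

```python
import sqlite3

# Hypothetical schema: a self-referencing categories table.
def build_chain(conn: sqlite3.Connection, depth: int) -> None:
    """Fill the table with one chain of categories, `depth` levels deep."""
    conn.execute(
        "CREATE TABLE categories (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO categories VALUES (?, ?, ?)",
        [(n, None if n == 1 else n - 1, f"level {n}") for n in range(1, depth + 1)],
    )

# The fixed-depth shape: the root plus three LEFT JOINs reaches four levels at most.
FIXED_DEPTH_QUERY = """
SELECT c0.id, c1.id, c2.id, c3.id
FROM categories AS c0
LEFT JOIN categories AS c1 ON c1.parent_id = c0.id
LEFT JOIN categories AS c2 ON c2.parent_id = c1.id
LEFT JOIN categories AS c3 ON c3.parent_id = c2.id
WHERE c0.parent_id IS NULL
"""

def ids_from_fixed_depth(conn: sqlite3.Connection) -> set[int]:
    """Collect every category id the fixed-depth query returns (NULLs dropped)."""
    return {cid for row in conn.execute(FIXED_DEPTH_QUERY) for cid in row if cid is not None}

for depth in (3, 7):  # test-fixture depth, then production depth
    conn = sqlite3.connect(":memory:")
    build_chain(conn, depth)
    total = conn.execute("SELECT COUNT(*) FROM categories").fetchone()[0]
    print(f"depth {depth}: {len(ids_from_fixed_depth(conn))} of {total} categories reached")
```

On the three-level fixture this prints `depth 3: 3 of 3 categories reached`, which is why review passed. On the seven-level tree it prints `depth 7: 4 of 7 categories reached`. The query raises no error; it simply stops at level four.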

The fix is a recursive CTE with UNION ALL. It anchors on the root row, then joins the table back against the rows found so far until an iteration returns nothing new. Five lines. Both shapes are valid SQL; the one that holds up at arbitrary depth is the one an engineer reaches for only after they have seen it. Without that prior, the idiom isn’t in their decision space. They can’t ask the agent for it, and they wouldn’t recognize it as the right answer if the agent offered it. No memory of a broken version that lacked it, no internal alarm that “three LEFT JOINs against a tree” is the shape of a future incident.
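Here is the recursive version, sketched against the same hypothetical categories table. The data is again simplified to a seven-level chain, and Python’s sqlite3 again hosts the SQL. The anchor member selects the root. The recursive member joins categories to the rows already in menu, and SQLite stops when an iteration adds no rows. The depth column is an illustrative addition for indenting the output; the fix itself does not require it.

```python
import sqlite3

# Hypothetical schema and data: a seven-level chain of categories.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categories (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO categories VALUES (?, ?, ?)",
    [(n, None if n == 1 else n - 1, f"level {n}") for n in range(1, 8)],
)

MENU_QUERY = """
WITH RECURSIVE menu(id, name, depth) AS (
    SELECT id, name, 1 FROM categories WHERE parent_id IS NULL   -- anchor: the root row
    UNION ALL
    SELECT c.id, c.name, m.depth + 1                              -- recursive member
    FROM categories AS c JOIN menu AS m ON c.parent_id = m.id
)
SELECT id, name, depth FROM menu ORDER BY depth, id
"""

rows = conn.execute(MENU_QUERY).fetchall()
for _id, name, depth in rows:
    print("  " * (depth - 1) + name)  # indent each category by its depth
```

All seven levels come back, and nothing in the query names a maximum depth. On a large table, the recursive member looks up children by parent_id at every step. An index on that column is the usual way to keep those lookups from scanning the whole table, but it is worth confirming in the query plan rather than assuming.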

The obvious fix isn’t the fix

Review the agent’s code before approving it. True, and insufficient. The reviewer who has never written a tree walk over a self-referencing table doesn’t know what they should be looking for. They see SQL that compiles, returns rows on the test data, and matches the shape of the request. The internal alarm that says “this assumes a fixed depth, what happens when the tree is deeper than the joins” doesn’t come from reading SQL. It comes from writing the broken version yourself, watching it fail in production, and tracing the failure back through your own assumption.

Code review without that prior pain is pattern matching against the surface of the query. The bugs that ship through review are the ones where the surface looks right.

The paradox

Here is the paradox. The judgment that lets an engineer override the model is built in the slow work the agent now offers to do for them. The engineer who accepts the output, reviews it briefly, and ships it has gotten the speed. They have not gotten the read on whether the query holds under the production tree shape, the security sense for whether the patch closed the CVE without invalidating something downstream, the scalability instinct for whether the join multiplies under real data, the maintainer’s eye for whether this diff just doubled the toil bill six months from now, or the operational feel for which parts of the system are load-bearing and which are decoration. None of those come bundled with the agent’s output. The five-minute version of the recursive CTE problem passes through them without depositing anything, the way watching someone debone a chicken on YouTube does not teach you when the knife is sharp enough.

The pattern shows up in the public data. In July 2025, METR published a controlled study of sixteen experienced open-source developers working in repositories that averaged more than a million lines of code and a decade of history. The developers self-reported a 20% speedup from AI assistance. Measured against tasks done without AI, they were 19% slower. That is a gap of nearly forty percentage points between what the engineer feels and what the stopwatch records, on a population that does this work for a living.

Google’s 2025 DORA report found 90% of developers using AI and over 80% reporting it made them more productive, while organizational delivery metrics stayed roughly flat for teams without strong measurement practices. The same report measured bugs per developer up 54% and the median time a pull request spends in review up 441%. The verification work the agent created moved to the reviewer. The reviewer is now the bottleneck the agent isn’t helping, and the skill that makes a reviewer fast (the recognition of which agent-generated PR is hiding a fixed-depth assumption, or a missing index, or a quietly invalidated invariant) is the skill the same reviewer is no longer building by writing the slow version themselves.

Cloudflare’s Project Glasswing write-up lands on the same shape from the security side. When they let a security-focused model write its own patches against live infrastructure code, the fixes “fixed the original bug while quietly breaking something else the code depended on.” The thing standing between those patches and production was a senior engineer who could read a regression suite and notice when a patch had quietly invalidated a load-bearing assumption. That recognition was built over years of debugging exactly that class of mistake. The model has no way to produce the recognition, and accepting the patch without it means shipping the regression and learning nothing in the process.

Note

None of this says the agent is useless. Its reliable surface is pattern-matching across volume, the way grep is reliably better than reading the whole file when you already know what string you’re looking for. Surfacing every place a deprecated API is called across a million-line repo. Pulling the regex syntax you’d otherwise have to look up. Flagging the four files in a 200-file diff that touched the auth path. The agent is a faster grep against language, and on that narrow ground it earns its seat. What is being sold and billed for, though, is autonomous production, and that claim does not survive the METR result above. The agent is nowhere near human decision-making. In that result, the cost of treating its output as if it were shows up as the gap between the perceived 20% speedup and the measured 19% slowdown.

The slow-onset failure

The damage falls hardest on engineers who came up after the tools landed. The current cohort of senior engineers built their judgment in a decade when the slow work was the only available path. Every recursive query was a recursive query they had to figure out. Every migration was one they had to plan. Every 2 a.m. incident was one they had to root-cause without a model offering a first-guess hypothesis (Alert Triage Without an Agent goes deeper on that specific muscle). The path that produced today’s seniors ran straight through the slow work the agent now does on demand.

Juniors who let the agent do that work will not arrive at the same place by the same route. Three years of accepting every agent PR, and the engineer who used to be a junior in their codebase is still a junior in their codebase, except now the codebase has grown more complex and the parts they don’t understand have grown faster than the parts they do. The gap doesn’t show on day one, or month six, or even year two. It shows the first time the agent produces output the engineer cannot evaluate: when the question a senior would ask about a migration is one the junior doesn’t know to ask, or when the bug in the agent’s PR is invisible to anyone who hasn’t written the broken version themselves (see also What AI Gets Wrong About Your Database for the database-specific shape of this).

Warning

By the time the gap shows, it has been compounding for years. The engineer is on the wrong side of a hiring market that pays for exactly the recognition they no longer have, and nothing in a quarterly performance review catches the deposit you didn’t make to your own long-term memory.

The calibration

The skill worth building is knowing which work the agent should do, which work you should do by hand, and which work you should accept from the agent and then rewrite anyway to internalize the pattern.

The agent is the right tool for work where the context you’d gain by doing it yourself is marginal. Boilerplate. Syntax you’d otherwise look up. Test scaffolding for code paths you already understand. The migration template you’ve written for the tenth time this year. The fifty-line helper that’s mechanically obvious once you’ve decided what it should do. Let the agent handle these with a brief review and move on.

The agent is the wrong tool for work where the context is the asset. The parts of the system you don’t yet understand. New code paths through a critical module. Database changes whose consequences you’d want to feel in your fingers before approving them in production. The first recursive CTE against a tree-shaped table you’ve never queried before. The first incident in a class of failures you haven’t seen, where the agent’s hypothesis is a hypothesis you should also be forming yourself. Do this work by hand, even when the agent would have produced a working diff faster. The slow version is what builds the alarm that catches the agent’s mistake the next time the same shape of work shows up.

The hard part is the middle: work that’s neither pure boilerplate nor entirely novel. Some of it belongs in a tight loop where you drive and the agent assists on syntax. Some gets reviewed line by line as a learning exercise rather than a compliance step. The rest gets rewritten by hand after the agent produces a working version, just to deposit the pattern in your own muscle memory. The choice turns on whether the work sits in a part of the codebase you need to know deeply or one you can afford to treat as a black box.

When this doesn’t apply

The argument cuts cleanest for engineers building depth in a domain they intend to stay in. A platform engineer who needs to know the database. A security engineer building the recognition Cloudflare’s example demands. A backend engineer whose career bet is on a specific stack. A frontend engineer whose framework just shipped thirteen advisories in a coordinated security release (auth bypass, SSRF, i18n path bypass, an RSC DoS hitting every App Router deployment on 13.x through 16.x) and who needs to read their own dependency graph well enough to know whether they were exposed. For these engineers, the slow work is the investment that pays back over the next decade.

It cuts less cleanly for work that doesn’t depend on depth. The hobbyist exploring a new language. The throwaway script that ships in an afternoon and dies in a week (The 10x Is Real, on Internal Tools You’d Otherwise Never Ship covers that end of the spectrum). The pre-product-market-fit startup whose entire codebase is throwaway in expected value, where vibe-coding the MVP and finding out whether anyone wants the product is the rational trade against the dominant risk that nobody does. The bill in that last case comes due if the product wins, in the form of hiring engineers who can read the agent’s output and untangle the parts that now have to scale. That is a good problem to have.

It also doesn’t apply where the agent’s baseline beats what the company can actually hire at its price point. The frontier model is mediocre in absolute terms (METR again), but it is a consistent floor, and not every company can outhire that floor at the salaries they actually pay. In those shops the cheaper path is to let the model produce and have a senior reviewer (often a contractor) clean up after it. The agent there is competitive at the level the company can afford, which sits below senior judgment but above the median hire the budget will permit.

Senior engineers who already have the context sit outside the trap entirely. The one who has written the recursive CTE a dozen times can accept the agent’s first-draft query and review it competently because the alarm is already wired. The asymmetry is that the trap falls hardest on the engineers least equipped to recognize it.

The bigger picture

The market for engineering judgment is splitting. Work the model can do at the level of a competent mid-level engineer is being commoditized; work that requires the judgment to recognize when the model is wrong is being concentrated. Which side an engineer ends up on is determined less by the tools they use than by which work they choose to do by hand.

The senior’s value is going up because the volume of model output needing adult supervision grew faster than the supply of adults to supervise it. The junior’s floor is the level the model now hits without help. The path from one to the other used to be the slow work, and the path is still the slow work, except the slow work is now optional and most engineers will not opt in.