Isaac Martin

Here is a prediction I hold with more conviction than I'm fully comfortable with: within a decade, a large share of the software running in production will be code that no human has read, written by machines in forms no human would tolerate writing by hand.

I don't mean this as hype. I mean it as a straightforward extrapolation of a few trends that are already underway, plus one shift in how we decide that software is correct. Let me lay it out, including where I think it falls short.

The contract is the interface, not the source

When I review a pull request today, I'm doing two things at once. I'm checking that the code does the right thing, and I'm checking that the code is written in a way I can maintain — that the next person can read it, reason about it, and change it safely.

The second concern only exists because humans maintain code. Strip that assumption away and most of what we argue about in review — naming, structure, idiom, cleverness-versus-clarity — evaporates. What's left is the first concern: does this thing satisfy its contract?

A function's contract is observable from the outside:

Inputs and outputs. Given these arguments, it returns that result.
Properties and invariants. It's deterministic, or it's idempotent, or it never allocates beyond a bound, or it's monotonic in some argument.
Computational telemetry. It runs in this much time, uses this much memory, makes these syscalls, touches these resources.

If I can specify all of that precisely, and verify it thoroughly enough, then the source code inside the box becomes an implementation detail. I don't review the code. I review the evidence that the code honors its contract, and then I compose larger systems out of components I trust at their boundaries.
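To make the three kinds of evidence concrete, here is a minimal sketch in Python of a harness that treats an implementation as an opaque callable. Everything in it is hypothetical and chosen for illustration: the `Contract` class, the `check` function, the budgets, and the `dedupe_sorted` example are my assumptions, not an established tool. Note also that `tracemalloc` only sees allocations made through Python's allocator, so the memory figure is a rough proxy.

```python
import time
import tracemalloc
from dataclasses import dataclass
from typing import Any, Callable, Sequence


def deterministic(fn: Callable, x: Any) -> bool:
    """Property: the same input yields the same output."""
    return fn(x) == fn(x)


def idempotent(fn: Callable, x: Any) -> bool:
    """Property: applying the function twice equals applying it once."""
    once = fn(x)
    return fn(once) == once


@dataclass
class Contract:
    """Hypothetical container for the evidence a reviewer would ask for."""
    examples: Sequence[tuple]                  # (input, expected output) pairs
    properties: Sequence[Callable] = (deterministic, idempotent)
    max_seconds: float = 0.5                   # assumed time budget per call
    max_peak_bytes: int = 1_000_000            # assumed Python-level memory budget


def check(fn: Callable, contract: Contract) -> list:
    """Return a list of violations; an empty list means the evidence holds."""
    violations = []
    for arg, expected in contract.examples:
        tracemalloc.start()
        start = time.perf_counter()
        try:
            result = fn(arg)
        except Exception as exc:               # a crash is a contract violation too
            result = exc
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()

        if isinstance(result, Exception) or result != expected:
            violations.append(f"output: {arg!r} -> {result!r}, expected {expected!r}")
            continue
        if elapsed > contract.max_seconds:
            violations.append(f"time: {elapsed:.4f}s on {arg!r}")
        if peak > contract.max_peak_bytes:
            violations.append(f"memory: {peak} bytes on {arg!r}")
        for prop in contract.properties:
            if not prop(fn, arg):
                violations.append(f"property {prop.__name__} fails on {arg!r}")
    return violations


# Two opaque candidates for "return the distinct values in ascending order".
def dedupe_sorted(xs: tuple) -> tuple:
    return tuple(sorted(set(xs)))


def sorted_only(xs: tuple) -> tuple:
    return tuple(sorted(xs))                   # forgets to remove duplicates


contract = Contract(examples=[((3, 1, 3, 2), (1, 2, 3)), ((), ()), ((5, 5, 5), (5,))])
```

Calling `check(dedupe_sorted, contract)` returns an empty list, while `check(sorted_only, contract)` reports output violations. The reviewer reads that report, not the function body.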

This is not a new idea. It's just black-box testing taken to its logical conclusion. You stop inspecting the internals and start specifying the behavior you actually care about.

Why the machine writes code you'd never write

Once a human stops being the maintainer, the cost function for how code is written changes completely.

We avoid hand-written assembly, aggressive bit-packing, and exotic cache-aware layouts for good reasons: they're slow to write, brutal to debug, and easy to get subtly wrong. The risk and the labor rarely justify the speedup. So we reach for a higher-level language, accept the overhead, and move on. That trade-off is correct — for humans.

A machine that can generate a candidate implementation, test it against a comprehensive contract, measure its telemetry, and throw it away if it fails faces a different trade-off entirely. It doesn't get tired. It doesn't fear the bug, because the verification harness catches any bug the contract can express. It can explore implementations that are too tedious or too treacherous for any sane engineer to attempt by hand:

Specialize a function to the exact distribution of inputs it sees in production.
Hand-roll SIMD or assembly tuned to one target architecture.
Collapse layers of abstraction that exist only for human comprehension.
Trade memory for time, or time for memory, in ways no general-purpose compiler would risk.
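The generate, verify, measure, discard loop described above can be sketched in a few lines of Python. This is an illustration under loud assumptions: the candidates here are hand-written stand-ins for machine-generated implementations, the `select` function and its parameters are hypothetical, and wall-clock timing in an interpreter is a crude substitute for real telemetry.

```python
import time
from typing import Callable, Iterable, Optional


def reference_popcount(n: int) -> int:
    """Readable reference implementation; it serves as the oracle."""
    return bin(n).count("1")


def candidate_kernighan(n: int) -> int:
    """Clears the lowest set bit on each iteration."""
    count = 0
    while n:
        n &= n - 1
        count += 1
    return count


def candidate_swar32(n: int) -> int:
    """Bit-parallel count, valid only for inputs in the range 0..2**32 - 1."""
    n = n - ((n >> 1) & 0x55555555)
    n = (n & 0x33333333) + ((n >> 2) & 0x33333333)
    n = (n + (n >> 4)) & 0x0F0F0F0F
    return ((n * 0x01010101) & 0xFFFFFFFF) >> 24


def candidate_broken(n: int) -> int:
    """Wrong on purpose: returns the bit length, not the number of set bits."""
    return n.bit_length()


def select(candidates: Iterable[Callable[[int], int]],
           inputs: list,
           oracle: Callable[[int], int],
           repeats: int = 200) -> Optional[Callable[[int], int]]:
    """Discard candidates that disagree with the oracle, keep the fastest survivor."""
    best, best_time = None, float("inf")
    for fn in candidates:
        if any(fn(x) != oracle(x) for x in inputs):
            continue                            # failed verification: throw it away
        start = time.perf_counter()
        for _ in range(repeats):
            for x in inputs:
                fn(x)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = fn, elapsed
    return best


INPUTS = [0, 1, 2, 3, 4, 255, 12345678, 0x80000000, 0xFFFFFFFF]
winner = select([candidate_broken, candidate_kernighan, candidate_swar32],
                INPUTS, reference_popcount)
```

Which survivor wins depends on the machine and the inputs, which is the point: the selection is made by measurement, not by anyone's opinion of the code. The 32-bit restriction on `candidate_swar32` is exactly the kind of specialization a human would hesitate to ship and a harness can simply test.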

The constraint that has always governed extreme optimization isn't the machine's patience. It's ours.

The RollerCoaster Tycoon ceiling

There's a reference point I keep coming back to. Chris Sawyer wrote the original RollerCoaster Tycoon almost entirely in x86 assembly. The result ran smoothly on hardware that, judged against the other games shipping at the time, had no business running it. It is one of the most-cited examples of what's possible when a brilliant engineer is willing to pay an enormous price in effort for efficiency most people would never attempt.

We treat that as a legend precisely because it's so rare. The effort was extraordinary and almost nobody can repeat it.

Now imagine the effort cost drops toward zero. Imagine RollerCoaster Tycoon-grade optimization isn't a once-in-a-generation feat but the default output of a system that writes a contract-satisfying implementation, measures it, and keeps tuning. The legendary becomes the baseline. That's the world I think we're heading toward, not because the machine is smarter than Sawyer, but because it doesn't experience the labor as cost.

What this actually buys us

The honest framing isn't "AI replaces programmers." It's that the layer at which humans work moves up, and a layer of efficiency that was previously uneconomical becomes free.

The leverage is real:

Efficiency we currently leave on the table. Most software is written for maintainability, not speed, and that's the right call today. When speed stops costing human effort, a lot of latent performance gets unlocked.
Composition over implementation. Humans spend their attention on system design — choosing the components, defining their contracts, wiring them together — rather than on the internals.
Specialization at scale. Code tuned to real workloads, regenerated as workloads shift, instead of one general implementation that's mediocre everywhere.

This is the same pattern I keep landing on with AI generally: the value isn't a chatbot bolted onto the side of a product, it's AI embedded in a real workflow where it does work that was previously too expensive to do.

Where the model breaks down

I'd be writing marketing copy, not an honest forecast, if I stopped there. This future has hard edges.

Your contract is only as good as your imagination. Black-box verification shows that the implementation does what you specified, not what you meant. Every untested input, every unstated invariant, every implicit assumption about the environment is a place where a confidently passing function does the wrong thing in production. We already feel this with evals: writing the spec is the hard part, and it's where the real engineering judgment lives.
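A small Python sketch of that gap, using a textbook case: a spec for a sort routine that checks the output is ordered but forgets to say it must contain the same elements as the input. All names here (`output_is_sorted`, `degenerate_sort`, `passes`, and so on) are hypothetical and exist only for this illustration.

```python
from collections import Counter
from typing import Callable


def output_is_sorted(fn: Callable, xs: list) -> bool:
    """What was specified: adjacent elements of the output are in order."""
    out = fn(list(xs))
    return all(a <= b for a, b in zip(out, out[1:]))


def output_is_permutation(fn: Callable, xs: list) -> bool:
    """What was meant but left unstated: nothing is lost or invented."""
    return Counter(fn(list(xs))) == Counter(xs)


def degenerate_sort(xs: list) -> list:
    """Satisfies 'the output is sorted' by returning nothing at all."""
    return []


CASES = [[3, 1, 2], [], [5, 5, 1], [2, 1]]


def passes(fn: Callable, checks: list) -> bool:
    """True when every check holds on every test case."""
    return all(check(fn, xs) for check in checks for xs in CASES)


weak_spec = [output_is_sorted]
full_spec = [output_is_sorted, output_is_permutation]
```

Here `passes(degenerate_sort, weak_spec)` is true and `passes(degenerate_sort, full_spec)` is false. A human reviewer would reject `degenerate_sort` on sight; a harness running only the weak spec would ship it.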

Telemetry hides as much as it reveals. "Passes all tests, fast, within memory budget" tells you nothing about whether the implementation is secure, whether it has a timing side-channel, or whether it fails pathologically just outside your test distribution. Some properties are easy to verify by observation. Many of the ones that matter most are not.
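The timing side-channel case is easy to show in miniature. In this Python sketch, two comparison routines agree on every input and output, so no functional test can tell them apart. The step counter returned by `leaky_equal` is instrumentation I added for illustration (a deterministic stand-in for elapsed time), and `SECRET` and `GUESSES` are made-up values; `hmac.compare_digest` is the standard-library comparison designed to resist this kind of leak.

```python
import hmac


def leaky_equal(a: bytes, b: bytes) -> tuple:
    """Early-exit comparison.

    Returns (result, bytes_examined). The second value is illustrative
    instrumentation, not something a real implementation would expose.
    """
    if len(a) != len(b):
        return False, 0
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return False, i + 1                 # stops at the first mismatch
    return True, len(a)


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Comparison whose running time does not depend on where inputs differ."""
    return hmac.compare_digest(a, b)


SECRET = b"s3cr3t-token"

# Wrong guesses that share 0, 4, and 8 leading bytes with the secret.
GUESSES = [SECRET[:k] + b"\x00" * (len(SECRET) - k) for k in (0, 4, 8)]

leaky_results = [leaky_equal(SECRET, g)[0] for g in GUESSES]
safe_results = [constant_time_equal(SECRET, g) for g in GUESSES]
work_done = [leaky_equal(SECRET, g)[1] for g in GUESSES]
```

Both result lists are identical, so an input/output contract passes either routine. Only `work_done` grows as the guess gets closer to the secret, and you only see that if you already knew to measure it.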

Unreadable code is unmaintainable by humans, by definition. The moment the verification harness can't explain a failure — or the contract itself needs to change in a way the machine can't infer — you're staring at an artifact no person can read. The maintainability we gave up doesn't vanish. It relocates, into the specs, the harnesses, and the regeneration pipeline. Those become the artifacts that have to be legible, reviewed, and owned.

Debugging moves up a level. When something goes wrong in a composed system of opaque components, you debug at the seams — the contracts and their interactions — not inside the boxes. That's a different skill, and we're not especially good at it yet.

Today's models may not get us there. Current LLMs are trained on a corpus of human-written code, and humans don't write the kind of radical, machine-tuned assembly I'm describing — so there are essentially no examples for a model to learn from. A system that imitates what people have written is not obviously a system that can invent optimizations no person has ever attempted. Getting there may require a paradigm shift in how these models work or how they're trained, or a serious leap in our ability to generate synthetic training data from verification harnesses themselves. I think the destination is right; I genuinely can't tell you when — or by what mechanism — we arrive.

So the human in the loop doesn't disappear. The human moves to where the leverage and the risk concentrate: specifying contracts, designing verification, and owning the composition. That's a more demanding job, not a smaller one.

The bottom line

The trajectory I find most plausible isn't machines replacing engineers. It's a redefinition of what we review. We stop reviewing source and start reviewing evidence (inputs, outputs, invariants, telemetry), and we compose systems from components we trust at their boundaries. In that world, the implementations get stranger, faster, and more specialized than anything we'd write by hand, and RollerCoaster Tycoon-class optimization becomes unremarkable.

The skill that appreciates in value isn't writing code. It's the ability to say precisely what correct means — and to build the verification that makes "I never read it" a defensible thing to say.

If your team stopped reading the implementations tomorrow, how confident are you that your contracts actually capture what you meant? That gap is the work.