Back in 2016 I bought a new car, and one afternoon on the motorway it did something that made the hair on my arms stand up.
I was driving home, traffic was light, and I clicked the thing on. And the car just… drove. It held the lane on its own. Gentle little corrections, left, right, following the curve of the road like it had done it a thousand times. My hands were still on the wheel — the car insisted on that, it would beep at me and sulk if I let go for too long — but they weren’t doing anything. They were just resting there. Almost useless.
I had been driving for twenty years at that point. Two decades of the thing being mine to do — the steering, the watching, the tiny constant negotiation between me and the road. And here was a machine, in my own car, doing it while I sat there like a passenger in my own life. It was wonderful. It was also a little bit terrifying, in the way that wonderful new things often are. I remember thinking: this is it, the floor just shifted under me, and it didn’t even make a sound.
I didn’t know it back then, but I had just met Level 2.
There is a ladder, and engineers love a good ladder
Turns out there is an official scale for exactly how much the car drives itself instead of you. Engineers, bless them, could not let a thing like “self-driving” stay vague — so there is a standard, SAE J3016, and it splits the whole journey from “you do everything” to “the car does everything” into six neat rungs. Level 0 to Level 5.
Something has been rattling around my head for months now, and it is the reason I am writing this at all. That same ladder maps, almost rung for rung, onto where our own craft is heading — onto how we write software. Not as a loose metaphor I am stretching to fill a blog post. As the same shape. The same story, told twice, once on the road and once in code.
Let me be honest up front, because I can already hear the smart guy in the back row sharpening his knife: no analogy is a law of physics. Fine. But it holds for the part that matters, and it holds curiously well. So bear with me — and keep one eye open, because two of these six rungs don’t behave the way you’d expect, and those two are the whole story. Let’s climb the ladder, both sides at once.
Level 0 — you, and a machine that only nags
The car: no automation. The car does nothing for you except, maybe, beep. Beep when you drift, beep when something’s in your blind spot, beep when you’re about to reverse into a pole. It warns. It does not act. You drive every meter.
The code: the compiler yelling at you. The yellow squiggle. CodeCop wagging its finger — you called that procedure without the parentheses, you absolute walnut — or grumbling that your variables aren’t declared in the right order, or some rule fuming that you left a space where no space should live. The machine is watching, and it has a ton of opinions, but the driving — every keystroke of the actual logic — is one hundred percent yours. We lived here for a very long time, and most of us were perfectly happy. A squiggle is helpful. But a squiggle never grabbed the whole keyboard.
Level 1 — the machine takes one hand
The car: driver assistance. Now the car handles one thing for you. Either it keeps your speed (adaptive cruise — it watches the car ahead and follows its pace), or it nudges your steering to keep you in lane. One or the other. Never both at once. You are still very much the driver; you’ve just delegated a single muscle.
The code: plain old autocomplete. IntelliSense, if you want. You start typing a variable name and the editor offers to finish it. It knows the fields on your table and the procedures on your codeunit, and it closes your parentheses. One narrow thing, done for you, so you can keep your attention on the part that’s actually hard. Nobody panicked about IntelliSense. It was just… nice. That streak, the one where the machine helps but nothing much really changes, was about to end.
Level 2 — both hands, but you’re still responsible
The car: partial automation. Now the car does speed and steering together. My little 2016 epiphany lived right here: the car following the lane and pacing the traffic at the same time, the whole motorway choreography handled for me. But I was still the driver. That’s the entire point of Level 2. My hands on the wheel. My eyes on the road. If it got confused, it was my neck. Literally. The machine drove; the responsibility stayed mine. (Tesla’s Autopilot lives here too, for what it’s worth, whatever the name suggests.)
The code: AI autocomplete. The leap from “finish this word” to “finish this whole function.” When GitHub Copilot went generally available in June 2022, that was exactly it — the machine doing speed and steering, writing whole blocks while you supervised, and you reviewing every single line because if it hallucinated some nonsense, it was your commit, your neck. The machine wrote; the responsibility stayed yours.
But this is where the story gets strange.
Level 3 — the rung that taught us the most
The car: conditional automation. Now the car genuinely drives — within its domain, on a specific kind of road, in specific conditions — and you are, finally, off the hook for the actual driving. You can relax. A little. But — and it’s a big but — you must stay ready to grab the wheel the instant the car taps you on the shoulder and says “your turn, I’m out of my depth.” Not asleep. Not lost in a book. Just… ready, the whole time, for a tap that may never come. Responsibility gets tossed back and forth like a hot potato — and that handoff, it turns out, is the single most dangerous idea in the whole ladder.
So the serious carmakers mostly looked at Level 3 and flinched. Honda put one out in Japan in 2021 — exactly one hundred of them, lease only. Mercedes shipped one too (so did BMW) — geofenced, capped at a crawl, tiny numbers — and then quietly hit pause on the whole thing in early 2026. That’s basically it. And not because they couldn’t build it. Oh, absolutely not. They backed away because they pictured a human dozing through ninety-nine percent of the drive and then being asked to save the day in two seconds flat — and they decided that handoff was a death trap. The lesson of Level 3 in cars was brutal and simple: this middle is poison. Either keep your hands on, or take the human out completely.
The code: here the car and the code stop rhyming and start teaching. Because in our world, Level 3 didn’t get skipped at all — it shipped. Loudly. We called it vibe coding. “Just let the AI write the whole feature, I’ll loosely keep an eye on it” — that is conditional automation, word for word. The machine drives, you supervise with half your attention, you grab the wheel when it veers. And how did it go? It face-planted. Spectacularly. The thing produced mountains of plausible nonsense, you stopped reading it closely (because that is exactly what loose supervision means), and the bugs walked straight into production wearing a confident smile. Nobody serious vibe-codes anymore.
And Level 3 — mark this — was the most useful rung of them all. It was useful precisely because it failed, and we learned more from how it failed than from every comfortable rung that came before it. It taught us, the hard way, the exact same thing the carmakers learned: the loose-supervision middle is a trap. Either keep your hands properly on the wheel (Level 2) or do the real work to take them off for good (Level 4). The hot potato is the one place you cannot stand.
Level 4 — the empty driver’s seat
The car: high automation. The car drives. Fully. No human needed — and I mean the seat is empty. But the seat can stay empty only inside a defined area. A geofence. A patch of the city that’s been mapped down to the curb, where conditions are known and the operator did their homework first. Inside that box: no driver, no hands, no human at all. Step outside the box: nothing.
And notice what you can’t do. You cannot buy a Level 4 car. There is no Level 4 in your garage. It only exists as a fleet — Waymo, Zoox, Baidu’s Apollo Go, WeRide, Pony.ai — run by an operator who controls the whole environment it drives in. The autonomy is real. It’s just not yours, and it only works where someone tamed the jungle first.
The code: the agent. Real agentic coding, the thing that genuinely takes a task and does it, writes the code, runs the tests, fixes its own mess, comes back when it’s done. And here is the part most people get wrong: it works brilliantly, reliably, almost like magic — inside a geofence. And the geofence is not the tooling, and it is not a vibe. It is the domain you draw around the agent: which tasks it is allowed to take on, which modules and bounded contexts it may touch and which it must never, what tools and permissions it holds, what it is allowed to change out in the real world, what “done” actually means, and the exact line where it has to stop and come ask you. That boundary is the geofence. You draw it first — before anything else — or nothing else you do will matter.
And then, inside that domain, you build what makes letting go safe enough to actually do: the scaffolding around the agent. That scaffolding is the whole heart of Level 4 — the skills you give it, the subagents you split the work across, the guardrails and the rules and the conventions it has to follow. It is the part nobody can do for you. Three pieces of that scaffolding are not optional — think of them as the road, the sensors, the safety systems inside the domain:
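To make the geofence a little less abstract, here is a minimal sketch of one written down as data, in Python rather than AL, purely for illustration. Everything in it is hypothetical: `Geofence`, `check_action`, the task names, the tool names and the paths are invented for this example and are not the API of any real agent framework. It covers three of the boundaries listed above (which tasks, which modules, which tools) and turns anything outside them into “stop and come ask me”; the definition of “done” is left out to keep it short.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class Geofence:
    """Hypothetical description of the domain an agent is allowed to work in."""
    allowed_tasks: frozenset[str]     # kinds of task the agent may take on
    allowed_paths: tuple[str, ...]    # glob patterns for modules it may edit
    forbidden_paths: tuple[str, ...]  # glob patterns it must never touch
    allowed_tools: frozenset[str]     # tools and permissions it holds


def check_action(fence: Geofence, task: str, tool: str, path: str) -> str:
    """Return "allow" if the action is inside the fence, otherwise "ask_human"."""
    if task not in fence.allowed_tasks or tool not in fence.allowed_tools:
        return "ask_human"
    # A forbidden pattern wins over any allowed one.
    if any(fnmatch(path, pattern) for pattern in fence.forbidden_paths):
        return "ask_human"
    if not any(fnmatch(path, pattern) for pattern in fence.allowed_paths):
        return "ask_human"
    return "allow"


if __name__ == "__main__":
    fence = Geofence(
        allowed_tasks=frozenset({"bugfix", "refactor"}),
        allowed_paths=("src/Sales/*",),
        forbidden_paths=("src/Sales/Posting/*",),
        allowed_tools=frozenset({"edit_file", "run_tests"}),
    )
    print(check_action(fence, "bugfix", "edit_file", "src/Sales/Customer.al"))     # allow
    print(check_action(fence, "bugfix", "edit_file", "src/Sales/Posting/Post.al"))  # ask_human
```

Because the forbidden list is checked first, carving a no-go zone out of an otherwise permitted module is a single entry. Note that `fnmatch`’s `*` also matches `/`, so `src/Sales/*` covers subfolders as well.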
- Test code. Not twenty percent covered, not a comfortable sixty — every codeunit the agent is allowed to touch, covered to the last line. One hundred percent, inside the fence. Every line you leave uncovered is a stretch of road with no markings — the agent sails straight through it and never once slows down.
- Code the agent can actually reason about. Small, well-structured codeunits with one job each — not a ten-thousand-line monster with globals everywhere and the logic smeared all over the place. Spaghetti drowns an agent exactly the way it drowns a junior, only faster.
- A tight agentic feedback loop. The agent has to make a change, run the tests itself, and see green or red in seconds — right next to the code, instant. Not a CI pipeline that crawls back with an answer ninety minutes later. And think for a second about what ninety minutes even means to a thing that reasons in seconds: it’s like shouting “brake!” at a driver who has already crossed three towns by the time the word reaches his ears — and is happily flooring it through the fourth. CI is far too slow for the way these things work now; by the time the build limps back red, the agent is a hundred wrong turns down the road, cheerfully building on top of the wreck. The loop has to be fast enough to live inside the agent’s own thinking. But it is not just tests — and please, please pay attention to this. Automated tests we’ve had for decades; that is old news. What we never had, not once, is a loop that can judge. Point a second agent at the diff and ask it the things you used to need a senior leaning over your shoulder for. Is this secure enough? Could it leak a secret? Any obvious performance traps in here? Does it use our terminology, our patterns, the names we actually agreed on? Stand up a maintainability reviewer: how much is this going to hurt to live with in two years? A UX reviewer: how easy is this actually to use? A complexity reviewer that flags the one codeunit nobody will ever dare open again. None of those are true-or-false — they are fuzzy, probabilistic, human judgments, the exact kind of thing we could never put on autopilot. Until now. That is the “agentic” in “agentic feedback loop,” and it is worth more than every green checkmark combined.
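The three pieces above can be wired into one loop. This is a minimal, hypothetical Python sketch, not any tool’s actual API: `feedback_loop`, `run_tests`, `Finding` and `naive_secret_reviewer` are invented names, the per-file coverage numbers are assumed to come from whatever coverage tool you already use, and the reviewer is a deliberately dumb string check standing in for a second model that would make the real, fuzzy judgment. What it shows is the ordering: the hard, fast checks (a local test run, full coverage of every fenced file) come back first, and only a green, fully covered change is handed to the judgment reviewers.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    source: str   # which check produced this finding
    message: str


# A reviewer reads the diff and returns its concerns; an empty list means "looks fine".
# In a real loop this would wrap a call to a second model; here it is any callable.
Reviewer = Callable[[str], list[str]]


def run_tests(cmd: list[str], timeout_s: float = 30.0) -> bool:
    """Run the local test command; a non-zero exit or a timeout counts as red."""
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def feedback_loop(
    test_cmd: list[str],
    coverage: dict[str, float],
    fenced_files: set[str],
    diff: str,
    reviewers: dict[str, Reviewer],
) -> tuple[bool, list[Finding]]:
    """Return (ok, findings) for one agent iteration."""
    findings: list[Finding] = []
    if not run_tests(test_cmd):
        findings.append(Finding("tests", "test run is red"))
    # Every file inside the fence must be fully covered.
    for path in sorted(fenced_files):
        covered = coverage.get(path, 0.0)
        if covered < 1.0:
            findings.append(Finding("coverage", f"{path} is {covered:.0%} covered"))
    if findings:
        # Hard checks failed: report them now and skip the slower judgment pass.
        return False, findings
    for name, review in reviewers.items():
        findings.extend(Finding(name, concern) for concern in review(diff))
    return not findings, findings


def naive_secret_reviewer(diff: str) -> list[str]:
    # Stand-in for a model-based security reviewer: flags one obvious pattern only.
    return ["possible hard-coded secret"] if "password=" in diff.lower() else []


if __name__ == "__main__":
    ok, findings = feedback_loop(
        test_cmd=[sys.executable, "-c", "pass"],  # stand-in for the real test command
        coverage={"src/Sales/Customer.al": 1.0},
        fenced_files={"src/Sales/Customer.al"},
        diff='+ Password="hunter2";',
        reviewers={"security": naive_secret_reviewer},
    )
    print(ok, findings)
    # False [Finding(source='security', message='possible hard-coded secret')]
```

Running the cheap, deterministic gates first is a design choice rather than a requirement: it keeps the seconds-long inner loop cheap and avoids asking reviewers to judge code that doesn’t even pass. Swap the string check for a prompt to a second model (“Is this secure enough? Does it use our terminology?”) and the shape of the loop stays the same.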
Draw the domain, build those three inside it, make it all tight — and the agent drives the whole way with the seat empty. Leave it loose, point the thing at a legacy code swamp — no boundary, no tests, spaghetti everywhere, nothing to tell it when it has gone wrong — and it wanders straight into a ditch. Same agent. Different jungle. And the jungle is on you.
Level 5 — the thing that doesn’t exist
The car: full automation. Drives anywhere, any conditions, any road, no limits, no geofence, no homework. Snow, gravel, a chaotic market street in a city nobody mapped — doesn’t matter. You could rip the steering wheel out entirely.
It does not exist. Not in production, not anywhere; some say not ever, but more on that later. Every honest engineer in the car industry will tell you Level 5 is a horizon of sorts — and the horizon has this annoying habit of moving away as you walk toward it.
The code: the mirror image. “Give the machine any vague request, point it at any codebase on Earth, no setup, no scaffolding, and it ships the right thing.” That’s Level 5 coding. Same horizon. Some say we’ll never get there. I’m not one of them, but who cares.
It was never the model
So look at what the ladder actually tells us, in both columns, when you stand back.
The autonomy that just comes in a box, that works everywhere with no effort from you — that one stalls out at Level 2. Always did. Level 3 was the one doomed attempt to cheat past it — Honda’s hundred cars, our whole vibe-coding fever — and you just saw how both of those ended.
But the high autonomy, the empty seat, the agent that drives the whole way? That one is absolutely real — and it has one condition the model alone never satisfies. Now, let’s be fair: the robotaxi is a smarter machine than my 2016 car ever was — better sensors, better brain, no argument from me. The model matters; it has to be good enough first. But those smarts buy you exactly nothing until someone maps the city, fences the domain, defines the limits of how far it’s safe to let the thing drive on its own. Capability gets you to the start line. Control is what lets you actually let go of the wheel. The robotaxi works because the city is mapped. The agent works because the codebase is fenced. Same law.
And that flips the whole thing from a scary story into a hopeful one — which is why I am genuinely excited. Because the level of autonomy you get to deploy is not handed down to you by the gods of the model — you don’t unlock a higher rung by waiting for some lab in San Francisco to ship a smarter one. It’s a dial, and your hand is on it. The more of the jungle you are willing to tame — the domain you draw, and the whole scaffolding you build inside it: the skills, the subagents, the guardrails, the test codeunits, the feedback loop — the higher up that ladder you get to safely go.
Now — is your hand the only hand on it? No, it definitely isn’t. Some jungles are just too big for one person to clear. Mapping a whole city to the curb is a billion-dollar fleet job, not a weekend — and some legacy codebases are exactly that kind of city — a sprawling untested megalopolis you alone are not going to fence by Friday. That’s real, and it’s why the empty-seat car is a fleet and not a thing in your garage. Fair. But most of what you and I actually touch in a day is not a whole city. It’s a neighborhood. And a neighborhood you can map.
Want proof it’s about control and not magic? Look at the buses. Some of the most reliably driverless public transport running right now isn’t some go-anywhere robocar — it’s a shuttle on a fixed route, the same loop every single time, the environment nailed down so completely that full autonomy becomes almost easy. Maximum control, maximum autonomy. The route is the geofence. That’s the recipe at Level 4, and it works.
So when your agent face-plants, the wrong question to ask is: “Is the model good enough yet?” Ask instead: how much of my jungle have I actually cleared? Where are my tests? Is my code clean, or is it a swamp? Where’s my feedback loop? And yes — let’s be honest about the other case, because sometimes you did everything right and it still fails. A hundred percent coverage, codeunits a child could follow, a feedback loop that snaps back in two seconds — and the thing still drives straight into the lake. It happens, and it’s maddening. But — and I’d bet my morning coffee on this — count how often that’s the actual diagnosis, against how often the honest answer is the uglier one: the agent didn’t fail at all. You sent it off and forgot to build the road.
Nine years later…
In 2016 my car drove the motorway for me and made my arm hair stand up, but it still needed my hands. It still needed me, sitting there, ready. That was the deal, and it was already a small miracle.
Today, right now, in many cities, you can open an app on your phone, and a car pulls up to the curb, and you get in, and there is nobody in the front seat. Nobody. It takes you across the city — the known streets, not the freeways yet; even the miracle has a geofence, and they pulled it back to the streets it knows best — and it does the whole thing itself. The empty seat I couldn’t quite imagine on that motorway nine years ago is just there. Today.
It didn’t arrive because the cars got infinitely smart. It arrived because somebody mapped the cities, fenced the domains, and built the world the autonomy needed in order to be trusted. The empty seat is real — it just lives where someone did the work to make it possible.
Same as your codebase. Exactly the same.
The road ahead is the one we build for ourselves, rung by rung, fence by fence — and the people who learn to build real scaffolding around their agents are the ones who will travel it furthest.
What kind of road are you building — and how empty are you brave enough to let that driver’s seat get?
