AI still needs humans

The backlash was inevitable. For the past year, Silicon Valley has been telling us that software development is on the verge of becoming a prompt-and-ship exercise. You know, just describe what you want and let an AI coding agent build it. Sure, maybe you could keep a few token senior engineers around to bless the output…or maybe not. I mean, Google’s Sundar Pichai says 75% of the company’s new code is now AI-generated and reviewed by engineers, up sharply from earlier levels.

Hurray! Right??? Well…

The Wall Street Journal recently highlighted warnings from Mario Zechner and Armin Ronacher, two engineers behind core pieces of the popular OpenClaw AI agent, who argue that AI coding tools are flooding software with what they call “vibe slop.” Their complaint is that too many people are using AI to skip the parts of software development that actually matter: design, judgment, testing, ownership, and deep understanding of the system being changed.

This is worth taking seriously. When people who helped build the tools used by millions start warning that those same tools can produce buggy, potentially dangerous software at industrial scale, it’s probably time to rethink some of the assumptions fueling the AI wave.

Rethink, not reject.

The right answer isn’t “AI coding is bad.” That’s silly. AI coding is powerful in roughly the same way power tools are powerful. They help skilled people do more, faster. They also help unskilled or careless people make bigger mistakes with greater confidence. That’s the enterprise AI story in miniature.

Nearly correct is still very wrong

I’ve made a related argument about the real cost of “nearly correct” AI code. The trouble was never that large language models could produce obviously broken garbage. If they did, we’d catch it and move on. The trouble is that they very quickly produce plausible output. Fast and plausible is exactly the kind of wrong that slips into production.

It’s important to realize that generating code has never been the hard part of software. As Honeycomb Founder and CTO Charity Majors puts it, being a great software engineer “has far more to do with your ability to understand, maintain, explain, and manage a large body of software in production over time, as well as the ability to translate business needs into technical implementation” than with simply churning out lots of code. As I’ve written before, speed of development is rarely the right metric. Developers spend much of their time understanding existing systems, not simply adding lines to them.

AI hasn’t eliminated the need for that hard work. What it has done is make it easier to foolishly skip it.

That’s true beyond software, too. I use AI constantly in my work. I’ll use AI to rough out slides we use to train sales teams, for example, or to synthesize feedback from customers. AI gives me a starting point, like a first draft on a memo that may be 80% correct. That’s a real gift. But a final draft that’s only 80% right is a liability, so I have to coach and oversee the agents. It’s real work, albeit different work from what I’d done before.

The problem is abdication

The dumbest version of the AI coding debate asks whether AI will replace developers. The better question is: What kind of developer does AI reward? It doesn’t reward the person who blindly accepts output. Instead, it rewards the person who can tell, quickly and accurately, whether the output fits the system, the security model, the performance envelope, the user need, and the organization’s standards. In other words, AI rewards experience; it rewards people who know what “good” looks like.

This is why fleets of autonomous coding agents make me nervous. Not because agents can’t be useful, but because responsibility doesn’t scale the way prompts do. A developer can review one AI-generated change. Maybe five. Maybe 20 if the changes are small and the tests are strong. But when a company starts celebrating dozens or hundreds of agents churning out pull requests, issues, tests, migrations, and fixes, the obvious question is: Who actually understands what’s happening?

If the answer is “another agent,” I’m sorry, but we’re back where we started. Open source maintainers are already living with the downside. GitHub has been weighing tighter pull request controls after maintainers warned that a surge of low-quality, often AI-generated contributions is overwhelming projects. InfoWorld reported that GitHub has considered stronger filters and maintainer controls to stem the flood.

This is the ugly economics of AI slop. It’s cheap to generate but expensive to review.

Friction is the point

Ronacher has been making a related point with admirable clarity. In his talk, “The Friction Is Your Judgment,” he and Cristina Poncela argue that agent-generated code has a way of drifting toward the locally convenient answer. Catch the exception, add a fallback, paper over the weird edge case, keep the demo moving. Each change can look reasonable in isolation, but the problem is what happens after a hundred of them accrete across the codebase, quietly making the system harder to reason about.

That sounds right to me. Friction isn’t an enemy; rather, it’s where your judgment lives.

This is why the “human in the loop” language, tired as it has become, still matters. But the phrase only means anything if the human is both paying attention and capable of judging the work. A junior developer accepting generated code because it passes the first test doesn’t solve the problem. Nor does a senior developer “reviewing” a flood of agent-written pull requests at a speed that makes real review impossible.

The safeguard is not a person vaguely near the loop. No, it’s expertise applied deliberately, with systems that force accountability rather than assume it. For developers, AI is strongest when it’s used for bounded tasks like generating tests or explaining unfamiliar code. By contrast, it’s weaker when asked to make broad architectural decisions or infer business rules that live in people’s heads rather than in the repository.

For managers, the worst possible metric is “percentage of code generated by AI.” That’s like measuring a newsroom by the percentage of sentences drafted by autocomplete. Who cares? The real questions are whether defects are down, delivery is faster, incidents are fewer, and customers are happier.

The 2025 DORA report on the state of AI-assisted software development gets at this more usefully: AI tends to amplify an organization’s existing strengths and weaknesses. If you have strong tests, clear ownership, disciplined review, good observability, and fast rollback, AI can make you better. If you have weak engineering hygiene, AI can make you worse faster.

In other words, AI doesn’t eliminate the need for engineering discipline. It raises the price of not having it.

Guardrails can’t be a memo

Discipline is necessary, but for an enterprise, it isn’t sufficient. You cannot make tens of thousands of engineers, analysts, marketers, lawyers, and salespeople reliably “slow down and check the work” through good intentions and a memo. At scale, keeping a human in the loop has to be enforced by architecture, not good intentions.

In practice this means baking guardrails into the systems agents touch, like identity, data governance, and observability. This is where I’ll risk sounding like I work where I work (Oracle). The genuinely interesting shift I see across the industry, and yes, where Oracle is placing its bet, is pushing more of those controls down into the data layer itself, so agents operate against governed enterprise data rather than as clever scripts holding the keys to production.
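To make “enforced by architecture” a bit more concrete, here is a minimal Python sketch of a gate that refuses to apply a change unless a qualified human other than the author has approved it. It is an illustration under stated assumptions, not a description of how Oracle, GitHub, or any real product works: the ChangeRequest type, its fields, the may_apply function, and the rule itself are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ChangeRequest:
    """A proposed change. Every field name here is hypothetical."""
    author: str
    author_is_agent: bool
    touches_production: bool
    human_approvers: frozenset = field(default_factory=frozenset)


def may_apply(change: ChangeRequest, qualified_reviewers: frozenset) -> bool:
    """Return True only if the change satisfies the (assumed) review rule."""
    # An approval counts only if it comes from a qualified human reviewer
    # who is not also the author of the change.
    valid = (change.human_approvers & qualified_reviewers) - {change.author}
    # Assumed policy: anything written by an agent, or anything that
    # touches production, needs at least one valid human approval.
    if change.author_is_agent or change.touches_production:
        return len(valid) >= 1
    return True


reviewers = frozenset({"dana", "lee"})
approved = ChangeRequest("agent-7", True, True, frozenset({"dana"}))
unreviewed = ChangeRequest("agent-7", True, True)
print(may_apply(approved, reviewers), may_apply(unreviewed, reviewers))
```

The point of the sketch is where the rule lives: in the path every change must pass through, so an agent approving another agent’s work, or nobody approving it at all, simply fails the check.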

That’s not as exciting as saying agents will write all your code, but guess what? That’s good. In enterprise AI, “boring” is good.

So how much should it matter to enterprises that Google says 75% of its new code comes from AI? It may well be true, but Google also has some of the best engineers in the world reviewing that output. That’s the part of the story too many AI boosters skip but shouldn’t. Humans are the best way to make AI work.
