The most dangerous data problems don’t trigger alerts or cause catastrophic failures. They look fine on the surface until the business realizes the damage they’ve done. And as AI-assisted development grows in popularity, it is only making the problem worse.
Consider a dashboard built for workforce capacity planning. The person building the tool knows its business purpose is to track progress against permanent hiring targets, so the “headcount” field counts permanent employees. Later, the team decides to add contractor tracking, expanding the reporting scope. This makes sense and passes review. But a subsequent change that folds contractors into “headcount” alters the meaning of the field, undermining the data product’s original intent, and nobody realizes it.
This is a subtle failure mode in which a metric built to represent one thing gradually morphs into something adjacent or even entirely different. Known as semantic drift, the phenomenon has roots in linguistics: it resembles the way language evolves over time. Take the words “awful” and “awesome,” which both once meant “awe-inspiring.” After centuries of use, they have completely diverged and now do different jobs.
AI-assisted development has accelerated the accumulation of changes that lead to semantic drift, especially over shorter time scales. We’ve all been there, using a word with complete confidence, only to learn that its meaning has changed. When it happens in casual conversation, it’s embarrassing. But when the same thing happens to a business metric, it tends to be expensive.
In the workforce capacity planning example, each change made sense in isolation, and everything appeared to be in working order. Pipelines still ran, dashboards loaded, and tests passed, but the data product lost coherence, and with it its business value. That leaves executives making business-critical workforce decisions based on a metric they think means one thing but now reports another.
Gartner’s 2025 analysis of data management leaders predicts that through 2026, 60% of AI projects unsupported by AI-ready data will be abandoned. The obvious, visible failure modes look like broken code or pipelines and are easy for data teams to identify. However, the acceleration of semantic drift suggests that numerous other projects, even those that are leveraging AI-ready data, have yet to be abandoned but should be.
The software industry is already responding to this problem by moving toward spec-driven development, a methodology that emphasizes creating clear, structured specifications before writing any code. By capturing the intent of what is being built before building it, spec-driven development ensures every subsequent change can be evaluated against the full history of requirements. It’s not a new idea in software, but AI has made it urgent: without this methodology borrowed from software engineering, AI-enhanced data development can undermine the very products it produces.
Why Time is of the Essence for Spec-Driven Development
Documentation has always been a major point of friction in data. Teams would finish building something, then scramble to write down what they’d built. Post-hoc documentation was often incomplete or inadequate, but it was often the best an organization could hope for. If more details were needed about why a data product was built a certain way, the business would track down a senior data engineer and ask. That model clearly doesn’t scale as data products continue to grow in volume and importance.
DataOps methodology, drawing heavily from software’s DevOps practices, pulled documentation backward, into the workflow in parallel with development. But even with a good DataOps approach, an audit trail only records what was built, not why it was built, leaving room for semantic drift when changes stack up at AI speed.
To build drift-proof data products in the age of agentic AI, data engineers must borrow another software development model that moves documentation even further left: spec-driven development, which captures intent before the build begins.
Returning to the workforce capacity planning example, a spec would define headcount as permanent employees only, for capacity planning against permanent hiring targets. When a subsequent request arrives to add contractor resource-utilization tracking, an AI agent working from that spec would see that incorporating contractors would change what the headcount figure measures, and flag it. By surfacing the conflict and the need for human intervention, the agent would require a review before anything reaches production.
The same principle applies across any kind of data product. When you give AI the full history of requirements alongside the current state, it doesn’t just ask “what’s the best way to implement this?” It asks “what’s the best implementation that satisfies all requirements, including this new one?” In doing so, it ensures the original intent is always preserved.
For the spec to be effective, it only needs to capture what a data product is for, what each field means, why it was built the way it was, and what constraints have been placed on it over time. This enables the organization to easily make good decisions down the road while providing the context that agentic AI is almost always missing.
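To make this concrete, here is a minimal sketch of what such a spec and a drift check might look like. The structure, names, and `review_change` logic below are illustrative assumptions, not a real DataOps.live or industry-standard schema; a production spec would live in version control and the check would run inside a CI/CD or agent workflow.

```python
from dataclasses import dataclass, field

# Hypothetical spec format: each field records what it means, why it was
# built that way, and the constraints placed on it over time.
@dataclass
class FieldSpec:
    name: str
    meaning: str                      # what the field represents
    rationale: str                    # why it was defined this way
    constraints: list[str] = field(default_factory=list)

@dataclass
class DataProductSpec:
    purpose: str                      # what the data product is for
    fields: dict[str, FieldSpec]

    def review_change(self, field_name: str, proposed_meaning: str) -> str:
        """Flag any change that would alter what an existing field measures."""
        current = self.fields.get(field_name)
        if current is None:
            return f"NEW FIELD: '{field_name}' must be added to the spec with its intent recorded."
        if proposed_meaning != current.meaning:
            return (f"DRIFT FLAGGED: '{field_name}' currently means "
                    f"'{current.meaning}'; the proposal would change it to "
                    f"'{proposed_meaning}'. Human review required.")
        return "OK: change preserves the field's recorded meaning."

spec = DataProductSpec(
    purpose="Workforce capacity planning against permanent hiring targets",
    fields={
        "headcount": FieldSpec(
            name="headcount",
            meaning="permanent employees only",
            rationale="planned against permanent hiring targets",
            constraints=["excludes contractors"],
        )
    },
)

# A later request to fold contractors into headcount conflicts with the spec:
print(spec.review_change("headcount", "permanent employees plus contractors"))
```

Because the spec records meaning and rationale rather than implementation detail, the check above is cheap to run on every proposed change, which is exactly where an agent can apply it automatically.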
What This Means for Data Leaders
Spec-driven development has always been the right approach. The reason it didn’t gain broader traction was purely economic. Building and maintaining specs manually is time-consuming and costly; as a result, most teams opted out.
The good news is that agentic AI can solve the problem it’s creating; if changes are cheap now, so is spec maintenance. With a complete record of what a data product is supposed to mean, and why, an AI agent can evaluate every change against the original goals to maintain coherence and update the spec as part of the same workflow. An AI agent working from a complete spec isn’t limited to responding to change requests, either. It can evaluate data products against their own requirements on a schedule. That future isn’t far off, but it only works if the spec is there to begin with.
About the Author:Guy Adams is the co-founder and chief technology officer at DataOps.live. DataOps.live is a DataOps automation platform for Snowflake, delivering AI-ready data products faster by automating CI/CD, providing continuous observability, and enforcing governance controls across the full data delivery lifecycle for companies like Eutelsat, Snowflake, and AstraZeneca. Guy is also the co-founder of TheTrueDataOps.org movement. He has spent 20+ years leading software development organizations. In his current role, he brings the principles and business value from DevOps and CI/CD to data.
The post Spec-Driven Development: The Key to Protecting AI-Generated Data Products appeared first on BigDATAwire.

