
The central question in how we build and ship software has almost always been some version of “How do we go faster?”
Faster sprints. Faster releases. Faster feedback loops. The entire discipline of modern software delivery has been oriented around compressing the time between idea and execution. Agentic AI now answers that question, but in doing so, it also asks a much harder one.
When code, tests, and workflows can be generated in hours instead of weeks, execution stops being the constraint. What remains is the set of things most companies have always quietly struggled with but never had to treat with urgency: alignment, governance, validation, and the ability to connect delivery to real business outcomes. Now they do.
This is playing out across delivery projects right now: the technology is working, but organizations are not keeping up.
In traditional delivery, teams could absorb a certain amount of ambiguity. Unclear priorities, loosely defined success criteria, too many stakeholders with competing views… These things slowed delivery down, but the slowness itself created a buffer. By the time a team had built something, there had usually been enough time for alignment to catch up.
Agentic systems eliminate that buffer.
When a team can build a working onboarding flow in days rather than weeks, the cost of misalignment doesn't disappear; it compounds. What used to be a two-week misalignment becomes a two-day one, which means the rework arrives faster, the misdirected effort scales faster, and the organizational debt accumulates faster.
I've started describing it this way to clients: AI is the accelerator, but alignment is the steering wheel. At low speeds, being slightly off course is manageable. At high speeds, being off by a few degrees takes you somewhere completely different.
Most organizations, when they're honest about it, don't have great steering. They have, at best, consensus mistaken for clarity (and even consensus can be hard to come by). Decisions made without clear ownership. Priorities that are debated rather than decided, and data that exists but isn't trusted enough to actually guide direction.
When delivery was the bottleneck, those weaknesses stayed mostly contained. Now they're the bottleneck, and they scale.
For decades, the project management triangle has been the organizing logic of software delivery. Time, scope, budget: pick one, maybe two. Teams have built entire planning practices around this tension, learning to negotiate trade-offs, manage stakeholder expectations, and make peace with what's feasible given the constraints in front of them.
Agentic systems are starting to dismantle that logic.
Code, tests, scaffolding, and documentation can now be generated in parallel and at speed. The middle of the delivery lifecycle—the part where teams have historically spent the majority of their time and energy—is compressing. What used to take weeks of coordinated effort can now take days. And that number will keep shrinking.
This isn't to say we're living in a world where everything is instant. We're not. But we are moving into one where far more is possible, far more quickly, and that shift changes the nature of the constraint in a fundamental way. Capacity is no longer what's holding teams back. The bottleneck has moved.
When execution is the hard part, the question the business has to answer is relatively tractable: what can we build within these constraints? That's a resource question. A planning question. One that program managers and delivery leads have tooling, frameworks, and years of practice to navigate.
The question that replaces it is much harder: what should we build?
That's not a delivery question. It's a strategic one. And most organizations haven't had to answer it with this kind of urgency before. When the cost of building something was high, bad prioritization was punished slowly. You'd spend three months on the wrong thing and feel the consequences gradually. Now, you can spend three weeks on the wrong thing and feel them immediately, at scale, with compounding rework and misaligned momentum already in motion.
The constraint hasn't disappeared. It's just moved somewhere most organizations are much less equipped to handle it.
Approval chains, steering committees, scheduled release windows, risk reviews: these structures assumed that change is expensive and should be tightly controlled. That assumption made sense when it took months to build something, but it doesn't hold anymore.
What I see in practice is that organizations have accelerated execution without redesigning how decisions get made: the work gets done faster, but the approval process hasn't changed. Throughput doesn't improve; the friction just moves somewhere else.
The fix isn't to eliminate governance; it's to redesign it for the speed the business actually needs to operate at. That means separating high-risk changes from low-risk ones so they don't bottleneck each other, as sketched below. It means pushing decision rights closer to delivery teams, with clear guardrails rather than layers of approval. It means treating governance as something that enables speed rather than something that manages it from a distance.
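To make the shape of that redesign concrete, here is a minimal sketch of risk-tiered change routing. The tiers, criteria, and approval paths are illustrative assumptions, not a prescribed policy; the point is simply that most changes never need to touch the slow path.

```python
# A minimal sketch of risk-tiered change routing. The tiers, criteria, and
# routing targets are illustrative assumptions, not a prescribed policy.

def route_change(touches_customer_data: bool, reversible: bool,
                 blast_radius: str) -> str:
    """Decide who owns the approval decision for a proposed change."""
    if touches_customer_data or blast_radius == "platform-wide":
        return "risk review board"          # genuinely high-risk: slow path
    if not reversible:
        return "engineering lead sign-off"  # medium risk: one approver, not a chain
    return "delivery team decides"          # low risk: guardrails, not approvals

print(route_change(touches_customer_data=False, reversible=True,
                   blast_radius="single service"))
# -> "delivery team decides"
```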
This is uncomfortable for a lot of organizations. It requires trusting teams with decisions that used to sit higher up. But the alternative—maintaining slow governance in a fast-execution environment—is not a neutral choice. It has a real cost.
Think of it like deploying a high-speed robot vacuum in a room cluttered with obstacles. The technology can move faster and cover more ground, but if the environment is constrained (furniture everywhere, pathways unclear, constant interruptions), it doesn't matter how advanced the system is; it will still struggle to move efficiently.
The same is true for delivery.
Testing is where I see the most underestimation.
The common assumption is that AI-assisted delivery means a lighter QA burden: more automation, faster coverage, fewer manual cycles. And in some ways, that's true. But agentic systems also introduce something traditional QA wasn't built for: non-deterministic behavior. Outputs that aren't always predictable, even when the system is working exactly as designed.
A recommendation engine that performs well in most scenarios but occasionally produces irrelevant or misleading results isn't broken in the traditional sense. But it may still be failing in ways that matter to the customer. Traditional QA would pass it; a well-designed validation practice might not.
The question validation has to answer is no longer "does it work?" It has to become: does it consistently produce the right outcomes in real-world conditions? That's a different scope, a different cadence, and in most organizations, a significant uplift in capability.
Teams that treat validation as a final checkpoint are going to struggle. It has to become continuous, scenario-based, and oriented around outcomes, not just functionality.
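As a concrete illustration, here is a minimal sketch of what scenario-based, outcome-oriented validation could look like for the recommendation example above. The recommend function, the scenarios, and the relevance threshold are hypothetical stand-ins rather than any specific tool; the point is that the check runs repeatedly against realistic scenarios and judges outcomes, not exact outputs.

```python
# A minimal sketch of scenario-based, outcome-oriented validation.
# recommend(), the scenarios, and the scoring are hypothetical stand-ins,
# not a specific tool or API.

from dataclasses import dataclass

@dataclass
class Scenario:
    name: str
    user_context: dict
    relevant_items: set  # items a domain expert would consider acceptable

def relevance_rate(recommended: list, relevant: set) -> float:
    """Share of recommended items a reviewer would call relevant."""
    if not recommended:
        return 0.0
    return sum(1 for item in recommended if item in relevant) / len(recommended)

def validate(recommend, scenarios: list[Scenario], runs: int = 20,
             min_relevance: float = 0.8) -> dict:
    """Run each scenario repeatedly and judge outcomes, not exact outputs.

    Because the system is non-deterministic, a single pass/fail assertion
    tells you little; what matters is whether the outcome threshold holds
    across repeated runs in realistic conditions.
    """
    results = {}
    for scenario in scenarios:
        scores = [
            relevance_rate(recommend(scenario.user_context), scenario.relevant_items)
            for _ in range(runs)
        ]
        avg = sum(scores) / len(scores)
        results[scenario.name] = {
            "avg_relevance": avg,
            "worst_run": min(scores),
            "passed": avg >= min_relevance,
        }
    return results
```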
The last piece is the one that tends to sting most in leadership conversations: as delivery accelerates, the ability to connect that delivery to business outcomes often doesn't improve. In some cases it gets worse.
More features shipped. More experiments running. More releases going out. But when someone asks whether any of it moved the needle, the honest answer is often: we're not sure.
This isn't a new problem. Organizations have been investing in analytics and reporting for years without solving it. But agentic delivery raises the stakes considerably. When you can ship faster, the cost of shipping the wrong things goes up proportionally.
The challenge is that this isn't just a data or tooling problem. It's a cultural shift in how decisions get made: decisions are expected to be evidence-based before they're made, not justified after they're shipped. Those are slower, harder changes, which is why most organizations still haven't closed this gap.
What I've seen work is treating measurement as a precondition, not a follow-up. Before a team prioritizes a feature, what's the measurable outcome it's supposed to drive? What's the baseline? How will we know if it worked? These aren't complicated questions, but they require discipline to ask consistently, especially when delivery can move fast enough to make retrospective justification feel easier than upfront clarity.
And it starts with intentional steps to reinforce the right habits. Consider small shifts like requiring “expected outcome, metric, baseline” before work is prioritized, adding a quick measurement check into backlog refinement, or redefining the “definition of ready” to mean the team can clearly state how the work will create measurable impact.
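As one illustration (the field names and checks here are hypothetical, not a prescribed template), that redefined definition of ready can be as simple as refusing to treat an item as ready until the expected outcome, metric, and baseline are stated:

```python
# Illustrative only: a tiny "definition of ready" check requiring an expected
# outcome, a metric, and a baseline before work is prioritized. Field names
# are hypothetical, not a prescribed template.

from dataclasses import dataclass
from typing import Optional

@dataclass
class BacklogItem:
    title: str
    expected_outcome: Optional[str] = None   # e.g. "reduce onboarding drop-off"
    metric: Optional[str] = None             # e.g. "onboarding completion rate"
    baseline: Optional[float] = None         # current value of that metric

def is_ready(item: BacklogItem) -> bool:
    """An item is ready only if the team can say how success will be measured."""
    return all([item.expected_outcome, item.metric, item.baseline is not None])

# Usage: an item without a baseline stays out of the sprint until someone does
# the measurement work up front rather than justifying the work afterward.
feature = BacklogItem(title="Streamline onboarding flow",
                      expected_outcome="Reduce drop-off at step 2",
                      metric="Onboarding completion rate")
assert not is_ready(feature)   # no baseline yet, so it isn't prioritized
```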
What I'm describing amounts to a fundamental change in where the work actually lives in the delivery lifecycle.
The middle—production, execution, the stuff delivery teams have historically spent most of their time on—is compressing. That part is getting faster and will keep getting faster.
What's expanding is the front and back. More rigorous work up front: alignment, decision ownership, architecture, outcome definition. More rigorous work at the back: validation, QA, outcome measurement. And throughout, governance that can actually move at the speed of the business rather than governing against it.
For delivery leaders, this is a meaningful shift in where to invest attention. The teams that optimize for execution speed while leaving alignment, governance, and validation in their current state are not going to see the returns they're expecting.
The organizations that will get the most out of this moment are the ones willing to do the harder work: getting genuinely clear on decisions before they build, redesigning how change gets approved, raising the bar on validation, and building the evidence-based culture that lets them actually know whether what they shipped was worth shipping.
You can upgrade the tech all you want. But if the operating model doesn’t change, you’ll simply move faster without moving forward.
This is part four in a five-part series on the confrontations agentic AI forces organizations to face. Read the rest here: Part 1, Part 2, Part 3, and Part 5.