2026-03-30

Why AI Agent Pilots Fail in Retail, and How to Fix It


By Jason Cottrell, CEO, Orium and President, MACH Alliance


Over the past year, I've had some version of the same conversation with nearly every retailer I've met. They're experimenting with AI agents (in customer service, operations, marketing, somewhere) and when I ask how it's going, the answer is almost always the same: lots of activity, not much proven value. The gap between experimentation and value is growing, and where I see most businesses trip up is in believing the problem lies with their AI model.

The hard part isn't the model

Ethan Mollick has a line I come back to often: even if AI stopped improving today, it would still take us five or more years to fully absorb and apply what we already have. The challenge in front of us isn't waiting for better AI; it's learning how to actually use what's here.

Which means the problem most retailers are running into isn't the model. It's everything around it.

When teams struggle with agent pilots, it's usually because they approached deployment like rolling out a new software tool: a better chatbot, a smarter search. But agents aren't tools. They're business systems. That distinction changes everything about how you plan, build, and govern them.

In practice, the barriers to successful deployment fall into four categories: general-purpose agents trying to do everything at once; data that agents can't access in real time because it's siloed between departments or locked in documents; systems that agents can't act through because they have little or no API exposure; and unclear ownership and governance, with no one accountable for whether the agent is actually performing well.

The barrier isn't the AI. It's the integration, architecture, data, and especially the organizational structures around it.

Specialist agents beat generalist agents

One of the biggest things that trips teams up: they try to build one agent that does everything. A catch-all assistant sitting on top of their stack, fielding whatever comes its way.

Often-cited McKinsey estimates state that more than 70% of AI's total value potential will come from vertical, domain-specific applications — not generic models. In my experience, that's exactly right. What most people imagine is one big agent, but what's actually coming is many specialized agents working together, the same way your teams already do.

The agents that deliver real business value are narrowly scoped and embedded in the systems where work actually happens. An inventory agent in your ERP that monitors sell-through in real time, flags replenishment risks before stockouts occur, and recommends reorder quantities based on demand signals isn't just answering a question; it's taking action inside your business. The same is true of a campaign agent in your CDP that identifies high-value segments, generates and tests variations, and optimizes timing based on performance data, or a fraud agent in your payment processor that detects anomalous patterns in real time and continuously improves its own detection using historical signals.

These work because they're specific. They have access to the right data, they're connected to the right systems, and someone owns the outcome.

And once you move beyond a single agent, something important shifts. The question stops being "what can this agent do?" and becomes "who owns the outcomes when many agents interact?" That's when infrastructure starts to matter a lot.

Your customers will have agents too

Agents won't just run parts of your business. They'll also represent your customers interacting with those systems.

Customer AI assistants will increasingly interact with brands on a customer's behalf: personal shopping assistants researching and comparing products, agents managing subscriptions and routine reorders, customer agents negotiating pricing and availability, and eventually autonomous agents completing transactions across multiple retailers without any humans in the loop at all.

We're entering a paradigm where human-plus-agent interactions become the norm, both behind the scenes and in customer-facing experiences. Even if all you're doing today is isolated, one-off experiments to test value and prove ROI, you should expect that in short order many agents will be interacting on both sides of the transaction.

That's a reason to build the right foundation now, not later.

You're hiring agents, not implementing them

The most useful mental model I've found for deploying agents is to treat them like new employees.

If you hired someone tomorrow, you'd give them three things: visibility into the business, the tools to do the job, and someone accountable for their outcomes. You wouldn't hire a person and expect them to succeed without access to your systems, without software and permissions, without a manager or a clear set of KPIs.

Agents need exactly the same things.

Visibility — can the agent access the data it needs, in real time, in a structured way? If your data is batched, siloed, or locked in PDFs, the agent is operating blind.
Tools — do your systems expose APIs or workflows that software can actually trigger? If not, the agent can reason all it wants but can't actually do anything.
Accountability — does someone own the outcome? Is there governance in place, guardrails, a way to measure whether the agent is performing well or drifting off course?

If the answer to any of those is no, the problem isn't the AI. The problem is the system you're asking it to operate inside. If you wouldn't hire an employee without giving them these things, don't deploy an agent without them either.
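As a rough illustration, the three questions above can be turned into a simple pre-deployment check. This is a minimal sketch, not a prescribed tool: the `AgentReadiness` class, its field names, and the example agent are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AgentReadiness:
    """Answers to the three 'new hire' questions for one proposed agent (hypothetical)."""
    name: str
    has_realtime_structured_data: bool  # visibility
    has_triggerable_apis: bool          # tools
    has_named_owner_and_metrics: bool   # accountability


def readiness_gaps(agent: AgentReadiness) -> list[str]:
    """Return the missing prerequisites; an empty list means ready to pilot."""
    gaps = []
    if not agent.has_realtime_structured_data:
        gaps.append("visibility")
    if not agent.has_triggerable_apis:
        gaps.append("tools")
    if not agent.has_named_owner_and_metrics:
        gaps.append("accountability")
    return gaps


# Example: data and ownership are in place, but no system exposes an API yet.
candidate = AgentReadiness(
    name="returns-triage",
    has_realtime_structured_data=True,
    has_triggerable_apis=False,
    has_named_owner_and_metrics=True,
)
print(readiness_gaps(candidate))  # ['tools']
```

Any non-empty result points at the surrounding system, not the model, as the thing to fix first.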

Making your stack agent-ready

When teams ask where to even start, I break it down into five buckets, the primary slices of your business that define how an agent can operate.

Core systems of record (ERP, finance, HR/payroll) need to become AI-first by design, so agents can operate directly inside them with clear integration standards and deterministic guardrails.
Commerce and operations (POS, ecom, OMS, supply chain) need to open up through APIs and event-driven architecture, with explicit guardrails so agents can act safely.
Customer and marketing systems (CRM, loyalty, CDP) become the real-time interaction layer, where automation shows up first in customer-facing ways, driven by streaming data and agent orchestration.
Content and knowledge (policies, training, product catalog, planograms) is mostly locked in PDFs and intranets today. For agents to use it, it needs to become structured, machine-readable, and modular.
Data and analytics (warehouse, marts, BI/reporting) is often batch-driven today. For agents, it needs to shift toward real-time pipelines with semantic layers that enable live decision-making.

These five buckets are your roadmap for making your architecture agent-ready, and they're exactly where composable, API-first systems have a structural advantage over monolithic ones.
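To make "deterministic guardrails" a little more concrete, here is one possible shape for a check that runs before an agent's proposed action reaches an operational API. It is a sketch under assumptions: the `ReorderAction` type, the limit values, and the rule wording are invented for illustration and would come from the business owner in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReorderAction:
    """An action an inventory agent proposes (hypothetical shape)."""
    sku: str
    quantity: int
    unit_cost: float


# Hypothetical limits; real values would be set by whoever owns the outcome.
MAX_UNITS_PER_ORDER = 500
MAX_SPEND_PER_ORDER = 10_000.00


def check_guardrails(action: ReorderAction) -> list[str]:
    """Return rule violations for a proposed action; empty means allowed."""
    violations = []
    if action.quantity <= 0:
        violations.append("quantity must be positive")
    if action.quantity > MAX_UNITS_PER_ORDER:
        violations.append("quantity exceeds per-order unit limit")
    if action.quantity * action.unit_cost > MAX_SPEND_PER_ORDER:
        violations.append("order value exceeds per-order spend limit")
    return violations


proposed = ReorderAction(sku="SKU-123", quantity=800, unit_cost=4.50)
print(check_guardrails(proposed))  # ['quantity exceeds per-order unit limit']
```

Because the rules are plain code rather than model output, the same proposed action always gets the same answer, which is what makes the guardrail auditable.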

How to run a first pilot that actually teaches you something

The goal of a first agent pilot shouldn't be perfection; it should be learning. Specifically, it should be learning about your data readiness, your API readiness, and your governance readiness.

That means starting small and being deliberate about it. Three things matter most. First, choose a narrow use case: something high frequency and low-to-medium risk, with clear inputs and a definition of done you can measure within 30 to 60 days. Avoid starting with a customer-facing use case, or with one that requires perfect data across a dozen systems.

Second, build a human-plus-agent workflow first. Have agents assist before they act independently. Start with "recommend → approve → act" before the agent operates on its own.

And finally, measure one primary metric and one risk metric: for example, deflection rate plus escalation accuracy, or time saved plus error rate. Two numbers. That's enough for a first pilot.
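The "recommend → approve → act" sequence from the second point can be sketched as a small state machine in which nothing executes without a human decision. This is a minimal illustration; the `Recommendation` class, the state names, and the approver name are assumptions, not part of any particular product.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Recommendation:
    """An agent's proposal, tracked through recommend -> approve -> act."""
    description: str
    status: str = "recommended"
    log: list[str] = field(default_factory=list)


def review(rec: Recommendation, approver: str, approve: bool) -> None:
    """A human approves or rejects; nothing is executed at this step."""
    if rec.status != "recommended":
        raise ValueError(f"cannot review a recommendation in state {rec.status!r}")
    rec.status = "approved" if approve else "rejected"
    rec.log.append(f"{rec.status} by {approver}")


def act(rec: Recommendation, execute: Callable[[str], None]) -> None:
    """Run the action only if a human approved it first."""
    if rec.status != "approved":
        raise PermissionError("action requires human approval first")
    execute(rec.description)
    rec.status = "acted"
    rec.log.append("acted")


executed: list[str] = []  # stands in for a real system call
rec = Recommendation("Reorder 120 units of SKU-123")
review(rec, approver="inventory.manager", approve=True)
act(rec, execute=executed.append)
print(rec.status, rec.log)  # acted ['approved by inventory.manager', 'acted']
```

The log doubles as an audit trail, which is useful later when deciding which action types are safe to let the agent take on its own.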

Every pilot, even a modest one, teaches you something you'll need when you start scaling to many agents working together. Choose one that gives you useful signal quickly.
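For the two-number measurement described above, here is a small sketch using deflection rate and escalation accuracy. The definitions are assumptions made for this example: deflection rate is the share of tickets the agent closed without a handoff, and escalation accuracy is the share of handoffs that a human reviewer judged necessary. The `Ticket` type and the sample data are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ticket:
    escalated: bool
    # For escalated tickets only: did a human reviewer agree it was needed?
    escalation_was_needed: Optional[bool] = None


def deflection_rate(tickets: list[Ticket]) -> float:
    """Primary metric: share of tickets resolved without a human handoff."""
    if not tickets:
        return 0.0
    return sum(not t.escalated for t in tickets) / len(tickets)


def escalation_accuracy(tickets: list[Ticket]) -> float:
    """Risk metric: share of escalations that were judged necessary."""
    escalated = [t for t in tickets if t.escalated]
    if not escalated:
        return 0.0
    return sum(t.escalation_was_needed is True for t in escalated) / len(escalated)


# Invented sample: 6 tickets deflected, 4 escalated, 3 of those justified.
sample = (
    [Ticket(escalated=False) for _ in range(6)]
    + [Ticket(escalated=True, escalation_was_needed=True) for _ in range(3)]
    + [Ticket(escalated=True, escalation_was_needed=False)]
)
print(f"deflection rate: {deflection_rate(sample):.0%}")          # 60%
print(f"escalation accuracy: {escalation_accuracy(sample):.0%}")  # 75%
```

Reading the two numbers together matters: a high deflection rate with low escalation accuracy suggests the agent is handing off the wrong tickets.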

The foundation is the advantage

Agents are becoming your digital workforce, so treat them like employees. Give them visibility, tools, and accountability. The biggest value comes from specialist agents embedded in the systems where real work happens. And the companies that win in this era won't necessarily have the most sophisticated models. They'll be the ones that learned the fastest and built the right foundations early.

The biggest competitive advantage will come from building the foundation for a team of agents.
