We’ve watched this play out dozens of times. A client signs a six-figure engagement with an AI vendor, six months later they’re staring at dashboards full of accuracy metrics and model performance numbers, and when they ask the obvious question, “So what did this do for our business?”, they get deflection. The vendor pivots to technical performance. They cite precision improvements of 12%. They show latency reductions of 40%. They do everything except answer the question, because answering it would expose an uncomfortable truth that keeps the consulting industry well-funded: most AI projects generate negative ROI, and the vendors know it.
The misaligned incentives in AI consulting aren’t a bug in the system; they’re the entire system. Understanding this dynamic is essential before you sign another statement of work.
The Metrics Shell Game
The fundamental problem is that AI vendors measure what they can control, not what you need. A model can be technically excellent and commercially useless simultaneously, and the industry has perfected the art of conflating the two.
Consider a client that came to us last year, a mid-market logistics company that spent €2.1 million on a demand forecasting system. The vendor delivered a model with 94% accuracy, which sounds remarkable until you understand that the baseline statistical forecast they replaced was already at 89%. The five-percentage-point improvement, when translated into actual inventory decisions, generated approximately €340,000 in annual savings. The project had a payback period of over six years, and that assumes no maintenance costs. But on paper, the vendor could point to a 94% accuracy number and call it a success.
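The payback figure is simple division, but it's worth making explicit, because vendors rarely present it themselves. A minimal sketch using the figures from the engagement above:

```python
# Payback arithmetic for the forecasting engagement described above.
# Figures are from the text; maintenance costs are assumed to be zero,
# which flatters the vendor.
project_cost = 2_100_000      # EUR, one-time engagement cost
annual_savings = 340_000      # EUR per year, incremental inventory savings

payback_years = project_cost / annual_savings
print(f"payback period: {payback_years:.1f} years")  # roughly 6.2 years
```

Any recurring maintenance or retraining cost lengthens this further, since it eats into the €340,000 of annual savings before a single euro of the original investment is recovered.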
This is the metrics shell game. Accuracy, precision, recall, F1 scores, these are legitimate technical metrics, but they have a loose and often misleading relationship with business outcomes. A fraud detection model with 99.9% recall sounds exceptional until you realize it rejects legitimate transactions at a rate that destroys customer relationships. A customer churn model that identifies 85% of at-risk accounts is useless if the retention interventions you can actually execute only reach 20% of them. The vendor wins either way because they delivered what they promised on the metric they selected.
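To see how a strong headline detection metric can coexist with customer damage, here is a toy calculation with entirely invented volumes: a model that catches nearly all fraud while also flagging 2% of legitimate transactions blocks tens of thousands of real customers every day.

```python
# Hypothetical fraud-screening volumes, invented for illustration only.
daily_txns = 1_000_000
fraud_rate = 0.001            # 1,000 fraudulent transactions per day
recall = 0.999                # the model catches 999 of them
false_positive_rate = 0.02    # but also flags 2% of legitimate transactions

fraud = daily_txns * fraud_rate
legit = daily_txns - fraud
true_positives = recall * fraud
false_positives = false_positive_rate * legit

precision = true_positives / (true_positives + false_positives)
print(f"blocked legitimate transactions/day: {false_positives:,.0f}")
print(f"precision: {precision:.1%}")
```

Under these assumptions the model blocks nearly 20,000 legitimate customers a day, and its precision is below 5%, a number that would never appear on the vendor's quarterly slide.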
The more sophisticated vendors have evolved beyond simple accuracy claims. They now speak in terms of “efficiency gains” and “productivity improvements” that are nearly impossible to verify. One engagement we analyzed showed a vendor claiming “60% reduction in processing time,” which was technically true, the model processed records faster, but the bottleneck in the actual business process was human review downstream, which remained unchanged. The processing time reduction created zero business value while generating a compelling slide for the quarterly review.
How Vendors Structure Engagements to Avoid Accountability
The structure of AI consulting engagements is deliberately designed to separate vendor responsibility from client outcomes. Understanding these structural patterns is the first defense against expensive disappointments.
Scope definitions in AI projects typically focus on model delivery rather than business integration. The statement of work will specify that the vendor will “deliver a trained model meeting specified performance criteria” or “implement a pipeline with defined latency characteristics.” These are engineering specifications, not business commitments. When the model fails to move the needle on actual business metrics, the vendor fulfilled their contract. Your internal team is left holding the bag for an implementation that never had a chance.
We see this pattern repeatedly in projects structured around “pilot” or “proof of concept” phases. The pilot success criteria are almost always defined in technical terms: achieving a certain accuracy threshold, processing a certain volume of records, meeting specific latency requirements. There’s rarely a pilot success criterion that reads “generating measurable revenue improvement” or “reducing operational costs by a defined amount.” The pilot is designed to succeed technically while remaining disconnected from commercial reality.
The payment structure compounds this problem. Many AI engagements are front-loaded with large implementation fees and smaller ongoing maintenance costs. This creates a dynamic where the vendor’s revenue is largely secured before any business value is demonstrated. We’ve seen contracts where 80% of the total engagement value was due upon “model delivery,” with the remaining 20% tied to trivial acceptance criteria. The vendor has every incentive to declare the project complete as quickly as possible and move to the next engagement.
Perhaps most insidiously, vendors structure change orders and scope modifications in ways that trap clients in escalating commitments. Once a client has invested heavily in a vendor’s ecosystem, their data models, their integration architecture, their specific implementation choices, switching costs become astronomical. The vendor knows this and prices subsequent phases accordingly. A client we worked with saw their initial €400,000 project expand to €2.3 million over two years, with the original vendor, before realizing they were building something that would never generate positive returns.
What Real AI ROI Measurement Actually Looks Like
Genuine ROI measurement for AI projects requires thinking in terms that most vendors and many clients find uncomfortable. It starts with establishing baseline metrics before any work begins and committing to measuring the same metrics after deployment.
The baseline establishment is where most projects fail before they start. Clients rarely have clear visibility into their current state. A claims processing operation won’t know their average handling time until someone spends weeks extracting and cleaning operational data. A customer service function won’t have accurate deflection rates for their self-service channels. The absence of baseline data isn’t just an analytical gap; it’s a negotiating vulnerability that vendors exploit by proposing success criteria in a vacuum.
Real ROI measurement also requires attribution modeling that accounts for confounding factors. The business doesn’t exist in a laboratory. Marketing campaigns launch simultaneously with AI deployments. Seasonal variations create noise. Regulatory changes alter customer behavior. Without a rigorous approach to attribution, you’ll never know whether the AI caused the improvement or simply correlated with it. We’ve used difference-in-differences analysis, controlled pilots, and synthetic control methods depending on the client’s data infrastructure, but the key principle is universal: you need a credible counterfactual story.
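The difference-in-differences idea can be shown in a few lines. This is a deliberately minimal sketch with invented numbers, assuming you can observe a comparable unit (a region, a product line, a branch) that did not receive the AI deployment:

```python
# Minimal difference-in-differences sketch. The treated region got the
# AI deployment; the control region did not. All figures are invented.
treated_before, treated_after = 100.0, 118.0   # e.g. orders fulfilled/day
control_before, control_after = 100.0, 110.0

# A naive before/after comparison attributes the full change to the AI.
naive_effect = treated_after - treated_before

# DiD subtracts the trend the control group experienced anyway
# (seasonality, marketing campaigns, regulatory shifts).
did_effect = (treated_after - treated_before) - (control_after - control_before)
print(f"naive effect: {naive_effect}, DiD effect: {did_effect}")
```

Here the naive estimate credits the AI with an 18-unit lift, while the control group shows that 10 of those units would have happened anyway; the defensible claim is 8. Real deployments need more care (parallel-trends checks, multiple periods, standard errors), but even this crude version beats the before/after numbers most vendors present.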
The time horizon matters enormously. Many AI investments show negative returns in year one and positive returns only in years two through four as the model learns, as processes adapt, and as complementary investments pay off. This is normal for transformative technology, but it creates a problem when vendors are evaluated on quarterly cycles and clients are under pressure to justify budgets annually. We helped one client structure a four-year ROI analysis that showed strong returns, which allowed them to sustain an investment that would have been killed under standard annual review processes.
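A multi-year view of the same investment can be sketched as a discounted cash flow. The cash flows below are illustrative assumptions, not figures from any engagement described here; the shape is what matters, heavy year-one cost with benefits ramping as the model and surrounding processes mature:

```python
# Illustrative four-year cash flows for an AI investment (EUR).
# Year one is dominated by implementation cost; benefits ramp later.
cash_flows = [-1_500_000, 200_000, 900_000, 1_200_000]
discount_rate = 0.10   # assumed cost of capital

npv = sum(cf / (1 + discount_rate) ** t
          for t, cf in enumerate(cash_flows, start=1))
cumulative = [sum(cash_flows[:i + 1]) for i in range(len(cash_flows))]

print(f"NPV over four years: {npv:,.0f} EUR")
print(f"cumulative (undiscounted) by year: {cumulative}")
```

Under these assumptions the project is deeply underwater for two years and only turns cumulatively positive in year four, yet its four-year NPV is positive. Judged on an annual cycle it gets killed; judged on the full horizon it clears the hurdle rate.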
The Questions to Ask Before Signing
The vendors who structure engagements to avoid accountability are counting on clients not asking the hard questions. Here are the ones that matter.
Ask about payment structure tied to outcomes, not delivery. Push back on front-loaded fee schedules. Propose success payments that trigger when business metrics move, not when technical artifacts are handed over. The vendor’s reaction to this question tells you everything; if they push back hard, you understand where their incentives lie.
Ask for the vendor’s track record on projects similar to yours, with specific business outcomes disclosed. Not references; references are marketing exercises. Actual outcomes, with names redacted if necessary, but specific enough to evaluate. A vendor who claims to have done this work before should be able to describe what worked, what failed, and what they learned. Vague assurances should concern you.
Ask who on the vendor team will be dedicated to integration and change management, not just model development. The technical model is often the easiest part. The hard work is getting it adopted, monitored, and improved over time. Vendors who treat “deployment” as the end of their responsibility are implicitly telling you that they view the project as complete when the hard work begins.
Ask to see the data flows and integration points in your specific environment, not in a hypothetical reference architecture. AI projects fail at integration points more than anywhere else. The vendor who can’t articulate exactly how data will flow from your operational systems into their model, and from the model back into business processes, is either inexperienced or deliberately obscuring the complexity.
Case Patterns of Vendor Relationships Gone Wrong
We’ve catalogued enough failures to identify recurring patterns that signal trouble.
The “black box” pattern appears when vendors resist explaining how their models work in terms that your team can understand. Legitimate vendors will always be able to articulate model behavior in business terms, why certain predictions are made, which features drive outcomes, what scenarios create edge cases. Vendors who hide behind “proprietary methodology” are often hiding ineffective methodology.
The “scope creep by denial” pattern appears when vendors agree to initial scopes that are obviously incomplete, knowing that the missing elements will emerge later at premium prices. A vendor who doesn’t push back on an unrealistically narrow scope is either inexperienced or planning to monetize the expansion. The healthy dynamic is a vendor who tells you what you’re missing.
The “we’re the only ones who can support this” pattern appears after implementation, when vendors leverage lock-in to extract excessive maintenance fees. AI systems do require ongoing support, but legitimate vendors will be transparent about what that support costs and what happens if you choose to move to alternative providers. Vendors who imply that the system will stop working without their continued involvement are signalling that they’re prioritizing lock-in over partnership.
The solution isn’t to avoid vendors entirely; many have genuine expertise that would take years to develop internally. The solution is to structure engagements that align incentives properly, measure what actually matters, and preserve optionality for the future. The vendors who resist these structures are telling you exactly what you need to know about working with them.