
95% of companies get no ROI from AI. Here is what separates the 5%

The MIT study released in mid-2025 put a number on something most practitioners already suspected: 95% of generative AI pilots are failing to deliver measurable return on investment. McKinsey’s data lands in roughly the same place, with only around 6% of companies capturing EBIT improvements exceeding 5% from AI investments. Fifty-six percent of CEOs report no revenue growth from AI at all.

These numbers are not an indictment of the technology. They are an indictment of how organisations are deploying it.

The gap between the 5% and the 95% is not the model they chose, the vendor they signed, or the size of their AI budget. It is a set of upstream decisions that most engineering leaders are not treating as engineering decisions at all.


The statistic is more useful than it looks

When a number like 95% failure surfaces, the instinct is to either dismiss it (“our situation is different”) or panic (“we need a new strategy immediately”). Both responses miss the point.

The useful question is: what are the 5% doing that the 95% are not?

The McKinsey analysis is consistent on this. High-performing AI organisations share a few structural characteristics that have nothing to do with which foundation model they are running. They map work before they automate it. They treat use case selection as a primary engineering decision. They have clear ownership models between AI output and product value. And critically, they are not trying to make existing processes faster. They are rethinking whether those processes should exist in their current form at all.

That last point is where most organisations are losing.


Adding AI to a broken process makes it a faster broken process

There is a principle from traditional process engineering that is relevant here: you cannot automate your way out of a poorly designed process. Value stream mapping, lean manufacturing, and decades of operations research all point to the same thing. Map the process first. Identify where value is created and where waste accumulates. Standardise and make it repeatable. Then, and only then, apply tooling to augment or eliminate specific steps.

Most AI deployments skip the first three steps entirely.

The result is predictable. A service desk team gets an AI triage tool layered on top of a ticket routing process that was already generating unnecessary escalations. The AI makes the team faster at handling tickets that should not have existed in the first place. The 56% of CEOs seeing no revenue growth are, in many cases, presiding over exactly this pattern: AI-augmented inefficiency.

The organisations in the top 5% are not asking “how do we make this process more efficient with AI?” They are asking “given what AI can now do, what is the right process design from scratch?”

That is a different question. It requires a different starting point.


Use case selection is the first engineering decision

Most organisations treat use case selection as a product or strategy question. It gets decided in a leadership offsite, handed to an engineering team, and framed as a delivery problem. The engineering team’s job becomes building what was specified, not questioning whether the specification is right.

This is where the ROI gap starts.

Use case selection is fundamentally an engineering decision because the people who understand where AI will and will not work reliably are the people building with it. A use case that looks compelling in a slide deck can fail for reasons that are immediately obvious to a practitioner: the data is too unstructured, the edge cases are too numerous, the exceptions demand too much human judgment, or the feedback loop for improvement is too slow.

The organisations getting ROI are doing something specific here. They are identifying high-leverage areas by mapping their processes first, using frameworks like value stream mapping to find where work stalls, where handoffs create delays, and where decisions are being made on incomplete information. Then they are asking which of those bottlenecks are genuinely addressable with AI, and which are organisational or data problems that AI cannot fix.
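As a rough illustration of the mapping step, the sketch below scores a hypothetical ticket-handling process using flow efficiency, a common value stream metric: value-adding process time divided by total lead time. The step names, hours, and the `Step`, `flow_efficiency` and `worst_stalls` helpers are invented for illustration. A real map would come from observing the process, not from guessed numbers.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One step in a value stream map (all times in hours, hypothetical)."""
    name: str
    process_hours: float  # hands-on time in which work is actually done
    wait_hours: float     # queue or handoff time before the step starts
    handoff: bool         # True if the work changes teams at this step


def flow_efficiency(steps: list[Step]) -> float:
    """Share of total lead time spent on value-adding work."""
    process = sum(s.process_hours for s in steps)
    lead = process + sum(s.wait_hours for s in steps)
    return process / lead


def worst_stalls(steps: list[Step], top: int = 2) -> list[str]:
    """Names of the steps with the most waiting time, largest first."""
    ranked = sorted(steps, key=lambda s: s.wait_hours, reverse=True)
    return [s.name for s in ranked[:top]]


# Hypothetical ticket-handling process; the numbers are illustrative only.
ticket_flow = [
    Step("log ticket", 0.1, 0.0, False),
    Step("triage", 0.2, 4.0, True),
    Step("escalate to L2", 0.1, 16.0, True),
    Step("resolve", 1.0, 2.0, False),
    Step("confirm with user", 0.1, 8.0, True),
]

print(f"flow efficiency: {flow_efficiency(ticket_flow):.1%}")
print("biggest stalls:", worst_stalls(ticket_flow))
print("handoffs:", [s.name for s in ticket_flow if s.handoff])
```

On these made-up numbers, flow efficiency is under 5%, and the longest waits sit at the escalation and confirmation handoffs. Those are the places to examine first, and the honest question is whether AI addresses them or whether they are organisational problems it cannot fix.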

Two concrete examples illustrate the difference.

A service desk function where agents spend 60% of their time on password resets and access requests is a genuine AI use case. The tasks are high-volume and low-variance, and errors are recoverable. Automate the routine work, redirect the humans to complex cases, and the structural economics change. That is what the ServiceNow AI Pacesetter research documents in practice: deflection rates moving from 18% to 94% and mean time to resolution dropping by a day within two months. Those numbers come from use cases that were selected because the underlying process was already understood and the AI had a clear, bounded job to do.

Now take a sales ops function whose CRM data is unreliable and whose forecasting depends on relationship knowledge that lives in people’s heads. That is not an AI use case. It is a data governance and process design problem. Adding an AI forecasting layer on top does not fix the underlying issue. It makes the unreliability harder to see.

The discipline is being honest about which category a given use case falls into before committing engineering resources to it.


The mandate problem

The r/ExperiencedDevs thread on AI adoption surfaces something that most formal research does not capture well: top-down AI mandates without bottom-up use case validation are extremely common, and they are likely one of the primary drivers of the 95% failure rate.

The dynamic looks like this. A board or executive team decides the organisation needs an AI strategy. That mandate gets translated into a directive for engineering teams to ship AI features or integrate AI tooling by a given date. The teams comply. They ship something. The something does not move any metric that matters. The organisation concludes that AI is overhyped, or that they need a different vendor, or that their data is the problem.

None of those conclusions are correct. The actual problem is that the use case was never validated from the ground up.

The organisations getting ROI tend to have a different mandate structure. The directive from leadership is not “ship AI features.” It is “identify where AI can change the economics of how we work, and bring us a validated proposal.” That shifts the first question from “what can we build?” to “what problem are we actually solving, and is AI the right tool for it?”

This sounds obvious. It is not common.


Team structure differences

The Forbes analysis of the 25% seeing ROI identifies a consistent structural pattern in high-performing AI organisations. They have closed the gap between the people who understand the business problem, the people who understand the data, and the people who understand what AI can reliably do. In lower-performing organisations, those three groups are organisationally separated and communicate through handoffs.

The handoff model creates a specific failure mode. The business stakeholder describes the problem in business terms. It gets translated into a data or model requirement. The engineering team builds to that requirement. The output does not match what the business stakeholder actually needed, because the translation lost something. This is not a communication problem. It is a structural problem.

The fix is not better documentation or more meetings. It is putting people with different skills in direct contact with the problem from the start, and giving them shared ownership of the outcome, not just their piece of the delivery chain.

This is what the “founder mindset” framing in AI engineering gets at. A founder building an AI product does not hand off the use case definition to a product manager and wait for a spec. They stay in contact with the problem, the data, and the output throughout. The organisations replicating that pattern inside larger structures are the ones closing the ROI gap.


The buy-versus-build question

One decision that consistently separates high and low performers is how they approach the buy-versus-build question, and most organisations are getting it wrong in the same direction.

They are building when they should be buying.

The instinct to build custom AI solutions is understandable. Engineering teams want to work on interesting problems. Leaders want differentiated capability. And there is a real concern that buying off-the-shelf tooling means giving up competitive advantage.

In most enterprise contexts, that concern is misplaced. The competitive advantage in AI does not come from owning the model or the infrastructure. It comes from the quality of the use case selection, the quality of the data, and the speed at which the organisation can learn and iterate. A well-configured, well-integrated commercial platform will almost always outperform a custom-built solution maintained by a team without deep AI engineering expertise.

The buy-versus-build calculation also changes when you factor in total cost of ownership over a three-to-five-year horizon. Custom solutions require ongoing maintenance, model updates, security patching, and the organisational knowledge to keep them running. Commercial platforms absorb those costs and improve continuously as the underlying models improve.
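A minimal sketch of that calculation, assuming recurring costs land at the end of each year and are discounted back to today. Every figure below, including the 8% discount rate, is hypothetical and chosen only to show the shape of the comparison, not to suggest what real build or licence costs look like. The `total_cost_of_ownership` helper is likewise illustrative.

```python
def total_cost_of_ownership(upfront: float, annual: float, years: int,
                            discount_rate: float = 0.0) -> float:
    """Upfront cost plus recurring costs over `years`, discounted to today.

    Recurring costs are assumed to fall at the end of each year.
    """
    recurring = sum(annual / (1 + discount_rate) ** t for t in range(1, years + 1))
    return upfront + recurring


# Hypothetical figures in thousands; replace with real estimates.
# Build: initial development, then staff for maintenance, model updates and patching.
# Buy: integration work, then licence and configuration.
OPTIONS = {
    "build": {"upfront": 600.0, "annual": 350.0},
    "buy": {"upfront": 150.0, "annual": 300.0},
}
DISCOUNT_RATE = 0.08  # hypothetical cost of capital

for years in (3, 5):
    costs = {
        name: total_cost_of_ownership(o["upfront"], o["annual"], years, DISCOUNT_RATE)
        for name, o in OPTIONS.items()
    }
    print(f"{years} years: build {costs['build']:,.0f}k, buy {costs['buy']:,.0f}k")
```

With these invented inputs the gap widens every year, because the build option carries both a larger upfront cost and maintenance that never stops. The comparison only becomes useful once it uses an organisation's own estimates, including the staff time needed to keep a custom system patched and current.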

Partner selection is genuinely difficult right now. The market is crowded, vendor claims are hard to verify, and the pace of change means that a platform that was best-in-class 18 months ago may not be today. That difficulty is real and should not be dismissed. But it is an argument for investing in better partner evaluation, not for defaulting to build.

The organisations in the top 5% are predominantly buying core infrastructure and building at the edges, where they have genuine domain expertise and differentiated data. That is the right division of labour.


What to stop doing before adding more AI tooling

The Berkeley analysis of the MIT study makes a point that deserves more attention: ROI may be the wrong primary metric for early-stage AI investment, but that argument can also become a convenient excuse for not measuring anything at all.

The organisations failing to get ROI are not, in most cases, failing because they are measuring the wrong thing. They are failing because they are not doing the upstream work that would make any measurement meaningful.

Before adding more AI tooling, most engineering organisations would benefit from stopping a few things:

  • Stop layering AI on processes that have not been mapped. If you cannot draw the current process clearly enough to identify where decisions are made and where handoffs occur, you do not have enough information to know where AI will help.
  • Stop treating AI as a productivity multiplier for individual contributors without examining whether the work itself should change. Making a person 30% faster at a task that generates no downstream value is not a win.
  • Stop letting use case selection happen above the engineering team. The people closest to the data and the model behaviour need to be part of the decision about what to build, not just how to build it.
  • Stop measuring AI success by deployment. Shipping a feature is not evidence of value. The measurement question is what changed in the business after the feature shipped.
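That last point needs very little machinery: compare a business metric against its pre-launch baseline instead of counting deployments. This sketch assumes weekly mean time to resolution is the metric that matters; the figures and the `relative_change` helper are hypothetical.

```python
from statistics import mean


def relative_change(baseline: list[float], after: list[float]) -> float:
    """Relative change in the mean of a business metric after a feature ships."""
    before = mean(baseline)
    return (mean(after) - before) / before


# Hypothetical weekly mean-time-to-resolution figures, in hours.
mttr_before = [30.0, 28.0, 32.0, 30.0]
mttr_after = [24.0, 22.0, 23.0, 21.0]

print(f"MTTR change: {relative_change(mttr_before, mttr_after):+.1%}")
```

A naive before-and-after comparison like this does not control for seasonality, staffing changes, or other launches in the same window, so treat it as the minimum bar rather than proof that the feature caused the change.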

The 5% are not smarter or better resourced than the 95%. They are more disciplined about the upstream decisions that determine whether the downstream investment has any chance of working.


What to do in the next quarter

If you are an engineering leader trying to close this gap, the practical starting point is not a new AI tool or a new team structure. It is a process audit.

Pick one function where AI investment is either planned or already underway. Map the current process in enough detail to identify where value is created, where it is destroyed, and where decisions are being made on incomplete information. Use that map to ask honestly: is the AI being applied to a genuine bottleneck, or to a symptom of a deeper process problem?

If it is a genuine bottleneck, the next question is whether the use case is bounded enough for AI to be reliable. High-volume, low-variance tasks with recoverable errors and clear feedback loops are where AI earns its ROI. Low-volume, high-variance tasks requiring judgment calls that depend on context AI cannot access are where it does not.

If the use case passes that test, the final question is whether you are buying or building, and whether that decision was made on the merits or by default.
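The three questions above can be read as an ordered checklist. A minimal sketch, assuming the answers come out of the process audit as simple yes/no judgments. The `UseCase` fields and the `screen` function are hypothetical, and the two examples mirror the service desk and sales forecasting cases described earlier.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    """Hypothetical record of what a process audit found about one candidate."""
    name: str
    genuine_bottleneck: bool          # False if it is a symptom of a deeper process or data problem
    high_volume: bool
    low_variance: bool
    errors_recoverable: bool
    clear_feedback_loop: bool
    needs_inaccessible_context: bool  # judgment depends on context the AI cannot access
    sourcing_decided_on_merits: bool  # buy vs build was argued, not defaulted


def screen(uc: UseCase) -> str:
    """Apply the three questions in order and return the first blocker found."""
    if not uc.genuine_bottleneck:
        return "fix the process or data first"
    bounded = (uc.high_volume and uc.low_variance and uc.errors_recoverable
               and uc.clear_feedback_loop and not uc.needs_inaccessible_context)
    if not bounded:
        return "not bounded enough for reliable AI"
    if not uc.sourcing_decided_on_merits:
        return "revisit the buy-versus-build decision"
    return "proceed"


password_resets = UseCase(
    name="password resets", genuine_bottleneck=True, high_volume=True,
    low_variance=True, errors_recoverable=True, clear_feedback_loop=True,
    needs_inaccessible_context=False, sourcing_decided_on_merits=True,
)
sales_forecast = UseCase(
    name="sales forecasting", genuine_bottleneck=False, high_volume=False,
    low_variance=False, errors_recoverable=False, clear_feedback_loop=False,
    needs_inaccessible_context=True, sourcing_decided_on_merits=True,
)

for uc in (password_resets, sales_forecast):
    print(f"{uc.name}: {screen(uc)}")
```

The ordering is the point of the design: there is little value in debating buy versus build for a use case that is really a data governance problem.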

The 95% failure rate is not a technology problem. It is a decision-making problem. The technology is capable enough. The upstream decisions are where the work is.