
#6 Beyond Pilot Purgatory


There’s a peculiar ritual playing out in boardrooms across the globe. It involves a slick PowerPoint, a live demo of a generative AI tool flawlessly summarising a dense report, and a round of impressed, if slightly nervous, applause. The pilot is a success. The team is congratulated. And then… nothing. The brilliant prototype, the darling of the innovation lab, never sees the light of a production environment. It remains a clever party trick, perpetually stuck in pilot purgatory.

This isn’t a rare occurrence; it’s fast becoming the default. We are living through the great paradox of enterprise AI: a moment of peak investment and peak hype, coinciding with an astonishing, accelerating rate of project failure. The rush to do something with AI, driven by a palpable fear of being left behind, is ironically the very thing causing so many initiatives to stumble and fall. It seems in our haste to build the future, we’ve forgotten how to build things that last.

The Briefing

The past weeks have seen significant, and often conflicting, developments in artificial intelligence, spanning top-level government strategy, real-world operational stumbles, and major product releases from technology giants.

On July 23, the White House unveiled its AI Action Plan, a 90-point framework designed to accelerate the country’s lead in AI. The strategy is built on three pillars: accelerating innovation, building out infrastructure, and leading in international diplomacy. Key actions include a sweeping review to repeal federal regulations that hinder AI development, streamlining environmental permits for the construction of data centres, and using federal funding as leverage to discourage states from passing their own “burdensome” AI laws.

The plan was accompanied by three executive orders. One, titled “Preventing Woke AI in the Federal Government”, mandates that federal agencies may only procure Large Language Models (LLMs) that are “truth-seeking” and “ideologically neutral”. It explicitly defines Diversity, Equity, and Inclusion (DEI) as a “destructive” ideology and directs the National Institute of Standards and Technology (NIST) to remove references to DEI, misinformation, and climate change from its AI Risk Management Framework. I have read enough history to feel a strange sense of déjà vu; it reminds me of Russia or North Korea.

In contrast to this top-down strategic push, a report from the Polish business daily Puls Biznesu highlighted the operational risks of uncontrolled use of immature AI systems. According to the report, an expert at a Polish government agency used an AI system to automate the screening of subsidy applications. The system began to “hallucinate”—confidently generating false information—and invented fictitious reasons to deny legitimate applications, blocking funds and causing significant administrative disruption. The incident serves as a practical example of the technology’s current limitations, echoing the severe consequences of the Dutch childcare benefits scandal, where a flawed algorithm wrongly accused thousands of families of fraud.

The technology sector, meanwhile, demonstrated accelerating enterprise adoption. Data analytics firm Palantir Technologies has seen its stock price more than double in 2025, with its market capitalisation briefly reaching $375 billion on July 25. This growth is largely attributed to the rapid adoption of its Artificial Intelligence Platform (AIP). In its first-quarter results for 2025, Palantir reported that its U.S. commercial revenue grew 71% year-over-year, surpassing a $1 billion annual run rate, while its customer count in the segment grew by 69%. The company is scheduled to release its second-quarter earnings on August 4.

Other companies are also moving from experimentation to production. During its quarterly earnings call, Netflix revealed it used generative AI for the first time to create on-screen visual effects for its Argentine sci-fi series El Eternauta. A scene featuring a building collapse was reportedly completed ten times faster and at a fraction of the cost of traditional methods. On July 23, Ally Financial announced the enterprise-wide rollout of its proprietary AI platform, Ally.ai, giving its 10,000 employees access to generative AI tools to streamline daily tasks.

Deeper Dive: Beyond Pilot Purgatory

The Anatomy of a Silent Crash

When an aeroplane crashes, the investigation rarely uncovers a single, catastrophic cause. Instead, it reveals a chain of small, interconnected failures—a faulty sensor, a misunderstood warning, a deviation from procedure—that cascade into disaster. The failure of an enterprise AI project is no different. The post-mortem that blames “poor data quality” is as simplistic as blaming a plane crash on “gravity.” It mistakes the final, obvious symptom for the complex underlying disease.

The failure chain begins with a flawed origin story. An initiative born from a vague mandate like “Let’s use GenAI to improve customer service” is doomed from the start. This isn’t a strategy; it’s a solution looking for a problem. This tech-first approach leads to what I call ‘model fetishism’—teams obsessing over accuracy scores in a sterile lab, completely detached from the messy reality of the business process they’re meant to improve.

Compounding this is the inherent weakness of the technology itself. Today’s Large Language Models, for all their fluency, possess a Potemkin understanding of the world. They are masters of statistical mimicry, not genuine comprehension. They have no underlying world model, no real grasp of cause and effect. This makes them brilliant assistants for certain tasks, but terrifyingly unreliable architects of critical processes. Believing a demo in a controlled sandbox is proof of enterprise-readiness is a profound category error. The real challenge isn’t making the model work once; it’s ensuring it doesn’t fail in a thousand unpredictable ways when exposed to the chaos of millions of real-world requests.
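To make that last point concrete, here is a minimal sketch of the kind of automated checks a production pipeline might run on every model response before it touches a real workflow. The rules, thresholds, and the `check_summary` function are illustrative assumptions, not a complete evaluation framework.

```python
import re

# Minimal sketch of pre-production output checks for an LLM summariser.
# The rules and thresholds below are illustrative assumptions, not a
# complete evaluation framework.

def check_summary(source_text: str, summary: str) -> list[str]:
    """Return a list of failed checks for one model response."""
    failures = []
    if not summary.strip():
        failures.append("empty response")
    if len(summary.split()) > 200:
        failures.append("summary exceeds length budget")
    # Every number quoted in the summary should appear in the source text:
    # a cheap proxy for catching invented figures.
    for number in re.findall(r"\d+(?:[.,]\d+)*", summary):
        if number not in source_text:
            failures.append(f"figure '{number}' not found in source")
    return failures

if __name__ == "__main__":
    issues = check_summary("Revenue grew 71% to $884m.",
                           "Revenue grew 78% to $884m.")
    print(issues)  # -> ["figure '78' not found in source"]
```

Even trivial checks like these shift the question from “did the demo work?” to “what happens on the request where it doesn’t?”—which is the question that actually decides enterprise-readiness.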

The Ghost in the Machine is Change Management

If you want to see the future of AI chaos, look to the history of IT. Remember the 2000s? Every department had its own budget to buy its own technology, resulting in a fragmented, siloed, and breathtakingly wasteful landscape of incompatible systems. We are repeating the exact same mistake with AI. Disconnected teams are spinning up duplicate vector databases and orphaned GPU clusters in a frenzy of uncoordinated activity, creating a governance nightmare that makes enterprise-wide scaling impossible.

The root of this is a failure to recognise that implementing AI is not a technology project; it is a change management project. You are not simply installing a new tool. You are redesigning an end-to-end business workflow, and that workflow is operated by humans who have habits, incentives, and a healthy scepticism of new things. A summarisation tool with 95% accuracy is worthless if supervisors, fearing the risk of a single error, instruct their teams to keep writing manual notes anyway. This isn’t a technology failure; it’s a failure of trust and adoption.

This brings us to the perennial scapegoat: data. The problem isn’t a lack of data. It’s a lack of AI-ready data. Leaders treat “data cleansing” as a one-off project, a spring clean before the guests arrive. But AI-ready data isn’t a static state of perfection; it’s a dynamic capability. The messy, outlier-filled data that traditional BI systems are designed to scrub is often the very data that contains the most valuable signals for an AI model. Building the capability to manage, govern, and qualify data for specific use cases is the unglamorous, non-negotiable foundation for success. AI cannot fix your data problems; it just finds them faster.
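As a sketch of what “qualify, don’t scrub” can look like in practice: records are labelled for a specific use case rather than cleansed once and for all. The field names, thresholds, and the churn example here are hypothetical, purely to illustrate the pattern.

```python
from dataclasses import dataclass, field

# Sketch of use-case-specific data qualification: keep the record,
# label its quirks, and let the consuming model decide what to do.
# All field names and thresholds are illustrative assumptions.

@dataclass
class QualifiedRecord:
    payload: dict
    flags: list = field(default_factory=list)

def qualify_for_churn_model(record: dict) -> QualifiedRecord:
    flags = []
    if record.get("monthly_spend", 0) > 10_000:
        # An outlier for BI reporting, but potentially the strongest
        # churn signal available; keep it and label it.
        flags.append("high_spend_outlier")
    if record.get("last_contact_days") is None:
        flags.append("missing_last_contact")
    return QualifiedRecord(payload=record, flags=flags)

record = {"monthly_spend": 15_000, "last_contact_days": None}
print(qualify_for_churn_model(record).flags)
# -> ['high_spend_outlier', 'missing_last_contact']
```

The design choice is the point: qualification is a repeatable capability attached to a use case, not a one-off cleansing project that quietly throws the signal away.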

The ROI Conundrum: Measuring Fog with a Ruler

The final hurdle where most pilots fall is the demand to prove a traditional, linear Return on Investment (ROI). This is like trying to measure the value of a university education by calculating the cost of the textbooks. It’s a flawed yardstick for a complex, emergent technology. The value of AI rarely arrives in the tidy, predictable way that a finance department’s spreadsheet demands. The benefits are often indirect (improved decision quality), delayed (accelerated innovation cycles), and qualitative (better employee experience). A recent study of Novo Nordisk’s GenAI rollout found that employee satisfaction was three times more strongly correlated with perceived improvements in work quality than with raw time saved. How do you plug that into an NPV calculation?

Forcing a nascent technology into a rigid ROI model creates a vicious cycle. Teams either contort their project’s value into an unconvincing financial case, or they admit the ROI is unclear and watch their strategically vital project get defunded.

So, what’s the alternative? We must broaden our definition of value and change the models we use to measure it. For complex systems, this may mean embracing more radical measurement techniques. Consider the concept of ‘digital twins’—using AI to create a simulation of a process or customer. You can run countless experiments in this simulated world to precisely isolate the AI’s causal effect, effectively turning ‘soft’ metrics like customer engagement into forecastable, attributable financial inputs. We must start using AI to measure AI.
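A deliberately toy sketch of the digital-twin idea: simulate a customer process with and without AI assistance, estimate the causal uplift, and translate it into a financial figure. Every parameter here (renewal rates, handling times, value per retained customer) is an invented assumption, not real data.

```python
import random
import statistics

# Toy "digital twin" of a customer-service process. All parameters are
# illustrative assumptions; a real twin would be calibrated on observed data.

def simulate_customer(ai_assisted: bool) -> float:
    """Return 1.0 if the simulated customer renews, else 0.0."""
    base_renewal = 0.70                 # assumed baseline renewal rate
    wait_minutes = random.gauss(12, 4)  # assumed handling time, minutes
    if ai_assisted:
        wait_minutes *= 0.6             # assumed speed-up from AI triage
    # Toy causal mechanism: longer waits erode renewal probability.
    renewal_prob = base_renewal - 0.01 * max(wait_minutes - 5, 0)
    return 1.0 if random.random() < renewal_prob else 0.0

def run_experiment(n_customers: int = 100_000) -> None:
    random.seed(42)
    control = [simulate_customer(False) for _ in range(n_customers)]
    treated = [simulate_customer(True) for _ in range(n_customers)]
    uplift = statistics.mean(treated) - statistics.mean(control)
    value_per_renewal = 120.0           # assumed margin per retained customer, EUR
    print(f"Estimated causal uplift in renewal rate: {uplift:.3%}")
    print(f"Attributable value across the simulated cohort: "
          f"~EUR {uplift * n_customers * value_per_renewal:,.0f}")

if __name__ == "__main__":
    run_experiment()
```

Because the counterfactual is fully under your control in the simulation, the uplift is attributable by construction; the hard work then shifts to validating that the twin’s assumptions track the real process.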

Questions for Leaders

As you navigate your own AI journey, here are a few questions to consider:

1. Am I funding a science experiment or a business solution? Look at your portfolio of AI pilots. Can the project lead articulate, in a single sentence, the specific, quantified business pain they are solving? If not, why is it being funded?

2. Is my organisation having an immune reaction to this project? Is the AI initiative being treated as an isolated IT project, or is it an integral part of a broader business transformation, with genuine C-suite ownership and cross-functional teams who share the same definition of success?

3. If this pilot is 100% successful, what happens next? Is there a clear, costed, and agreed-upon path to production? Have we built the bridge (the MLOps, the infrastructure, the change management plan) before we’ve reached the chasm?

4. Are we measuring the right things? Are we forcing teams to justify strategic, long-term value with short-term, linear ROI models? How can we create a culture that formally recognises the value of improved decision-making, innovation capacity, and employee experience?

The challenge of moving beyond pilot purgatory is not a crisis of technology, but a crisis of leadership and strategy. Success will not be found in the latest model or the cleverest algorithm. It will be found in discipline, pragmatism, and a relentless focus on solving real problems. It requires treating AI not as a magic box to be installed, but as a core competency to be painstakingly developed across the entire enterprise. In our next issue, we’ll explore the emerging landscape of ‘AI-as-a-Utility’ and what it means for long-term strategy and vendor risk management.

Until then, stay balanced.