Six months ago, your Board approved an AI strategy engagement. The slides looked good. The roadmap was detailed. The governance framework had its own charter. The fee was substantial.
You still have working pilots in demo environments and nothing in production. The business case from slide 47 has not materialised. The consultants have moved on; you are left with a 100-page document and no running systems.
This is not a one-off. The same pattern plays out across large enterprises. The usual reasons — “the technology wasn’t ready”, “the data was messy”, “change management failed” — describe symptoms, not causes.
The structural problem starts with who you hired and how the work was scoped.
The Integration Gap
AI projects cut across five layers: Strategy → Governance → Process → Architecture → Technology.
Traditional consulting works mainly in the first two. Engineering teams live in the last two. The layer in the middle is under‑owned.
Your AI strategy says “automate customer queries”. But who decides:
Which queries the AI handles and which it escalates? (Process)
What happens when confidence drops below a threshold? (Governance-as-Code)
Whether the architecture can support low-latency decisions at your real traffic volumes? (Architecture)
How the human override fits into existing workflows? (Operating model)
The strategy document rarely answers these questions, because its authors do not sit close enough to delivery to ask them. The engineers who could answer them are usually not in the room when the strategy is defined.
That gap between slide and system is where most AI strategies stall.
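To see why a slide cannot close that gap, take the architecture question above. Even a back-of-envelope check, here using Little's Law, forces concrete numbers onto the table; every figure below is an illustrative assumption, not data from any real engagement.

```python
import math

# Back-of-envelope capacity check for an AI-assisted query flow.
# Every figure here is an illustrative assumption, not a measurement.
peak_queries_per_second = 120      # assumed peak traffic
p95_latency_seconds = 2.5          # assumed end-to-end time per AI call
concurrency_per_replica = 8        # assumed concurrent requests one replica sustains

# Little's Law: requests in flight = arrival rate x time in system.
in_flight = peak_queries_per_second * p95_latency_seconds         # 300 concurrent requests
replicas_needed = math.ceil(in_flight / concurrency_per_replica)  # 38 replicas

print(f"Concurrent requests at peak: {in_flight:.0f}")
print(f"Replicas needed: {replicas_needed}")
```

If the honest answer is 38 replicas and the platform budget covers four, the strategy is not yet a plan, and that conversation has to happen before the business case is approved, not after.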
The “Expensive Junior” Paradox
The issue is not a shortage of smart people in large firms. It is a mismatch between the skills they sell and what AI work needs.
Effective AI implementation requires cross-domain depth:
Someone who can discuss P&L and capital budgeting with the Board.
Who understands AI Act obligations, sector rules, and GDPR intersections.
Who can redesign workflows for human–AI collaboration, not just draw an organisation chart.
Who can judge whether a proposed architecture can actually deliver the SLA and control requirements.
Who has seen production systems fail and had to fix them.
You do not get this profile from a single discipline. It comes from a career that has moved through software delivery, advisory work, and operational responsibility.
Large firms staff by function: a strategy team, a tech team, a change team. Each optimises its own deliverables. Handoffs between them add latency and information loss. No one is accountable for the end-to-end path from principle to working control.
“Expensive Junior” here is not about age. It is about how many layers a person can work across without a translator. A senior partner who has never had to keep a system alive in production is junior on the architecture and operations layers. A strong ML engineer who has never owned a regulatory finding or sat in front of a risk committee is junior on the governance and business layers. The seniority that matters is integration, and it is rare.
When you pay senior rates for someone who is confined to a single layer, you are paying senior prices for junior leverage.
Process Is the Product
Enterprise AI is often framed as a technology acquisition problem. It is mostly a process design problem.
AI does not repair broken processes. It makes their failure modes very visible. Adding an LLM to a legacy workflow amplifies whatever is already there:
Models hallucinate because the corpus is inconsistent and nobody defined which source wins.
Chatbots fail because exception paths were never written down.
“Automation” increases workload because the human–AI handoff is undefined, so people re‑do the work by hand.
In classical software, you could sometimes automate a poor process and rely on rigid logic to hide inconsistencies. Probabilistic systems behave differently. They probe the edges.
Designing the human–AI boundary is therefore core work, not decoration:
Who decides when AI output is accepted as-is?
What is the fallback path for low-confidence or out-of-distribution cases?
How are exceptions routed, logged, and reviewed?
Where does judgment sit, and how is that time protected?
Traditional consulting delivers “target operating models” as slides. Useful AI advisory specifies concrete flows: the confidence threshold, the exact escalation path, the log fields an auditor will ask for.
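As a sketch of what that level of specificity looks like in practice, with the thresholds, the routing, and the log schema made explicit (all values, queue names, and field names below are illustrative assumptions, not recommendations):

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("ai_decisions")

# Illustrative policy values; the real numbers come from risk appetite and testing.
ACCEPT_THRESHOLD = 0.90   # at or above this, the AI answer goes out unreviewed
REVIEW_THRESHOLD = 0.60   # between the two, a human reviews before it goes out
                          # below REVIEW_THRESHOLD, the case is handled manually end to end

def route_ai_answer(case_id: str, confidence: float, model_version: str) -> str:
    """Decide the path for one AI-generated answer and write the audit record."""
    if confidence >= ACCEPT_THRESHOLD:
        decision = "auto_send"
    elif confidence >= REVIEW_THRESHOLD:
        decision = "human_review"      # routed into the existing review queue
    else:
        decision = "manual_handling"   # AI output discarded, case worked as before

    # The fields an auditor will ask for: what was decided, on what basis, by which model, when.
    logger.info(json.dumps({
        "case_id": case_id,
        "decision": decision,
        "confidence": round(confidence, 3),
        "model_version": model_version,
        "thresholds": [ACCEPT_THRESHOLD, REVIEW_THRESHOLD],
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }))
    return decision
```

The point is not this particular snippet; it is that someone has to own these thresholds, the queue they route into, and the log schema, and that owner is neither the strategy partner nor the platform engineer alone.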
If your AI strategy never gets down to this level, it is not yet an implementation plan.
A Simple Test for Advisors
Before you sign the next AI engagement, ask candidates three practical questions.
1. “Describe an AI project where the strategy you worked on failed in production. What broke?”
You want a specific project, a clear failure mode, and what changed afterwards. Answers that stay at the level of “client execution issues” usually indicate distance from delivery.
2. “How would you design the human override for a low-confidence prediction in a critical use case?”
A useful answer talks about workflow, thresholds, UX, logging, and responsibility — not just “we add a review step”.
3. “When would you choose Human-in-the-Loop versus Human-on-the-Loop, and why?”
HITL means the human makes the decision with AI support. HOTL means the system acts and the human supervises. The choice depends on risk, reversibility, and regulatory expectations. If they cannot articulate that, they do not yet own the governance layer.
Vague or purely theoretical responses across these three questions are a signal. You are likely talking to a strategist or a technologist, not an integrator.
Briefing: The Environment Around You
Reasoning Models and Energy Use
Recent work from the AI Energy Score project compared energy consumption across dozens of models. On average, reasoning-enabled models used around 100 times more energy than stripped-down alternatives to answer the same set of prompts.
In extreme cases, compact model variants (DeepSeek R1, Microsoft’s Phi-4) needed around 20–50 Wh per 1,000 prompts with reasoning disabled and 7–13 kWh per 1,000 prompts with reasoning enabled.
The gap comes from longer chains of generation and more computation per token.
Operational takeaway: routing everything through “the smartest model” is a cost and capacity decision, not just a quality decision. The right control point is not “Do we use AI?” but “Which class of model is appropriate for this task?” Simple summarisation, retrieval, and classification jobs should run on cheaper, smaller models. Reserve heavy reasoning for domains where it changes the decision.
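A minimal sketch of that control point, expressed as a routing table rather than a default to the flagship model (the task classes and model names are placeholders, not product recommendations):

```python
# Illustrative routing table: task class -> model tier.
# Model names are placeholders; substitute whatever your platform actually offers.
MODEL_ROUTES = {
    "summarise": "small-general-model",
    "classify": "small-general-model",
    "retrieve_and_answer": "mid-tier-model",
    "multi_step_analysis": "reasoning-model",  # reserved for work where reasoning changes the decision
}

DEFAULT_MODEL = "mid-tier-model"

def pick_model(task_class: str) -> str:
    """Route a task to the cheapest model class judged adequate for it."""
    # Unknown task classes fall back to the mid-tier default and should be triaged,
    # rather than silently escalating to the most expensive option.
    return MODEL_ROUTES.get(task_class, DEFAULT_MODEL)
```

Reviewing a table like this quarterly, against measured quality and measured energy or spend, is a governance task; leaving every request on the “smartest” default is an unreviewed cost decision.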
Sales Targets and Reality
Reports from multiple outlets, including Ars Technica’s coverage of internal Microsoft targets, suggest that AI software sales quotas have been cut roughly in half in some units after repeated misses. Azure AI Foundry and related offerings are not scaling at the clip implied by marketing narratives.
The infrastructure exists. The commercial push is intense. Yet enterprises hesitate to move from pilots to core workflows.
Operational takeaway: I see a couple of factors at play. First, this is further evidence that the main constraint is not availability of models or platforms — without proper process redesign and integration into existing systems, licences will sit unused. Second, competition in the market is intensifying and Microsoft is no longer perceived as the clear AI leader.
ROI, Verification, and the Cost of “Checking the Machine”
IBM’s recent data point that only about a quarter of AI initiatives have met ROI expectations, with just 16% of companies successfully scaling AI applications, is consistent with what boards are now seeing in their own portfolios. A small minority of programmes scale; many stall in pilot or remain permanently “experimental”.
A large share of the cost sits in verification. Senior staff spend time reviewing AI output because workflows were not redesigned to absorb machine error rates. You pay for the model and for additional human review.
Operational takeaway: unless you redesign processes to reduce verification load or to use verification effort more strategically, the apparent productivity gain is illusory. Cost is simply moved from one line item to another, often towards your most expensive people.
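A rough illustration of that shift, with invented figures purely to show the mechanics:

```python
# Invented figures, purely to show where the cost moves; replace with your own.
cases_per_month = 10_000
reviewer_cost_per_hour = 90.0   # assumed loaded cost of the people doing the checking
model_cost_per_case = 0.05      # assumed API / platform cost per case

minutes_fully_manual = 12       # handling a case by hand, no AI
minutes_to_verify = 10          # properly checking one AI output against the source
escalation_rate = 0.20          # share of cases still routed to a human after redesign

def human_cost(minutes_per_case: float, share_reviewed: float) -> float:
    """Monthly cost of human handling or review."""
    return cases_per_month * share_reviewed * (minutes_per_case / 60) * reviewer_cost_per_hour

model_spend = cases_per_month * model_cost_per_case

baseline   = human_cost(minutes_fully_manual, 1.0)                         # 180,000
naive_ai   = human_cost(minutes_to_verify, 1.0) + model_spend              # 150,500
redesigned = human_cost(minutes_to_verify, escalation_rate) + model_spend  # 30,500

print(f"Fully manual:             {baseline:>10,.0f} EUR/month")
print(f"AI, verify every output:  {naive_ai:>10,.0f} EUR/month")
print(f"AI, verify escalated 20%: {redesigned:>10,.0f} EUR/month")
```

With those invented numbers, the naive rollout saves roughly 16% while concentrating the remaining cost on reviewers; the real gain only appears once the process decides which 20% of cases actually needs a human, which is exactly the design work most engagements skip.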
The Monday Question
Before you approve the next AI line item, ask this on Monday morning:
“Who is responsible for translating the strategy into technical constraints and back into a business case — and do they own all five layers?”
If that work sits in a gap between teams — if the “strategy people” and the “delivery people” rarely sit in the same working session — you have located the risk.
The missing role in most enterprises is the person who can move fluently between the boardroom and the build pipeline, and is accountable for both story and system. Without that role, you buy expensive juniors in every layer and wonder why nothing ships.
Until next time, build with foresight.
Krzysztof
