We know the AI failure statistics by heart. Most GenAI projects never leave pilot. Of those that do, few deliver measurable value; the rest linger in limbo, burning budget until someone finally asks "why are we doing this?"
This is the final issue in the scaling series. Time for a summary and conclusions.
## Why Standard Checklists Are Not Enough
The typical response to pre-launch anxiety? A longer checklist:
- Security — checked
- Infrastructure — ready
- Monitoring — configured
- Incident procedures — documented

All necessary. But for AI systems in regulated industries — not sufficient.
Classic Go/No-Go lists assume a deterministic world: requirements don’t change, production data looks like test data, people follow the script. AI breaks all three assumptions. It produces probabilistic outputs, drifts over time, changes processes around it.
You can tick every box and still fail to deliver a working solution.
The problem is that production readiness is a governance question, not another Jira field. The checklist must help analyse the same risks we’ve discussed throughout this series — and merge them into a single decision.
I’ll say it again: If a governance rule can’t automatically block a deployment, it’s not a control. It’s a suggestion.
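A control in this sense can be as small as a script in the deployment pipeline that refuses to proceed when evidence is missing. A minimal sketch in Python; the evidence file names are hypothetical, chosen here only to mirror items on the checklist below:

```python
from pathlib import Path

# Hypothetical evidence artifacts a deploy gate might require.
# The file names are illustrative, not a standard.
REQUIRED_EVIDENCE = [
    "evidence/owner.md",            # named business + IT owner
    "evidence/prod_data_tests.md",  # results on production-like data
    "evidence/rollback_drill.md",   # completed kill-switch drill
]

def deployment_gate(evidence_dir: str = ".") -> bool:
    """Return True only if every required evidence file exists."""
    missing = [rel for rel in REQUIRED_EVIDENCE
               if not (Path(evidence_dir) / rel).exists()]
    for rel in missing:
        print(f"BLOCKED: missing evidence {rel}")
    return not missing
```

Wired into CI/CD with a non-zero exit code on failure, this turns the rule from a suggestion into a gate: no evidence, no deploy.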
## Three Risks You Must Verify

Over the past issues, we mapped three causes that kill AI projects between pilot and production. Before go-live, each of them needs verification.

### 1. Data vs. Reality
In testing, your model saw clean, well-documented data. In the call centre, it will see half-completed forms, customers switching languages mid-sentence, and procedures changed last quarter without proper documentation.
In Issue #26, we called this the gap between training data and production. You can’t bridge it with good intentions. You need evidence that the system can handle it.
Case in point: Epic Systems deployed a sepsis detection model across hundreds of hospitals. In retrospective testing, it looked great. In daily clinical practice — where data is entered chaotically and with delays — the model missed 67% of sepsis cases while generating so many false alarms that doctors learned to ignore it.
### 2. Automating Chaos
In Issue #27, I showed what happens when you automate an inefficient process: the chaos starts moving faster. In production, AI doesn’t operate in a vacuum. It changes task flows, escalations, decision-making authority.
Before go-live, you need three things: a process map with a clearly marked place for AI, a person responsible for the whole thing (including when the system fails), and a path back to manual work. Without this, “production readiness” just means “ready for faster chaos.”
Case in point: Zillow’s property-buying algorithm was trained on data from a rising market. When prices started falling, the model kept buying — at inflated prices. Zillow lost over $500 million and shut down the division. There was no human-in-the-loop — nobody checked whether the model’s outputs made sense.
### 3. ROI on Slides
In Issue #29, we built an ROI scorecard: a way to measure what actually matters after the system goes live.
Before go-live, you need to know two things: what you expect from the system (specifically, measurably) and where you’ll get the data to verify it. And that source probably shouldn’t be a PowerPoint deck prepared by Big4 consultants. Otherwise, you’ll have an AI system whose benefits are indefensible when the CFO asks how much you earned or saved.
Case in point: IBM Watson for Oncology was deployed in cancer centres worldwide. MD Anderson alone invested over $60 million. The presentations looked impressive. Media wrote about a breakthrough. But when they verified whether Watson actually improved treatment outcomes compared to standard therapy — there was no evidence. The project was quietly shelved.
## The Production Readiness Checklist
Ten questions you can cover in a single board meeting. For each one — evidence that should exist before the “Go” decision.
### 1. Owner
- Question: Who takes responsibility for this system in production?
- Evidence: A named person on the business side and IT side, with documented scope of responsibility.
### 2. Place in Process
- Question: What processes does this system support, and who is responsible when the system stops working?
- Evidence: Process map (current and target state), human checkpoints, path back to manual work.
### 3. Real-World Data
- Question: Was the system tested on production-like data, not just PoC data?
- Evidence: Test results on production data — including edge cases and incomplete or noisy data.
### 4. What Can Go Wrong
- Question: Do we know how the system can fail? Do we know what we do then?
- Evidence: Documented failure scenarios, red-teaming results, response procedure.
### 5. Kill Switch
- Question: How do we shut down the system when it stops meeting quality criteria? What happens immediately after?
- Evidence: Working kill switch, tested rollback, at least one completed drill.
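Stripped to its essentials, a kill switch is a flag checked on every request, with the manual path as the fallback. A minimal sketch; in production the flag would live in a config service or feature-flag store rather than in process memory, and the handler names here are placeholders:

```python
# Minimal kill-switch sketch: a flag checked on every request, with a
# fallback to the documented manual path. Function names are placeholders.

class KillSwitch:
    def __init__(self):
        self.enabled = True  # True = model serves traffic

    def trip(self, reason: str) -> None:
        """Disable the model path; all traffic falls back to humans."""
        self.enabled = False
        print(f"KILL SWITCH TRIPPED: {reason}")

def model_answer(request: str) -> str:
    return f"model: {request}"             # stand-in for the AI system

def route_to_human(request: str) -> str:
    return f"queued for agent: {request}"  # the path back to manual work

def handle_request(request: str, switch: KillSwitch) -> str:
    if not switch.enabled:
        return route_to_human(request)
    return model_answer(request)
```

The drill matters as much as the code: trip the switch on purpose, once, before go-live, and watch whether the manual path actually absorbs the traffic.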
### 6. Dependencies
- Question: What does this system depend on — components, APIs, prompts, data sources? Who is responsible for what?
- Evidence: Dependency list with owners and repository locations.
### 7. How We Know Something Is Wrong
- Question: What will immediately alert us that the system is misbehaving? “Immediately” depends on process characteristics — could be milliseconds or days.
- Evidence: Defined early warning indicators, alerts reaching specific people, support rotation plan.
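One shape an early-warning indicator can take: a rolling window over a binary quality signal (say, "was this answer flagged as bad?") with an alert threshold. The window size and threshold below are illustrative assumptions; the right values come from the process characteristics mentioned above:

```python
from collections import deque

# Sketch of an early-warning indicator: a rolling window over a binary
# quality signal with an alert threshold. Window size and threshold are
# illustrative assumptions, not recommendations.

class EarlyWarning:
    def __init__(self, window: int = 100, threshold: float = 0.2):
        self.outcomes = deque(maxlen=window)  # 1 = bad outcome, 0 = good
        self.threshold = threshold

    def record(self, is_bad: bool) -> bool:
        """Record one outcome; return True when the alert should fire."""
        self.outcomes.append(1 if is_bad else 0)
        rate = sum(self.outcomes) / len(self.outcomes)
        return rate > self.threshold
```

The evidence on the checklist is not this class; it is the fact that when `record` returns True, a specific person's phone makes a noise.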
### 8. Is It Worth It
- Question: How will we measure whether the system is worth maintaining?
- Evidence: ROI scorecard fed by real data.
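The scorecard itself can be trivially simple; what matters is that the measured column is fed by real data. A toy sketch, with line items and a cost figure that are illustrative assumptions rather than figures from Issue #29:

```python
# Toy ROI scorecard: expected vs. measured monthly value per benefit line,
# divided by running cost. Field names and numbers are illustrative.

def roi_scorecard(items: list[dict], monthly_cost: float) -> dict:
    expected = sum(i["expected_monthly_value"] for i in items)
    measured = sum(i["measured_monthly_value"] for i in items)
    return {
        "expected_roi": expected / monthly_cost,
        "measured_roi": measured / monthly_cost,
        "gap": expected - measured,  # the number the CFO will ask about
    }
```

The gap between the two ROI figures is the conversation you want to have quarterly, on purpose, rather than once, under pressure.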
### 9. What We Show the Auditor
- Question: When the auditor asks “why did the system make this decision?” — what do we show them?
- Evidence: Documentation of system decision scope, input/output logs, simple description of how it works. This is where SR 11-7 (Fed), EU AI Act high-risk requirements, and NIST AI RMF converge. In regulated industries, auditors will eventually ask.
### 10. Dress Rehearsal
- Question: Did we go through this list with the people who will be responsible for the system?
- Evidence: Meeting notes: open risks, assigned owners.
## From Checklist to System
If you go through this list once and file it away, you have a manual process based on goodwill. If you build it into your deployment pipeline, you have governance. The difference: goodwill-based processes work only as long as someone remembers. Systems keep working.
Mature organisations build five pillars:
- Feature Store — single point of truth for model-ready data. Eliminates differences between what the model saw in training and what it sees in production.
- Model Registry — version control for models with full history: training data, code, validation results.
- AI Gateway — central control point for all model traffic. Rate limits, PII anonymisation, access policies — in real time.
- Observability Stack — drift detection, quality monitoring, alerts for problems invisible in standard metrics.
- Policy Engine — the Governance-as-Code engine. Rules execute automatically in the pipeline.

Today, most of you will go through this list manually. Over time, most items can be automated. The goal is not to replace human judgement, but to make sure people focus on the right questions instead of wasting time on things a machine does better.
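The Policy Engine pillar is the one that ties the checklist back to the pipeline. A minimal sketch of what "rules execute automatically" can mean: declarative rules evaluated against a deployment manifest. The rule names and manifest fields are illustrative assumptions:

```python
# Governance-as-Code sketch: declarative rules evaluated against a
# deployment manifest in the pipeline. Rule names and manifest fields
# are illustrative assumptions, not a real policy engine's schema.

RULES = [
    ("has_owner",       lambda m: bool(m.get("owner"))),
    ("rollback_tested", lambda m: m.get("rollback_drill_passed") is True),
    ("pii_filter_on",   lambda m: m.get("pii_anonymisation") == "enabled"),
]

def evaluate(manifest: dict) -> list[str]:
    """Return the names of failed rules; an empty list means 'Go'."""
    return [name for name, check in RULES if not check(manifest)]
```

Each checklist item that can be expressed as a predicate over artefacts you already produce is a candidate for a rule; the rest stay human.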
## What’s Next
This issue closes the first cycle of The AI Equilibrium. I started this series thinking about technology. I ended up writing about decisions — who makes them, on what basis, and what to do when they turn out wrong.
This list won’t win any beauty contests, but you can show it to the board and defend it. And that’s what production readiness is about: not perfection, but being able to justify, verify, and defend the decision.
In upcoming issues, we’ll take this list and apply it to specific industries — credit decisions, contact centres, insurance, public services. Case by case: how to go from theory to practice.
## The Briefing
### No governance = no ROI
Smarsh report on AI in finance: only 32% of firms have a formal AI governance programme. The common denominator of poor AI results isn’t bad models — it’s missing operational frameworks. That’s why this checklist exists.
### Your firewall won’t protect your AI
Harvard Business Review reports that traditional IT security doesn’t cover AI-specific threats: prompt injection, training data poisoning, model-specific exploits. The article cites a June 2025 Microsoft 365 Copilot vulnerability that exposed corporate data. AI risk requires a separate approach — threat modelling, red-teaming, dedicated monitoring. Items #4 and #7 on the checklist are there for this.
### EU AI Act is coming
Scalevise guide describes what regulators will require in 2026: AI system inventory, documented decision logic, oversight procedures, continuous monitoring. High-risk rules take effect August 2026. In regulated industries, inventory and auditability are no longer optional; any gap there is one the auditor will find for you.
## This Week’s Question
Before your next go-live, gather the decision-making team and go through this list together. Seriously, not as a formality.
How many of these ten questions can you answer today with evidence — not assumptions?
If fewer than seven: you now know what’s blocking your deployment. And it’s probably not the model.
Stay balanced,
Krzysztof
