Dear Reader,
In our last issue, we explored the unsettling question of what an “AI system” truly is in the eyes of a regulator. The answer is often found not in the marketing brochure, but in the fine print of the law. This week, we move from the engine to the fuel. It is a far less glamorous topic, but arguably the most important one in any serious discussion about enterprise AI. We are talking about data.
Too many organisations are attempting to build an AI skyscraper on a swampy foundation. They are captivated by the flashy architecture of the latest models, without paying attention to the unsexy, foundational work of data governance.
The Briefing#
The Market Signal: Washington’s Great Abdication#
The biggest story this past fortnight was not a new technology, but the sound of a balloon popping in Washington, D.C. A much-debated plan to impose a ten-year federal moratorium on state-level AI laws—a central plank of the “One Big Beautiful Bill Act”—was unceremoniously killed by the US Senate in a near-unanimous 99-1 vote. The lobbyists for Big Tech, who craved the simplicity of a single federal rulebook, lost.
The obvious take is that this is regulatory chaos. The non-obvious one is that this is a market correction. Washington has, perhaps accidentally, done the wisest thing possible: it has abdicated. It has ceded control to the states, unleashing what some are calling a “regulatory gold rush”. States like Texas, California, and Colorado are now competing laboratories, stress-testing different governance models in the real world. For leaders like you, this isn’t a headache; it’s an opportunity. It forces the development of adaptable, resilient governance frameworks, not brittle ones built for a fantasy world with one rule. You can read more about the moratorium’s demise here.
The Pragmatic Angle: Europe’s Engineering Edict#
While America embraces regulatory federalism, the EU has gone in the opposite direction, but not in the way you might think. This week, the European Commission released its General-Purpose AI Code of Practice, a voluntary guide to complying with the formidable EU AI Act.
This document is an engineering specification. It reads less like a legal text and more like a detailed blueprint for building a safe machine. It mandates specific, auditable controls: multi-factor authentication, frequent red-teaming, insider threat checks, and even physical data centre security. This comes closer to the philosophy of “Governance-as-Code”. The EU is not debating the nature of consciousness; it is treating AI risk as an industrial engineering problem that can be solved with rigour, process, and a healthy dose of paranoia.
The Technological Angle: The Rise of the Robot Red Team#
While policymakers debate rules, engineers are building the tools to enforce them. The most significant technological innovation for governance isn’t a better model, but a better way to break it. We are seeing the emergence of automated and continuous “AI red teaming” platforms designed to constantly attack a company’s own AI systems to find flaws. Services like Straiker’s new “Continuous Ascend AI” promise to run 24/7, testing live applications and alerting developers the moment a vendor’s model update inadvertently weakens their defences.
This shifts the standard of care from a one-off, pre-deployment audit to a state of perpetual, automated vigilance. The question for leaders is no longer, “Did you test the AI?” but, “Is your AI constantly testing itself?” You can read more about these new services here.
The Hollow Core of AI: Understanding the Potemkin Village Problem#
Two recent analyses illuminate why data quality is so crucial. Authors of this article: https://arxiv.org/html/2506.21521v1 coined the term “Potemkin Understanding” to describe a disturbing phenomenon: LLMs producing confident-sounding analyses that are fundamentally hollow. Like the fake village facades constructed to impress Russian Empress Catherine, these models create an illusion of comprehension while lacking genuine understanding.

The paper demonstrates how LLMs can generate detailed, eloquent explanations about fictional entities or fabricated data with the same confidence they apply to real information. This is particularly dangerous in enterprise settings where decision-makers may not recognize when an AI is essentially “hallucinating with conviction” based on faulty data inputs.
This aligns with Gary Marcus’s critique of generative AI’s “crippling failure to induce robust models of the world.” Marcus argues that current AI systems lack the causal understanding that humans develop through physical interaction with reality. Instead, they build statistical approximations based solely on text, creating fundamental blind spots that no amount of parameter scaling can overcome. https://garymarcus.substack.com/p/generative-ais-crippling-and-widespread
The implications for enterprise are stark: an LLM might be a Potemkin village—impressive from a distance but hollow upon inspection. Models can be trusted to produce good results, if they have an internal model of the system they are tasked to support or optimise. LLMs do not.
As Marcus concludes, “Today’s LLMs remain statistical systems without genuine understanding of their inputs or outputs.” This is why data foundation work isn’t optional—it’s the difference between AI that truly augments human intelligence and AI that merely creates a convincing illusion of competence.
I’m working on a longer piece on that topic, as the disconnect between reality and hype, as well as people’s belief in magic is fascinating to me.
The Unsexy Bedrock of AI Success#
Data governance is the janitor’s closet of the AI world. It’s not a topic that gets celebrated in press releases or featured in breathless keynote presentations. It’s the quiet, diligent, and often tedious work of ensuring your data is clean, organised, secure, and fit for purpose. And just like a well-maintained building, without it, everything else eventually falls apart.
The current hype cycle encourages leaders to focus on the flashy use cases—the customer-facing chatbots, the AI-powered market predictors. But this is a dangerous misdirection. As Ethan Mollick and others have pointed out, the first wave of real, sustainable value from AI in the enterprise will likely come from internal, practical applications that improve efficiency and reduce tedium. The problem is that these practical applications rely on high-quality internal data, which is often a chaotic mess.
An AI model, no matter how advanced, is a powerful but literal-minded engine. It does not possess a “world model” or what we might call common sense. It cannot magically discern that the data from the "Q3_Sales_Final_v2_Johns_Copy.xlsx" spreadsheet is more reliable than the data from the official but outdated CRM. It will simply process what it is given. This is the concept of “Potemkin Understanding”: an LLM can generate a fluent, confident-sounding analysis based on flawed data, creating a convincing illusion of insight that is dangerously wrong.

The Seven Deadly Sins of Enterprise Data#
Before you can build, you must understand the common points of failure.Most enterprise AI projects that fail do so not because the model is flawed, but because they fall victim to one or more of these foundational data pitfalls.
Poor Quality: This is the most common sin. It includes everything from missing fields and incorrect entries to inconsistent formatting. An AI trained on this data will learn these imperfections and amplify them, producing unreliable outputs with unshakeable confidence.
Hidden Bias: Your historical data is a reflection of past decisions, including past biases. A loan approval model trained on decades of biased lending data will not magically become fair; it will simply become a highly efficient engine for perpetuating that same bias.
Data Silos: The most valuable insights often come from connecting disparate datasets—linking customer support data with sales data, for example. In most organisations, this data lives in separate, jealously guarded silos, making a holistic view impossible.
Insecure Handling: The rush to experiment often leads to teams taking shortcuts, like uploading sensitive customer data to a third-party AI platform without proper security reviews, creating a massive compliance and privacy risk.
Lack of Provenance: Where did this data come from? Who has touched it? Do we have the right to use it for training an AI? Without a clear chain of custody (provenance), you cannot prove to a regulator—or yourself—that your data is compliant.
Mismatched Context: Using data for a purpose for which it was not intended. For example, using customer service chat logs, which are full of informal language and abbreviations, to train a formal report-writing AI will lead to bizarre and unprofessional results.
“Dark Data”: This is the vast ocean of unstructured data your organisation collects but doesn’t use—emails, PDFs, meeting transcripts. It’s a potential goldmine, but accessing and preparing it for AI is a significant engineering challenge that is often underestimated.
“Engineering Thinking” for AI-Ready Data
The solution to these problems is not to buy another piece of software. It is to adopt a different mindset: a pragmatic, engineering-led approach to data. This means treating your data pipelines with the same rigour you apply to building a bridge or a power grid. Three simple principles are key:
Data Must Be Clean: This means establishing automated, repeatable processes for data cleansing, validation, and enrichment. It’s not a one-time task; it’s a continuous process, like maintaining a clean water supply. The goal is to have a “single source of truth” for your most critical data domains.
Data Must Be Contextual: Data without context is just noise. Every critical dataset should be accompanied by a “data card” or metadata that clearly explains its provenance, its intended use, its known limitations, and its owner. This makes it possible for both humans and AI systems to use the data correctly.
Data Must Be Controlled: Access to data, especially for training AI models, must be governed by strict, role-based controls. This isn’t just about security; it’s about ensuring the right data is used for the right purpose. This requires a cultural shift from “data ownership” by departments to “data stewardship” on behalf of the entire enterprise.
The Cultural Shift: From Data Hoarders to Data Stewards#
This engineering approach cannot succeed without a corresponding cultural shift. In many organisations, data is treated as a private fiefdom. The marketing department “owns” the customer data; the finance department “owns” the transaction data. This mindset is the single biggest obstacle to creating an AI-ready enterprise.
Becoming an “AI-first” enterprise, as Ethan Mollick suggests, requires a radical change. It means treating data as a shared, enterprise-wide asset. It requires creating new roles—not just data scientists, but Data Curators and AI Data Stewards whose job it is to ensure the quality, context, and security of the data that the entire organisation will use.
This is a leadership challenge. It requires you to champion the unglamorous, foundational work of data governance. It means rewarding the teams who clean up data silos just as much as you reward the teams who build flashy new models.
An Actionable Framework: The Data Readiness Assessment#
To help you start this conversation, here are four questions to ask your leadership team. They are designed to reveal how ready your organisation truly is for AI at scale.
The “Single Source of Truth” Test: If I asked for a definitive list of our top 100 clients, how many different answers would I get, and how long would it take to reconcile them?
The “Bias Audit” Question: What process do we have to actively audit our historical data for the hidden biases that could poison our AI models? Who is responsible for signing off on a dataset as “fair enough” for use?
The “Data Provenance” Challenge: Can we trace the full lineage of the data used by our most important predictive model, from its source to its final input? Could we prove this to a regulator?
The “Janitor’s Closet” Budget: How much are we investing in the foundational work of data cleansing, integration, and governance, compared to how much we are investing in experimental AI models? Is the balance right?
Preview of the Next Issue#
Getting your data foundation right is the essential first step. But once you have clean fuel, you still need to govern the engine. In our next issue, we will explore the emerging challenges of “Agentic AI”—what happens when AI starts to act on its own, and how we can ensure we remain in control.
Until then, lead with foresight.
Krzysztof
