Dear Reader,
Last week, we discussed the unglamorous truth that an AI is only as good as the data it eats. This week, we move from the input (data) to the engine itself. We’ll tackle two questions. First, what, precisely, is an AI system in the eyes of a regulator? And second, as we confront ever more complex models, how can we possibly trust a “black box” that cannot explain itself? The answers, you may find, are less about technology and more about legal philosophy, with very expensive consequences.
The Briefing
On one hand, regulators are clarifying rules for the present. On the other, the market’s biggest players are racing towards a bespoke, high-touch future—a clear admission that generic AI is not enough.
First, for our Polish readers, a new draft of the national law implementing the AI Act has been published. The proposed “Commission for AI” has been streamlined, and its legal opinions on specific AI systems will now be binding on other administrative bodies. For business leaders, this means the process for getting a definitive ruling on a new AI system should become more predictable. Previously, a business could receive a favourable opinion from one authority, only to be challenged by another—a situation that makes any significant investment untenable. This change provides a degree of legal certainty required for capital investment. The clarification of rules for regulatory sandboxes further supports this, creating a safer space for practical experimentation.
Meanwhile, the market is shifting away from the “one-size-fits-all” model. Mira Murati, former CTO of OpenAI, has launched her new company, Thinking Machines Lab (TML), to create highly customised AI for enterprises. The key detail is her focus on “Reinforcement Learning for Business.” This approach trains models to optimise for specific, hard business goals like maximising profit margins or improving customer retention, rather than simply predicting the next word. It is a direct attempt to solve the problem that generic LLMs do not understand business objectives and often produce plausible but commercially useless output.
This pivot is echoed by OpenAI itself. The company has launched a high-end consulting arm, deploying its own engineers to build bespoke systems for clients, with a reported minimum engagement of $10 million. This move into high-touch services, mirroring Palantir’s model, is a powerful admission: unlocking the real value of AI requires deep, hands-on integration, not just access to a powerful API.
That is nothing new for anyone who has worked in the IT market for a while: each new generation of systems starts with polished “out-of-the-box” products that promise a much simpler world, and inevitably ends up in a far more complex world of customised processes and bespoke enterprise software. Why does the same hold for generative AI? Because it has its own limitations. For starters, LLMs lack a true “world model.” They are masters of statistical mimicry, capable of creating a convincing facsimile of strategic thought, a kind of “Potemkin thinking.” The move towards deep customisation and reinforcement learning is an attempt to build a proxy for that missing understanding. We will explore this concept of “Potemkin AI” in more detail in Issue #4.
The AI Act’s Definition: When a Statistic Becomes an AI
While the market looks to the future, regulators are busy defining the present. The EU AI Act forces us to solve a fundamental problem: what, exactly, is an “AI System”? The Act’s definition is based on characteristics like autonomy and adaptiveness, deliberately distinguishing it from “simpler traditional software systems.” This distinction is not academic. For a bank, it is a multi-million-euro question.
Consider credit scoring. For years, banks have used standard logistic regression models. A “pure” version of this, with manually selected variables and fixed coefficients, falls outside the AI Act’s scope. It lacks the autonomy and adaptiveness the law specifies. Think of it as a sophisticated but static calculator; it does the same calculation the same way every time. However, the moment you automate this process (for instance, using algorithms for feature selection or periodically recalibrating the model automatically), it likely crosses the threshold and becomes an AI system. Its behaviour is no longer static; it learns and adapts. Now consider more modern techniques like gradient boosting machines (XGBoost). These are currently classified as AI systems. Their entire design is based on learning and inference, building ensembles of hundreds of smaller models to iteratively correct their own errors. However, industry experts are in discussions with lawmakers to relax the definition of an AI system so that fewer existing technologies fall into that category. If they succeed, the impact of the new regulations on large enterprises will be smaller.
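To make that line more tangible, here is a deliberately simplified Python sketch (the feature names, coefficients and retraining cadence are invented for illustration, and nothing here is a legal test): a frozen, hand-built scorecard next to a pipeline that re-selects features and refits itself on new data.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# (1) A "pure" scorecard: variables and coefficients were chosen once, by humans,
#     and never change. It behaves like a static calculator.
FIXED_COEFFICIENTS = {"income": 0.8, "debt_ratio": -1.2, "years_employed": 0.3}
INTERCEPT = -0.5

def static_score(applicant: dict) -> float:
    """Logistic score with hand-picked, frozen coefficients."""
    z = INTERCEPT + sum(coef * applicant[name] for name, coef in FIXED_COEFFICIENTS.items())
    return 1.0 / (1.0 + np.exp(-z))

# (2) An automated pipeline: it re-selects the most predictive features and refits
#     the coefficients on fresh data every time it runs, so its behaviour adapts
#     over time -- which is what likely pushes it across the AI Act threshold.
def recalibrate(X_new, y_new):
    """Automatically refit a feature-selected logistic regression on new data."""
    model = make_pipeline(SelectKBest(f_classif, k=3), LogisticRegression())
    return model.fit(X_new, y_new)
```

The first function returns the same answer for the same applicant forever; the second quietly changes its own behaviour every time it is rerun, which is precisely the adaptiveness the Act is concerned with.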
The classification matters immensely because Annex III of the AI Act explicitly lists AI systems used “to evaluate the creditworthiness of natural persons” as a high-risk use case. The logic is simple and brutal: if your credit scoring model is technically an “AI System,” it is automatically designated “high-risk,” triggering extensive compliance obligations due by August 2026.

The New Black Box: From Opaque Models to Opaque Prompts
Just as we begin to grapple with the explainability of statistical models, the rise of Large Language Models (LLMs) presents a new, even more profound “black box” problem. For an XGBoost model, we can at least use techniques like SHAP to identify which input features most influenced the outcome. For an LLM, this is often impossible. The challenge shifts from explaining the model’s internal mechanics to ensuring the reasoning process is transparent and auditable.

This brings us to a hidden but critical risk area for the modern enterprise: the governance of prompts and their context. In this new paradigm, the prompt and its context are the new source code. An improperly governed prompt can have serious consequences:
Security Risk: A user could inadvertently paste sensitive customer data into a prompt sent to a third-party API, creating a data leak.
Compliance Risk: An improperly constrained model could generate advice that violates financial regulations.
Operational Risk: Inconsistent prompting across teams can lead to wildly different outputs, creating operational chaos.
Effective governance means treating your library of approved enterprise prompts as a valuable, controlled asset, versioned, reviewed and approved much like source code (a minimal sketch of what such a registry entry might look like follows below). But how do we make this new type of “code” explainable? The answer lies in engineering the prompts themselves to force transparency.
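Purely as an illustration of that idea (the field names, structure and sign-off workflow below are assumptions, not an established standard), a registry entry can be as simple as a versioned, reviewable record:

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class PromptRecord:
    """One approved, versioned prompt treated as a controlled asset."""
    prompt_id: str                       # stable identifier referenced from application code
    version: str                         # bumped on every change, like a code release
    owner: str                           # accountable business owner
    approved_by: str                     # compliance / risk sign-off
    approved_on: date
    allowed_use_cases: tuple[str, ...]   # explicit scope, e.g. internal triage only
    template: str                        # the actual prompt text, with placeholders

REFUND_TRIAGE_V2 = PromptRecord(
    prompt_id="refund-triage",
    version="2.1.0",
    owner="customer-operations",
    approved_by="compliance-office",
    approved_on=date(2025, 7, 1),
    allowed_use_cases=("internal triage",),
    template="Summarise the complaint, cite the applicable policy clause, then recommend a decision.",
)
```

The point is less the exact schema than the discipline: every prompt in production has an owner, a version, an approval and a defined scope of use.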
Making the LLM “Show Its Work”
We can achieve a high degree of interpretability for LLMs by using two key techniques:
Chain-of-Thought (CoT) Prompting: This is the most direct method. Instead of just asking for an answer, you explicitly instruct the model to outline its step-by-step reasoning before giving the final conclusion. A standard prompt might ask, “Is this customer eligible for a refund?” and get an unauditable “Yes.” A CoT prompt instructs it to first summarise the issue, then cross-reference the relevant policy, state its reasoning based on that policy, and then provide the final answer. The opaque black box is forced to produce its own audit trail.
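A minimal sketch of that difference, assuming a hypothetical call_llm() helper standing in for whatever model API you actually use (the case and policy text are invented for illustration):

```python
# Hypothetical helper -- wire this to your actual LLM provider or client library.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("connect to your model API here")

CASE = "Customer bought a jacket 40 days ago and requests a refund."
POLICY = "Refunds are accepted within 30 days of purchase, except for faulty goods."

# Standard prompt: returns an unauditable verdict.
plain_prompt = f"Policy: {POLICY}\nCase: {CASE}\nIs this customer eligible for a refund?"

# Chain-of-Thought prompt: forces the model to produce its own audit trail.
cot_prompt = f"""Policy: {POLICY}
Case: {CASE}

Before answering, work step by step:
1. Summarise the customer's issue in one sentence.
2. Quote the part of the policy that applies.
3. Explain how the policy applies to this case.
4. Only then state the final answer: ELIGIBLE or NOT ELIGIBLE.
"""

# answer = call_llm(cot_prompt)  # the response now contains reviewable reasoning
```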
Retrieval-Augmented Generation (RAG): This is the most important technique for any regulated industry. RAG prevents the LLM from “making things up” by forcing it to base its answers exclusively on a pre-approved, trusted set of documents you provide. When a user asks a question, the system first finds the most relevant documents in your internal knowledge base and instructs the LLM: “Answer the user’s question using ONLY the following information.” A well-designed RAG system doesn’t just give an answer; it provides citations, telling you, “I believe the answer is X, and I based this on information found in document_A.pdf (page 4).” It transforms the AI from an unreliable oracle into an efficient and auditable research assistant.
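A skeletal sketch of that flow, with the retrieval step left abstract (retrieve_top_k and call_llm are hypothetical placeholders for your vector store and model client, and the prompt wording is illustrative only):

```python
# Hypothetical placeholders -- substitute your own vector store and model client.
def retrieve_top_k(query: str, k: int = 3) -> list[dict]:
    """Return the k most relevant chunks from the approved knowledge base,
    e.g. [{"source": "document_A.pdf", "page": 4, "text": "..."}]."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def answer_with_citations(question: str) -> str:
    chunks = retrieve_top_k(question)
    # Number each chunk so the model can cite its sources explicitly.
    context = "\n\n".join(
        f"[{i + 1}] ({c['source']}, page {c['page']})\n{c['text']}"
        for i, c in enumerate(chunks)
    )
    prompt = (
        "Answer the user's question using ONLY the information below. "
        "Cite the numbered sources you relied on. "
        "If the information is not sufficient, say so instead of guessing.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```

The design choice that matters is the instruction to answer only from the supplied context and to cite sources; the machinery behind retrieve_top_k can be as simple or as sophisticated as your knowledge base requires.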
Questions for Your Leadership Team
This new reality demands a new set of questions that bridge the technical, the legal, and the operational.
What is in our “Model Inventory”? Do we have a comprehensive list of all models used in credit scoring, and have we formally assessed each one against the AI Act’s definition?
Where is our “Regulatory Red Line”? Have we defined a clear internal policy on which modelling techniques are acceptable for specific use cases, considering the compliance overhead?
Who Governs Our Prompts? Do we have a formal process for creating, approving, and managing the prompts and context used with our generative AI tools, especially for customer-facing or high-risk functions?
Is Our “Explainability” Legally Defensible? It’s not enough to say a model is explainable. Can we produce documentation for our CoT and RAG methods that would satisfy a regulator’s scrutiny for a high-risk system?
The twin challenges of defining older AI and governing newer AI are a perfect illustration of our central theme. They are complex, high-stakes issues where engineering reality, regulatory philosophy, and business pragmatism collide. Navigating them successfully requires moving beyond the hype and engaging with the details. In our next issue, we will delve deeper into the concept of “Potemkin AI” and explore the practicalities of data governance as the non-negotiable bedrock for any successful and responsible AI strategy.
Until then, lead with foresight.
All the best, Krzysztof
