Management boards make decisions on artificial intelligence projects, but few have formalised their oversight of it. While over 60% of directors consider AI a routine topic, only 35% have integrated it into committee charters or risk frameworks. This gap between awareness and action is itself a legal vulnerability: the most significant exposure is not any single AI failure, but the documented disparity between high awareness of AI’s importance and the low formalisation of its control.
Corporate law across jurisdictions imposes a duty of care on directors, requiring them to make a good-faith effort to establish reasonable information and reporting systems for mission-critical risks. This principle, once applied mainly to financial controls, now extends to systemic threats such as cybersecurity. AI, which underpins core business functions from credit scoring to supply-chain management, falls squarely into this category. A documented failure to formalise AI oversight—for example, the absence of AI-specific language in a risk committee’s charter—constitutes a potential breach of this fundamental duty.
As we explored in Issue #9, the stewardship duties of leadership are expanding. The legal protections afforded to directors for their business decisions typically apply only when those decisions are informed and made with procedural prudence. A board that cannot demonstrate a structured, repeatable process for understanding and monitoring AI risk forfeits this defence. It is not enough to discuss AI; the board must have a formal system for doing so.
The problem is one of translation. Technical leaders report on system performance, using metrics such as algorithmic precision or recall. Boards, charged with governing the entire enterprise, need risk quantified in economic and strategic terms. A Chief Technology Officer might report, correctly, that a fraud detection model is 99.5% accurate. A director’s proper response is not to be reassured, but to ask what the 0.5% of failures represents in terms of financial loss, customer disruption, or regulatory fines. The goal, therefore, is not to make company directors into data scientists. It is to provide a reporting framework that demonstrates control, making the communication itself the evidence of good governance.
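To make that translation concrete, here is a minimal sketch that converts a reported accuracy figure into a daily exposure estimate. The decision volume and cost-per-failure values are hypothetical assumptions, not figures from any real deployment.

```python
# Translating a technical metric (accuracy) into a business exposure figure.
# The volume and cost figures below are hypothetical assumptions.

accuracy = 0.995                    # the reported model accuracy
decisions_per_day = 200_000         # assumed automated decisions per day
avg_cost_per_failure = 150          # assumed blended cost per failure (loss, rework, complaints), in GBP

failures_per_day = decisions_per_day * (1 - accuracy)
daily_exposure = failures_per_day * avg_cost_per_failure

print(f"Failures per day: {failures_per_day:,.0f}")          # 1,000
print(f"Indicative daily exposure: £{daily_exposure:,.0f}")  # £150,000
```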
The Briefing
The prevailing strategy in artificial intelligence has equated leadership with scale, measured in model parameters and processing power. This has implied that the future belongs to the few firms with the capital for such investment. Two recent publications challenge this assumption: one exposes a security risk that scale worsens; the other suggests that efficiency, not size, will determine the next competitive advantage.
The first topic concerns the integrity of the AI supply chain. It was thought that “data poisoning”—corrupting a model by inserting malicious examples into its training data—was a prohibitively expensive attack, requiring an adversary to control a significant fraction of a vast dataset. Research from Anthropic, the UK’s AI Security Institute, and The Alan Turing Institute has shown this to be false.
Their work demonstrates that the number of malicious documents needed to install a hidden “backdoor” in a model is a near-constant, absolute number, regardless of the model’s size. As few as 250 poisoned documents were sufficient to compromise models with between 600 million and 13 billion parameters. For the largest model, this represents just 0.00016% of its training data.
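A rough back-of-the-envelope calculation shows how small that footprint is. The corpus size and document length below are illustrative assumptions, not figures from the study; the point is the order of magnitude.

```python
# Order-of-magnitude check on how small a poisoning footprint can be.
# All figures below are illustrative assumptions, not numbers from the study.

poisoned_docs = 250                  # documents the attacker publishes online
tokens_per_doc = 1_000               # assumed average length of a poisoned document
corpus_tokens = 260_000_000_000      # assumed training corpus for a ~13B-parameter model

poisoned_tokens = poisoned_docs * tokens_per_doc
fraction = poisoned_tokens / corpus_tokens

print(f"Poisoned share of training data: {fraction:.8%}")
# With these assumptions the share is roughly 0.0001%: a few hundred documents
# hidden in a corpus of hundreds of billions of tokens.
```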
Most foundation models are trained on data scraped from the public internet. An adversary no longer needs privileged access to a vendor’s infrastructure; they need only publish a few hundred tainted documents online and wait for web crawlers to collect them. An organisation can therefore deploy a model from a trusted vendor, one that has passed all standard evaluations, and still inherit a hidden vulnerability. Increased scale makes the malicious data harder to find, not less effective.
The second article challenges the dominance of large models on economic and operational grounds. The current approach to building “agentic AI”—systems that perform tasks autonomously—often uses a single, large language model (LLM) for every step of a process. A recent paper from NVIDIA researchers argues this is inefficient.
They contend that the future of agentic AI belongs to small language models (SLMs). Most tasks an agent performs, such as formatting an output or calling a specific tool, are simple and repetitive. SLMs are sufficient for these jobs and are more efficient. The paper advocates for “heterogeneous” systems, where a fleet of specialised SLMs handles the bulk of the work, and a single LLM is used only for high-level strategic direction. An SLM can be 10 to 30 times cheaper to operate than a frontier LLM. Its smaller size also makes it easier to fine-tune for specific corporate tasks.
My own view is that most tasks executed within a process can, and should, be performed by deterministic algorithms based on business rules, especially when the process runs at scale. It remains much cheaper and safer to limit agents to tasks that cannot be handled effectively with a ‘traditional’ business-rules approach.
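A minimal sketch of that tiered approach follows. The task categories, routing rules, and model tiers are hypothetical placeholders; the point is the ordering: deterministic business rules first, a small model for routine language work, and a frontier model only where open-ended reasoning is genuinely required.

```python
# Hypothetical tiered dispatcher: business rules first, SLM second, LLM last.
# The task categories and tier assignments are illustrative assumptions.

from dataclasses import dataclass, field

@dataclass
class Task:
    kind: str                       # e.g. "refund_check", "format_output", "draft_strategy"
    payload: dict = field(default_factory=dict)

RULE_HANDLED = {"refund_check", "credit_limit_check"}   # deterministic business rules
SLM_HANDLED = {"format_output", "classify_ticket"}      # routine, repetitive language work

def route(task: Task) -> str:
    """Return which tier should handle the task."""
    if task.kind in RULE_HANDLED:
        return "business_rules"     # cheapest, fully auditable, deterministic
    if task.kind in SLM_HANDLED:
        return "slm"                # small model, fine-tuned for the narrow job
    return "llm"                    # frontier model reserved for open-ended reasoning

if __name__ == "__main__":
    for kind in ["refund_check", "format_output", "draft_strategy"]:
        print(kind, "->", route(Task(kind=kind)))
```

The design intent is that the expensive, least predictable tier is reached only by exception, which keeps both cost and audit burden proportionate to the task.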
An AI Risk Dashboard for the Board
A board’s role is to set risk appetite and monitor adherence to it. Effective reporting on AI must therefore translate technical complexity into business signal, focusing on impact, not mechanical detail. The principal risks from AI stem not from its accuracy in isolation, but from its speed and scale when deployed. A model making one million automated decisions a day has a profoundly different risk profile from one that assists a human with one hundred. The most useful metrics are those that measure this operational tempo and its potential consequences.
A board-level dashboard should concentrate on four key indicators (a brief calculation sketch follows the list):
Automated Decision Velocity (ADV): This measures the number of consequential, automated decisions made per unit of time without human review. It is the clearest proxy for the scale of operational risk. A rising ADV signals that more of the business is running on autopilot, increasing its exposure to systemic failure. A single flaw in a widely used model can be amplified at algorithmic speed, turning a minor error into a significant crisis in minutes, not days. This metric gives the board a tangible sense of “decision leverage”—the ratio of automated judgments to human supervisors. It becomes especially important once AI agents are deployed: in a multi-step process, the probability of successful completion decays exponentially with the number of steps, so small per-step error rates compound quickly.
Model Risk Appetite Adherence: This tracks the percentage of high-impact AI models operating within the company’s defined risk parameters. Defining this appetite is a crucial governance function. It goes beyond simple accuracy floors to include specific thresholds for fairness (e.g., demographic parity in loan approvals), explainability requirements for decisions with legal consequences, and operational boundaries (e.g., a trading algorithm is not permitted to execute trades above a certain value without human sign-off). This metric connects the operational reality of AI systems directly to the board’s strategic directives. It shifts the question from a technical one (“Is the model accurate?”) to a governance one (“Is the model compliant with our stated tolerance for risk?”).
Data Provenance Score: This is a composite score for the quality, integrity, and auditability of data feeding critical AI models. AI risk is fundamentally data risk. Flawed, biased, or unlicensed training data creates significant downstream legal and reputational liabilities. This score provides a simple health check on the foundation of the entire AI ecosystem. It should be a weighted average of several factors: data lineage (can we trace the data to its source?), quality checks (is it complete and consistent?), licensing rights (do we have the legal right to use it for this purpose?), and bias assessments. It addresses the “garbage in, gospel out” problem and answers a fundamental question: “Can we trust the data our models are learning from?”
Shadow AI Exposure: This measures the percentage of AI tool usage within the firm that is unsanctioned or unmonitored by IT. The proliferation of browser-based generative AI tools creates a significant blind spot for data leakage and compliance risk. Employees using unvetted public tools for tasks like summarising sensitive documents or writing code can lead to the inadvertent disclosure of intellectual property. Quantifying this activity, perhaps through network traffic analysis or software audits, is essential for the board to grasp the organisation’s true AI footprint and its associated unmanaged risks.
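Here is a minimal sketch of how the four indicators might be computed from operational data. Every input below (decision volumes, model inventory, weights, traffic figures) is a hypothetical placeholder; in practice they would come from the model registry, monitoring pipeline, and network audits.

```python
# Hypothetical calculation of the four board-level indicators.
# All input data below is invented for illustration.

# 1. Automated Decision Velocity (ADV): consequential automated decisions per day,
#    plus the compounding effect of multi-step agentic processes.
automated_decisions_per_day = 1_000_000
human_reviewers = 40
decision_leverage = automated_decisions_per_day / human_reviewers

step_success = 0.99                           # assumed per-step success rate of an agentic process
steps = 20
end_to_end_success = step_success ** steps    # ~0.82: per-step errors compound exponentially

# 2. Model Risk Appetite Adherence: share of high-impact models within defined limits.
models = [
    {"name": "credit_scoring",  "within_appetite": True},
    {"name": "fraud_detection", "within_appetite": True},
    {"name": "pricing_engine",  "within_appetite": False},
]
adherence = sum(m["within_appetite"] for m in models) / len(models)

# 3. Data Provenance Score: weighted average of lineage, quality, licensing, and bias checks.
weights = {"lineage": 0.3, "quality": 0.3, "licensing": 0.2, "bias": 0.2}
scores  = {"lineage": 0.9, "quality": 0.8, "licensing": 1.0, "bias": 0.7}   # each on a 0..1 scale
provenance_score = sum(weights[k] * scores[k] for k in weights)

# 4. Shadow AI Exposure: share of observed AI tool traffic that is unsanctioned.
total_ai_sessions = 12_000
unsanctioned_sessions = 3_100
shadow_ai_exposure = unsanctioned_sessions / total_ai_sessions

print(f"ADV (decisions/day):     {automated_decisions_per_day:,}")
print(f"Decision leverage:       {decision_leverage:,.0f} decisions per reviewer")
print(f"20-step process success: {end_to_end_success:.0%}")
print(f"Risk appetite adherence: {adherence:.0%}")
print(f"Data provenance score:   {provenance_score:.2f}")
print(f"Shadow AI exposure:      {shadow_ai_exposure:.0%}")
```

Whatever the exact implementation, the value lies in reporting these figures on a fixed cadence, so the board sees trends rather than snapshots.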
A Lexicon for AI Threats
Understanding AI threats requires a precise vocabulary. The following terms define common failure modes and attack vectors in technical, business-relevant language; a short illustrative sketch of two of them follows the list.
Model Drift: This occurs when a model’s predictive accuracy deteriorates as the new, live data it processes begins to differ from the data it was trained on. A model trained on pre-pandemic economic data, for example, will become unreliable when forecasting post-pandemic trends. It is a silent failure mode where performance degrades, leading to progressively poorer business decisions, from inaccurate inventory forecasts to flawed credit risk assessments.
Data Poisoning: This is an attack that corrupts a model by introducing malicious data into its training set. The objective is to create a ‘Trojan horse’ within the model, causing it to fail in specific ways that benefit an adversary. For instance, an attacker could poison a spam filter’s training data to ensure their malicious emails are always classified as legitimate. This is a supply-chain attack on the AI, compromising its integrity from the source.
Prompt Injection: A vulnerability in large language models (LLMs) where an attacker crafts an input that subverts the model’s original instructions. This can trick the model into ignoring its safety protocols, revealing confidential data it was trained on, or executing unintended commands on integrated systems. It is the AI equivalent of a command-injection attack, exploiting the model’s interpretation of natural language to hijack its function.
Overfitting: A modelling error where the AI learns its training data too precisely, memorising its statistical noise rather than the underlying, generalisable patterns. The result is a model that appears highly accurate in testing but fails to perform on new, real-world data. An overfitted model produces deceptively optimistic back-testing results but is fragile and unreliable when deployed, making it dangerous for forecasting or real-time decision-making.
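To make two of these failure modes concrete, the sketch below pairs a population stability index (PSI) check, one common way to flag model drift, with a train-versus-holdout error gap, the classic symptom of overfitting. The data is synthetic, and the 0.2 PSI threshold is a common rule of thumb rather than a fixed standard.

```python
# Illustrative checks for two lexicon entries: model drift and overfitting.
# Data is synthetic; the 0.2 PSI threshold is a rule of thumb, not a standard.

import numpy as np

rng = np.random.default_rng(0)

# --- Model drift: population stability index between training-time and live score distributions.
def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Compare two score distributions; larger values mean larger drift."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    e_pct, a_pct = np.clip(e_pct, 1e-6, None), np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

train_scores = rng.normal(0.0, 1.0, 10_000)   # distribution the model was trained on
live_scores = rng.normal(0.5, 1.2, 10_000)    # live data has shifted
print(f"PSI = {psi(train_scores, live_scores):.2f}  (> 0.2 is often treated as material drift)")

# --- Overfitting: a high-degree polynomial memorises noise in a small training set.
x_train, x_test = rng.uniform(-1, 1, 20), rng.uniform(-1, 1, 200)
y_train = x_train ** 2 + rng.normal(0, 0.1, x_train.size)
y_test = x_test ** 2 + rng.normal(0, 0.1, x_test.size)

coeffs = np.polyfit(x_train, y_train, deg=15)   # far too flexible for 20 points
train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
print(f"Train MSE = {train_mse:.4f}, holdout MSE = {test_mse:.4f}")
# A near-zero training error alongside a much larger holdout error is the signature
# of overfitting described above.
```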
The Business Case for Control
Securing budget for governance requires reframing it from a compliance cost into a strategic enabler. Proper AI governance accelerates, rather than hinders, enterprise-wide adoption. It is the underlying infrastructure required to scale innovation safely and efficiently. The business case rests on three pillars:
An Investment in Brand Trust: In a digital economy, trust is a balance-sheet asset. Governance is the mechanism for ensuring AI systems are fair, transparent, and reliable. A bank whose loan-approval AI is certified as fair by auditors will build more trust with customers and regulators than a rival whose model is a black box. This trust translates directly into commercial advantage, including higher customer loyalty and a lower cost of capital.
An Investment in Innovation Velocity: Clear guardrails empower teams to innovate with confidence. A central governance framework provides common standards for risk assessment, data validation, and model monitoring. This is analogous to building a modern factory floor. Once the infrastructure—safety protocols, quality control, supply lines—is in place, new products (AI models) can be developed and launched much faster and more reliably. It prevents “pilot purgatory,” where promising AI projects never scale because the foundational risks have not been addressed.
An Investment in Director Protection: This is the most critical pillar. A documented governance framework, supported by a board-level dashboard, is the most tangible evidence that directors are fulfilling their duty of care. It is not just a shield in litigation; it is an affirmative demonstration of competence. This investment directly mitigates the personal and corporate liability that arises from a failure of oversight, transforming a legal obligation from a source of anxiety into a manageable process.
Questions for the Management Team
Does our board-level dashboard demonstrate a reasonable and defensible oversight process for our AI risks?
Have we formally defined our risk appetite for automated decision-making, and are we measuring our adherence to it?
Rather than treating AI governance as a compliance cost, how can we report it to our investors as an investment in our brand’s trustworthiness and a driver of sustainable innovation?
Given the personal liability associated with oversight, is the board satisfied with its visibility into the speed and scale of AI adoption across the enterprise, including unmanaged ‘shadow AI’?
Do the leaders responsible for AI have the necessary authority to enforce our governance standards across all business units, or do they serve only as advisors?
Conclusion
The role of the technology and risk leader has evolved. It is no longer sufficient to manage technology; the new mandate is to translate technological complexity into the language of strategic risk and corporate governance. This act of translation is now a core competency of modern leadership. The companies that master it will not merely innovate faster. They will build more resilient businesses, earn greater trust from their customers, and equip their directors to govern effectively in an increasingly automated world.
Until next time, build with foresight.
Krzysztof
