Issue #43 — Governance-as-Code ·

Dear Reader,

In Issue #16, published in September 2025, I wrote that AI governance should be expressed as code, not as policy documents. In Issue #32, three months later, I described the AI gateway — a single architectural component that routes all model traffic through one governed plane.

Both issues described a direction. Neither described the full landscape. Since then, the tooling has matured considerably — policy engines, monitoring platforms, testing frameworks, compliance automation — and the EU AI Act high-risk deadline has moved close enough that the question stopped being theoretical.

The question I keep hearing from readers is no longer “should we govern AI technically?” It is: “what tools exist, which ones work, and where do we start?” This issue maps the current state of the technical stack that can enforce governance rules at runtime — not in six months, but with what is available now.

The enforcement gap
#

Deloitte’s 2026 State of AI report puts enterprise governance readiness at 30% — below data management (40%), below technical infrastructure (43%), well below tool access (60%). Tool access is twice governance readiness. Enterprises can reach AI models faster than they can govern what happens when they do.

The reason is structural. Most governance programmes started with documents: principles, policies, ethics statements, committee charters. These are not controls. A written rule that prohibits sending customer data to external models offers no protection if nothing in the infrastructure can stop an analyst doing it in twenty seconds.

Issue #42 found the same pattern across four regulated sectors. In banking, one bank in five had a complete AI system inventory. In telecoms, 21% reported adequate governance for autonomous agents while 47% had already deployed them. The gap is not between intention and awareness. It is between documents and enforcement.

Five layers of technical enforcement
#

What follows is the current state of the technical stack that closes that gap. Not a product recommendation. An honest map of what exists, what works, and what does not.

Layer 1: Policy engines — rules as code, not as PDF.

A policy engine evaluates each AI action against declarative rules at runtime. The request arrives; the engine checks it against the ruleset; the action proceeds or does not.

OPA (Open Policy Agent) with Rego remains the industry standard for policy-as-code. It is production-proven in cloud-native environments but requires engineering skill to adapt for AI-specific use cases. There is no standard AI policy library — every enterprise writes rules from scratch.

The most significant development here is AWS Bedrock AgentCore, which reached general availability in March 2026. It converts natural-language governance rules into Cedar policies automatically. Cedar evaluation is deterministic — no LLM involved at enforcement time — and produces complete audit trails. Security teams write policies; developers build agents; neither modifies the other’s work. This is the first genuinely turnkey policy-as-code solution for AI agent governance.

The gap: in IT security, standardised rule libraries exist — ready-made configurations that organisations adopt and adapt (CIS Benchmarks for server hardening, OWASP rulesets for web applications). For AI governance, nothing comparable has been published. If you want a starter policy library, you are writing it yourself.

Layer 2: AI gateways — traffic control.

I covered this in detail in Issue #32. The gateway is the governed choke point through which all AI traffic flows: prompt filtering, data masking, model routing, cost metering, audit logging.

Since January, the landscape has matured. Each major cloud provider — AWS, Azure, Google Cloud — now offers a gateway product with built-in AI governance controls: prompt filtering, content safety, usage tracking. Independent vendors offer multi-provider gateways for organisations that do not want to lock into a single cloud. Six months ago, the choice was narrower. For enterprises already running API gateways for their web services, extending them to cover AI traffic is the lowest-friction entry point.

I will not repeat the gateway architecture here — read Issue #32 for the full design. The point for this issue is positioning: the gateway is layer 2 of five. Necessary, not sufficient.

Layer 3: Monitoring and drift detection — what changed, and when.

A model that passed all tests at deployment can fail silently in production when the vendor updates training data, when input distributions shift, or when user behaviour changes. Drift detection catches this.

Arize AI provides the strongest post-deployment observability — deep analytics for drift, embedding analysis, quality tracking. Fiddler specialises in explainability and bias detection for regulated industries; if your CISO needs to explain to KNF why a scoring model changed behaviour, Fiddler is purpose-built for that conversation. WhyLabs open-sourced under Apache 2.0 in January 2025 and offers a self-hosted option.

The gap: these tools were built for traditional ML — tabular data, classification, regression. LLM monitoring is bolted on, not native. Detecting meaningful behavioural drift in a language model (the model became more conservative in credit decisions after a vendor update) remains largely unsolved at the automated level. Human evaluation is still required for subtle changes.

Layer 4: Automated testing — continuous, not one-off.

In web application security, automated scanning on every deployment has been standard practice for over a decade. In AI systems, continuous testing is only now becoming technically feasible.

Promptfoo runs inside CI/CD pipelines and maps its 133 plugins to OWASP Top 10 for LLMs, NIST RMF, and MITRE ATLAS. It tests RAG pipelines, multi-turn agents, and policy violations on every code push. Microsoft, Shopify, and Discord use it in production. The setup resembles what most engineering teams already do for web application security — automated scans triggered by deployment, not quarterly audits scheduled by compliance.

Microsoft PyRIT handles custom multi-step adversarial testing and multi-modal evaluation. Garak from NVIDIA tests model resilience against over a hundred risk scenarios — from jailbreak attempts and prompt injection to uncontrolled data disclosure — and is better suited for periodic audits than continuous pipelines.

The gap: current tools test for safety failures — harmful content generation, jailbreaks, data leakage. Almost none test for business logic failures: the model approved a loan it should not have, or generated a report with a numerical error that looked plausible. Safety testing is necessary. Business risk testing is where the real exposure lies, and the tooling is thin.

Layer 5: Circuit breakers and kill switches — runtime intervention.

When a model starts behaving badly at 3am, what stops it?

No commercial product provides a turnkey AI kill switch. Enterprises build this from infrastructure primitives. The emerging pattern uses five mechanisms: a boolean kill flag per agent (checked before every action, sub-millisecond latency via Redis or a feature-flag system), token-bucket rate limiting on expensive operations, pattern detection across sliding time windows to catch repetitive loops, policy-level hard stops via OPA/Rego for semantic conditions (file size limits, regional boundaries, action budgets), and identity revocation via SPIFFE/SPIRE certificates as the nuclear option — when revoked, the agent cannot obtain fresh certificates and all downstream calls are rejected.

The architectural principle gaining consensus: the containment layer belongs in the orchestration layer, not on the application servers. You govern agents from above, not from within.

The gap: the patterns are well-understood but implementation is bespoke. This is an obvious product gap that someone will fill.

Where the stack breaks
#

The honest summary: the building blocks exist. The integration layer does not. No single platform connects policy engine, gateway, monitoring, compliance evidence, and kill switch into a coherent stack. Enterprises that want governance-as-code today build it themselves from four to six different tools.

Credo AI, IBM watsonx.governance, and OneTrust AI Governance provide compliance automation — mapping model characteristics to EU AI Act, ISO 42001, NIST AI RMF, generating audit-ready documentation. Credo AI is deployed by Microsoft, Databricks, and Mastercard. IBM leads both the IDC MarketScape and the Forrester Wave for AI governance platforms. But none of them has a mature workflow for Article 26 deployer obligations or Fundamental Rights Impact Assessments. For most enterprises — which are deployers, not providers — the compliance tooling is underdeveloped with the August 2026 deadline four months away.

Briefing
#

78% of European enterprises unprepared for AI Act obligations

A readiness report from Vision Compliance, spanning eight sectors across Europe, found that 78% of enterprises have not taken meaningful compliance steps. The specific gaps are familiar: 83% lack a formal AI system inventory, 74% have no designated governance body for AI compliance, and 61% cannot produce the technical documentation required for high-risk systems. One finding worth noting: organisations already GDPR-compliant showed measurably better AI Act readiness, particularly in data governance. For Polish enterprises, where GDPR compliance is relatively mature, that correlation is a concrete starting point — the data governance infrastructure already exists; what is missing is the layer that connects it to AI-specific obligations.

EU Commission considers classifying ChatGPT as a “very large platform” under the DSA

Reuters reported that the European Commission is analysing whether OpenAI’s ChatGPT should be designated a “very large online platform” under the Digital Services Act, after its user numbers crossed the regulatory threshold. Designation would subject OpenAI to the DSA’s strictest tier: systemic risk assessments, algorithmic transparency, independent audits. The move is significant because it shows the EU is not waiting for AI Act enforcement alone — it is layering existing regulation onto AI services through whatever framework fits. For any company building AI-powered customer-facing tools in Europe, the relevant question is not just “does the AI Act apply?” but “which of the six or seven overlapping EU regulations applies first?”

US tech layoffs cite AI as top reason — but the attribution is mostly smoke

The Challenger, Gray & Christmas outplacement report for March 2026 lists AI as the single most cited reason for US job cuts: 15,341 announced layoffs, 25% of the monthly total. For Q1 2026, the tech sector cut roughly 80,000 positions, with nearly half attributed to AI and automation. The headline is attention-grabbing. The substance is thinner. “AI” has become a convenient label for restructuring decisions driven by margin pressure, over-hiring corrections, and strategic pivots that have little to do with automation replacing specific roles. The real question for enterprises is not “will AI replace my workforce?” but “are we building the internal capability to use AI productively before the cost pressure forces the decision for us?”

Questions for your leadership team
#

Can your infrastructure prevent an employee from sending customer data to an external AI model right now — not through a policy document, but through a technical control that fires before the data leaves your network?
If a model vendor updated the model behind your credit scoring or fraud detection system tomorrow, would your monitoring detect the behavioural change before it affected decisions? How long would it take?
Which of the five layers described in this issue does your organisation have in production? Which exist only as planned items in a governance roadmap?
If you needed to shut down a specific AI agent at 3am because it started producing harmful outputs, what is the mechanism? Is it documented? Has it been tested?

The integration problem
#

Nordea is the most publicly documented case of governance-as-code in European banking. They scaled from a laptop proof-of-concept to ten thousand users on a production-grade AI platform by embedding governance rules at the platform layer, not per use case. Their description: “organisational rewiring.” It took years.

Most enterprises will not build what Nordea built. They will assemble it from components: a gateway from one vendor, monitoring from another, compliance mapping from a third, kill switches wired together from infrastructure primitives. The skill is not in selecting the tools. It is in connecting them into a system that enforces rules consistently across every AI interaction, every time, without exception.

Issue #16 said governance should be code. Issue #32 showed one component. The full stack exists in pieces. The enterprises that assemble it before August will have a system. The rest will have documents.

Stay balanced,

Krzysztof Goworek

The enforcement gap#

Five layers of technical enforcement#

Where the stack breaks#

Briefing#

Questions for your leadership team#

The integration problem#