
Issue #32 — The AI Gateway Blueprint


Dear Reader,

In earlier issues we talked about AI systems and explainability in the eyes of regulators. This time we go one layer lower: away from principles and into plumbing.

Many enterprises now have AI policies, ethics statements, and risk committees. On paper, governance looks serious. In production, a large share of AI use still happens through unsanctioned tools, personal accounts, and improvised prompts.

This issue is about that gap. More precisely: about the one piece of architecture that can close it.

Most organisations started from policy. They set up ethics boards, published principles and bolted AI paragraphs onto existing risk frameworks. As a first move, that was understandable. It does not, however, address today’s failure modes.

The systems your governance model grew up with are deterministic. They change slowly, through managed releases. Once deployed, they do what the code says. Reviews, change boards and sign‑off checklists work tolerably well in that world.

Generative models behave differently:

  • They are stochastic: the same input can reasonably produce different outputs. Hallucination is not an odd corner case; it is how these systems work.
  • They drift: vendors alter training data, architectures and safety layers. Behaviour can move in ways that matter for risk without a line of your own code changing.
  • They live in a consumer ecosystem optimised for immediacy. Anyone with a browser and a personal email address can reach a frontier‑grade model.

If you treat this as a documentation problem, you will keep losing. The only meaningful test of a governance rule here is simple:

Can we express it as code that automatically enforces the rule at the right point in the system?

If “no PII may leave the country” never shows up as a deterministic check on outbound traffic, you do not have a control; you have a preference.
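As a minimal sketch of what “expressed as code” means here: a deterministic egress check that blocks payloads containing recognisable identifiers from leaving the home region. The patterns, the `outbound_allowed` function and the region names are illustrative, not from any particular product.

```python
import re

# Illustrative patterns for identifiers the residency rule covers.
PII_PATTERNS = [
    re.compile(r"\b\d{11}\b"),                       # national ID number
    re.compile(r"\b[A-Z]{2}\d{2}(?: ?\d{4}){4,6}\b"),  # IBAN-style account
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),          # email address
]

def outbound_allowed(payload: str, destination_region: str) -> bool:
    """Deterministic check: PII may not leave the domestic region."""
    if destination_region == "domestic":
        return True
    return not any(p.search(payload) for p in PII_PATTERNS)
```

The point is not the sophistication of the patterns; it is that the rule fires automatically on every outbound call, rather than living in a PDF.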

That shift is uncomfortable for functions whose tools have always been documents and committees. Policies and training still matter. They simply sit on top of a technical substrate. Without that substrate, you are asking people to compensate manually for stochastic behaviour, vendor drift and Shadow AI at scale. That is not a realistic plan.

From governance theatre to real control

Legal and compliance are not idle. They publish AI policies, acceptable use rules, ethics statements and committee charters. The documents are usually solid and often go to the board.

Reality does not match the paperwork.

A large share of employees now use unsanctioned AI tools at work. Many admit to pasting internal data into public chatbots, sometimes from personal accounts that sit outside your perimeter. The organisation “bans” these tools. The browser does not.

This is governance theatre. You have visible artefacts that signal control, but they create no friction in the runtime environment. A written ban on sending customer data to external models offers no protection if nothing in your stack can actually stop a developer or analyst doing it in twenty seconds.

Consumer‑grade AI makes the problem worse. Tools like ChatGPT were designed for usefulness and speed, not compliance. They feel fast, forgiving and powerful. Internal tools often feel slower and more constrained because they have to scan for PII, route to approved models and respect data residency. Users compare the governed tool to the ungoverned one and quietly pick the latter.

From a regulator’s perspective, this is not an abstract gap between “ethics” and “behaviour”. It is a control failure. After the next incident, they will not ask how thick your AI policy binder was. They will ask what technical control was in place at the moment the data left your network or the model acted.

What an AI gateway actually is

The architectural response that is emerging is an AI gateway: a specialised control plane between all your applications and all your model providers.

Without it, you end up with a many‑to‑many graph. Each chatbot, assistant or agent talks directly to one or more model APIs. Every team invents its own way of handling authentication, logging, data handling and cost tracking. When a provider deprecates a model or changes pricing, you touch dozens of codebases. When a regulator asks what a particular system did on a particular day, you discover that logs are inconsistent or incomplete.

The gateway turns this into a hub‑and‑spoke pattern. All AI traffic – from chat interfaces, back‑office tools, coding assistants, embedded agents – flows through one well‑defined plane. Applications call a single internal API. The gateway takes care of model selection, policy checks, data processing, logging and metering before anything leaves your perimeter.

Three roles matter most.

  • Sovereignty. Developers integrate with the gateway, not with a specific vendor. If a provider has an outage, changes behaviour or raises prices, you adjust routing rules in one place. Downstream systems keep running. When you later introduce in‑house or regional models, you can hide them behind the same interface.
  • Data firewall. The gateway intercepts prompts, detects sensitive entities using NER and pattern matching, and replaces them with tokens. “Anna Kowalska, account 12345” becomes “<PERSON_1>, <ACCT_1>”. The model only ever sees placeholders. It generates a response that uses them, and the gateway re‑hydrates the answer with the original values on the way back. The model provider never sees the raw PII.
  • Audit trail. Because all calls pass one point, you can implement consistent, tamper‑evident logging once. Every request and response can be recorded with timestamps, user IDs, model versions and decisions. When you need to trace how a particular AI‑supported decision was produced, you are not scraping logs from half a dozen systems.

For more complex estates, this gateway often sits alongside a tool‑governing layer and a traditional API gateway for backend services. You do not need that full pattern from day one. The important step is to stop AI traffic leaking through uncontrolled paths and bring it through a single, governed choke point.
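The data‑firewall round trip described above can be sketched in a few lines. Here, regex pattern matching stands in for a production NER pipeline, and the entity patterns are deliberately crude illustrations.

```python
import re

def tokenize(prompt: str):
    """Replace sensitive entities with placeholders before the model call."""
    mapping = {}
    counters = {"PERSON": 0, "ACCT": 0}

    def swap(kind, match):
        counters[kind] += 1
        token = f"<{kind}_{counters[kind]}>"
        mapping[token] = match.group(0)
        return token

    # Pattern matching stands in for a real NER pipeline here.
    prompt = re.sub(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b",
                    lambda m: swap("PERSON", m), prompt)
    prompt = re.sub(r"\b\d{5,}\b", lambda m: swap("ACCT", m), prompt)
    return prompt, mapping

def rehydrate(response: str, mapping: dict) -> str:
    """Restore the original values in the model's answer on the way back."""
    for token, original in mapping.items():
        response = response.replace(token, original)
    return response
```

The model only ever sees `<PERSON_1>` and `<ACCT_1>`; the mapping never leaves the gateway.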

From policy text to policy code

Once traffic flows through one place, you can stop treating policies as essays and start treating them as executable rules.

A dedicated policy engine evaluates each incoming request against a set of declarative rules. The gateway asks: “Given this user, this model, this context, is this call allowed? Under what conditions?” The enforcement point then acts on the answer.

Rules that currently live in PDFs move into code. For example:

  • junior analysts may not spend more than £X per day on premium models;
  • prompts containing customer identifiers must not be sent to public SaaS models;
  • certain classes of decision must route to a human queue when confidence is low or specific flags are present.

Because the rules are code, you can put them under version control, review them, test them and roll them out like any other critical configuration. Breaches lead to an automatic block or failure, not a note in the minutes for a committee to discuss weeks later.
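The three example rules above might be expressed as a small policy function like the following. The roles, the £50 cap, the model names and the confidence threshold are all illustrative values, not a reference implementation of any policy engine.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Request:
    user_role: str
    model: str
    daily_spend_gbp: float
    contains_customer_ids: bool
    confidence: Optional[float] = None

def evaluate(req: Request) -> str:
    """Evaluate a request against declarative rules; returns the decision."""
    if (req.user_role == "junior_analyst"
            and req.model == "premium"
            and req.daily_spend_gbp >= 50.0):
        return "block"          # budget rule
    if req.contains_customer_ids and req.model.startswith("public-saas"):
        return "block"          # data-residency rule
    if req.confidence is not None and req.confidence < 0.7:
        return "human_review"   # oversight rule
    return "allow"
```

Because this is ordinary code, it can sit in version control, be unit‑tested, and be rolled out like any other critical configuration.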

This is also where abstract regulatory language becomes concrete. Frameworks such as the NIST AI Risk Management Framework and standards like ISO 42001 expect you to identify, measure and manage AI risk through the lifecycle. A policy engine plus gateway is where those verbs stop being presentation slides and start being actual behaviour.

You will not capture board‑level risk appetite or culture purely in code. Some decisions will always need judgement. The important point is direction: if a rule never appears in code, it is unlikely to be applied consistently in a world of probabilistic systems and Shadow AI.

Regulation as an engineering brief

AI regulation is often treated as a legal concern sitting in a different universe from architecture. Read closely, the technical articles sound more like a design document for your control plane.

The EU AI Act is a good example. For high‑risk systems it expects at least three things:

  • Traceability. Systems must log their operation. A central gateway is the natural place to emit consistent, structured logs for all model calls, regardless of application or vendor.
  • Human oversight. Operators must be able to intervene. At gateway level, this becomes circuit breakers and routing rules: you can pause a use case when error rates spike, or send certain classes of requests to a human queue instead of letting the model respond unattended.
  • Robustness and security. Systems should cope with attacks such as prompt injection and continue to operate safely. The gateway is where you can run requests through dedicated scanners, rate‑limit traffic to avoid “denial of wallet” patterns, and fail over to backup models when a provider degrades.

Other frameworks point in the same direction. ISO 42001 talks about having an AI management system with evidence of control. That evidence is much easier to provide when you can say, truthfully, “every call to any AI model passed through this governed plane, under these policies, with these logs”.
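The “human oversight” capability above can be made concrete with a circuit breaker at the gateway: pause a use case automatically when the recent error rate crosses a threshold. The window size and threshold below are illustrative, and real deployments would track errors per use case rather than globally.

```python
class CircuitBreaker:
    """Pause a use case when the recent error rate crosses a threshold."""

    def __init__(self, threshold: float = 0.2, window: int = 50):
        self.threshold = threshold
        self.window = window
        self.results = []  # True = success, False = error

    def record(self, success: bool) -> None:
        self.results.append(success)
        self.results = self.results[-self.window:]  # keep a rolling window

    def allow(self) -> bool:
        if len(self.results) < self.window:
            return True  # not enough data to judge yet
        error_rate = self.results.count(False) / len(self.results)
        return error_rate < self.threshold
```

When `allow()` returns `False`, the gateway can route requests to a human queue instead of letting the model respond unattended.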

The useful move for the C‑suite is to stop treating regulation as an after‑the‑fact brake and start treating it as an engineering brief. It tells you which capabilities must exist in your architecture. The gateway is where many of them belong.

The economics: watching the meter

Even if you ignore regulation, the economics of AI argue for centralisation.

Generative models turn software from a largely fixed cost into a variable one. Usage is metered in tokens. Adoption can jump in weeks. A single misconfigured batch job can burn through a month’s budget. If each team talks to models directly, nobody has a full picture until the invoice lands.

A gateway gives you a single meter. You can see, in close to real time, which teams and which use cases are driving spend. You can set hard limits by user, department or application. You can spot anomalies early enough to act.
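A hard spend limit at the gateway can be as simple as the sketch below. The department names, caps and per‑token prices are made‑up illustrations; a real meter would persist spend and reset it daily.

```python
from collections import defaultdict

class SpendMeter:
    """Track model spend per department and enforce hard daily caps."""

    def __init__(self, caps: dict):
        self.caps = caps                    # department -> daily cap in £
        self.spend = defaultdict(float)

    def charge(self, department: str, tokens: int, price_per_1k: float) -> bool:
        """Return True and record the cost, or False to block the call."""
        cost = tokens / 1000 * price_per_1k
        if self.spend[department] + cost > self.caps.get(department, 0.0):
            return False                    # hard limit hit: block
        self.spend[department] += cost
        return True
```

The key property is that the limit acts before the call leaves the perimeter, not when the invoice lands.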

You can also stop treating one model as the default for everything. Many high‑volume tasks are simple: routing, short summaries, basic classification. A smaller model is enough. The gateway can route by rule or via a lightweight router model:

  • simple tasks go to cheaper models;
  • complex, higher‑risk tasks go to models that justify their cost.

Over a large estate, that blended optimisation matters. The same platform can also do semantic caching: when two prompts are close enough in meaning, it can serve the previous answer rather than calling the model again. In repetitive workloads, that reduces both latency and spend.
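Rule‑based routing of the kind described above is only a few lines at the gateway. The task categories, risk levels and model names are placeholders, and a lightweight router model could replace the hand‑written rules later without changing the callers.

```python
def route(task_type: str, risk_level: str) -> str:
    """Pick a model tier for a request; names are placeholders."""
    if risk_level == "high":
        return "frontier-model"   # cost justified by the stakes
    if task_type in {"routing", "short_summary", "classification"}:
        return "small-model"      # a cheaper model is enough
    return "mid-tier-model"
```

Because applications call the gateway rather than a vendor, swapping the targets of these rules touches one place, not dozens of codebases.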

This is how you already treat other utilities. You do not let every team lay its own fibre or negotiate its own data‑centre contract. You centralise, meter and optimise. AI is heading the same way.

Security when AI can act
#

So far, we have been talking mainly about AI that writes text. The risk profile changes when AI can act.

Agentic systems can trigger payments, change configurations, update records, send messages. At that point, a hallucination is not an odd paragraph in a draft; it is a mistaken transfer, a wrong setting in production, or a misleading report sent to a supervisor.

Security work around large language models has already catalogued prompt injection, insecure handling of outputs, sensitive data leakage and deliberate attempts to exhaust capacity. The details evolve, but the pattern is clear: relying only on a model’s internal safety fine‑tuning is not enough. Those mechanisms are opaque, and they change outside your control.

A gateway gives you another line of defence. You can:

  • run prompts and responses through dedicated guard models that look for known attack patterns or disallowed content before they reach the agent or the user;
  • restrict which tools or APIs any given agent is allowed to call, with the gateway as the enforcer of those permissions;
  • monitor and throttle behaviour across all agents, not just within a single application.

None of this eliminates risk. It does bring AI‑driven actions under the same kind of perimeter thinking you already apply to payments or core banking: never exposed directly to the public internet, always fronted by gateways and monitoring.
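Two of those defences can be sketched directly. A real guard model is far more capable than a phrase list, so the screening function below is only a crude stand‑in, and the agent names, tool names and marker phrases are all illustrative.

```python
# Illustrative allow-lists: which APIs each agent may call.
AGENT_TOOL_PERMISSIONS = {
    "support-agent": {"search_kb", "create_ticket"},
    "finance-agent": {"read_ledger"},
}

# Crude stand-in for a guard model's known-attack patterns.
INJECTION_MARKERS = ("ignore previous instructions", "disregard your rules")

def screen_prompt(text: str) -> bool:
    """Flag prompts containing known injection phrasing."""
    lowered = text.lower()
    return not any(marker in lowered for marker in INJECTION_MARKERS)

def tool_call_allowed(agent: str, tool: str) -> bool:
    """Gateway-enforced permission check for agent tool calls."""
    return tool in AGENT_TOOL_PERMISSIONS.get(agent, set())
```

The structural point stands regardless of how the checks are implemented: the gateway, not the agent, is the enforcer of permissions.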

How to land a gateway without a revolt

Landing this in a live organisation is as much social as technical. The most practical playbook I know is the Shadow AI: Amnesty & Pave Protocol from Issue #24.

In short:

  • you start by making Shadow AI visible and non‑punitive;
  • you then build a credible, governed “paved road” that people actually want to use;
  • only after that do you tighten the perimeter and route traffic through that road by default.

Rather than repeat the full protocol here, I would suggest reading Issue #24 side by side with this one. Together, they give you both the control‑plane architecture and the adoption strategy.

The Briefing

1. AI Omnibus: Implementation, not theory

The Commission’s AI Omnibus proposal quietly changes how the AI Act is framed: it treats it mainly as a problem of putting rules into practice[1]. It would pause (“stop the clock” on) some high‑risk AI duties until the right standards and support tools are available, loosen some registration requirements, and make it easier to use sensitive data to find and fix bias in AI systems. The EDPB and EDPS object to this: they want to keep a strict “only when really necessary” test for using special‑category data, prevent key Annex III systems from slipping out of registration, and ensure data protection authorities are formally involved in EU‑level AI sandboxes[2]. In effect, the message from Brussels is that technical gateways, registries and machine‑readable policies are now seen as the default infrastructure for AI compliance.

2. Agentic AI: adoption outpacing governance

Campbell Robertson’s “Agentic AI Governance Gap” notes that roughly 90% of large organisations are deploying AI agents, but only 19% have implemented mature governance frameworks.[3] PEX Network’s review of agentic AI pilots adds that about 65% of enterprises are piloting agents, but only around 11% reach production, with Gartner predicting over 40% of agentic projects will be cancelled by 2027.[4]

3. Deloitte: strategy ready, operations not

Deloitte’s State of AI in the Enterprise 2026 reports a 50% rise in worker access to AI and a coming doubling of firms with ≥40% of projects in production – but only about a third are truly re‑engineering core processes. Most feel strategically prepared for AI, yet operationally underprepared on infrastructure, data, risk and talent.[5]

Taken together: models are not the constraint. Operating models, gateways and governance‑as‑code are.

A question for this week

Before you sign off the next AI budget, ask yourself one question:

If someone tomorrow pastes a sensitive dataset into an external AI tool, what technical control stops that data leaving intact?

If the honest answer is, “our acceptable use policy says they should not”, you are still in governance theatre.

If the answer is, “all AI traffic is forced through our gateway, which strips or blocks sensitive content and records the call”, you are starting to have governance.

The same question applies to cost and regulation. If your view of AI spend is a monthly invoice from one vendor, you are flying blind. If a supervisor or regulator asked you to reconstruct how a particular AI‑assisted decision was made, could you do it?

AI governance in 2026 is no longer mainly about writing better policies or running more workshops. It is an architectural choice. You can make that choice now, while you still have room to manoeuvre. Or you can wait until an incident makes it for you.