
#5 AI has the keys


In our last issue, we established that building enterprise AI on a poor data foundation is like constructing a skyscraper on a swamp. It is an exercise in futility, destined to create an impressive-looking but dangerously hollow structure.

This week, we take the next logical step. Assume for a moment that your data house is in order. The foundation is solid. Now, what happens when you hand the keys to that house over to an AI that doesn’t just analyse and report, but can act on its own? What happens when the AI intern is given the authority to not just write the report, but to execute its recommendations?

This is the world of Agentic AI. It represents a milestone in AI’s capabilities, but it also marks a new frontier of risk. The question is no longer just “Is the AI’s analysis correct?” but “Can we control what the AI does next?”

The Briefing

This month, the theoretical risks of AI governance became more tangible. While your legal team grapples with the EU AI Act, your marketing department might be experimenting with an autonomous AI agent that quietly violates it. This is the spectre of ‘Shadow AI’, now supercharged and available to everyone.

OpenAI’s launch of ChatGPT Agent marks the moment this problem stopped being a back-office IT concern and started potentially affecting most employees. This is not just a smarter chatbot; it’s a virtual employee that can read your calendar, create slide decks, and connect to your corporate apps. The peril is that every employee with a subscription now has a powerful, autonomous assistant capable of making decisions and interacting with customers under minimal oversight. Of course, this isn’t the first publicly available AI agent, but ChatGPT is synonymous with generative AI for the mass audience, so it’s safe to assume that the arrival of an agent from OpenAI will rapidly grow that market.

Any executive hoping for a last-minute reprieve from the EU AI Act had their hopes dashed. Despite a significant lobbying effort from over 100 of Europe’s biggest firms calling for a two-year pause, the European Commission’s response was an unequivocal “no”. The deadlines stand. This isn’t just policy; it’s a statement of intent. The EU sees regulatory certainty, not flexibility, as the key to trust and adoption.

Yet, just as the risks become tangible, so do the solutions. In a counter-narrative to the usual doom-mongering, Google announced that its AI agent, ‘Big Sleep’, proactively discovered a critical software vulnerability that hackers were preparing to exploit. This is a case of AI acting as a defensive shield. True governance, then, is not just about writing policies to prevent harm; it’s about building the systems that put those policies into practice.

Defining the Agent: From Reactive Tool to Autonomous Actor

Let’s be clear about what we mean by “Agentic AI.” The term is awash with hype, so a simple definition is in order.

Generative AI, available via tools like ChatGPT, is a reactive tool. You give it a prompt, and it gives you a response. An AI Agent, by contrast, is an actor. You give it a goal, and it can independently devise and execute a sequence of steps to achieve it.

  • A standard AI can write you a travel itinerary.

  • An AI Agent can be told, “Book me the most efficient business trip to Frankfurt next Tuesday,” and it will then proceed to browse airline websites, compare prices, access your calendar, book the flight with your saved details, and add the itinerary to your calendar.

This ability to take action in the digital world is the key difference. It’s a shift from a system that provides information to one that exercises real power. And for any manager, especially in a regulated industry, power without control is the definition of a nightmare.
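To make the distinction concrete, below is a deliberately minimal sketch of the loop at the heart of most agent designs: plan a step, act on it, record what happened, repeat. Every name in it (Step, run_agent, the planner and tools arguments) is an illustrative placeholder, not any vendor’s actual API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of an agent loop; every name here is an illustrative
# placeholder, not a real framework's API.

@dataclass
class Step:
    tool: str        # which capability to invoke ("search_flights", "book_flight", "finish", ...)
    arguments: dict  # inputs for that capability

def run_agent(goal: str,
              planner: Callable[[str, list], Step],
              tools: dict[str, Callable[[dict], str]],
              max_steps: int = 20) -> list[str]:
    """Pursue a goal by repeatedly asking the planner for a step and executing it."""
    history: list[str] = []
    for _ in range(max_steps):
        step = planner(goal, history)               # in practice an LLM sits behind this call
        if step.tool == "finish":                   # the agent decides the goal has been met
            break
        result = tools[step.tool](step.arguments)   # the agent acts: browse, book, send, write
        history.append(f"{step.tool}({step.arguments}) -> {result}")  # auditable trail
    return history
```

The loop itself is trivial; everything that matters for governance happens in the two lines where the agent decides and then acts, with no one watching by default.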

The Fine-Tuning Paradox: Creating a More Capable, More Dangerous Agent

The temptation for every enterprise is to make these agents smarter by fine-tuning them on internal company data. If you fine-tune an agent on your entire library of sales call transcripts, it will become exceptionally good at understanding your customers’ objections.

This, however, creates a dangerous paradox. By making the agent more capable, you also make it a more perfect reflection of your organisation’s hidden biases. If your historical sales data shows your team neglected female-led businesses, an autonomous agent trained on that data will not magically correct this. It will pursue its goal of “increasing sales” by executing the patterns it learned, systematically ignoring an entire market segment with terrifying efficiency.

Fine-tuning gives the agent your company’s internal, unwritten knowledge. But it also gives it your company’s blind spots. The governance challenge, then, is not just about controlling a generic tool, but about controlling a tool you have personally, if unintentionally, armed with your own worst habits.

The Industry’s Misguided Focus

This leads to a rather worrying observation: the Agentic AI industry is, for the most part, focused on the wrong things. The vast majority of research and funding goes into increasing agent capabilities. Can it perform more complex tasks? Can it operate for longer without human intervention?

These are interesting engineering questions. But for an enterprise leader, they are secondary. A recent, brilliant essay by Toby Ord of Oxford’s AI Governance Initiative introduces the concept of a “half-life for the success rates of AI agents.” He suggests that, much like radioactive isotopes, the probability of an agent successfully completing a task decays exponentially with each additional step. If an agent has a 99% chance of completing one step correctly, its chance of completing a 100-step task without error is only 37%.
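The arithmetic is easy to verify. Assuming each step succeeds independently with probability p, an n-step task succeeds with probability p^n; the snippet below is just that calculation, not code from Ord’s essay.

```python
import math

def task_success(p_step: float, n_steps: int) -> float:
    """Probability that all n_steps independent steps succeed."""
    return p_step ** n_steps

def half_life(p_step: float) -> float:
    """Number of steps after which the overall success probability falls to 50%."""
    return math.log(0.5) / math.log(p_step)

print(f"{task_success(0.99, 100):.0%}")  # ~37% chance of finishing a 100-step task
print(f"{half_life(0.99):.0f}")          # ~69 steps until the odds drop to a coin flip
```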

This creates a dangerous economic paradox. The very complexity that makes an agent seem powerful also makes it exponentially more likely to fail, requiring constant, expensive human supervision to verify its work. The cost of this oversight can quickly surpass the savings from automation, yet the executive enthusiasm for “cheap automation” is so great that many are walking headfirst into this trap. The industry is obsessed with building a faster car. We, as leaders in regulated industries, need to be obsessed with building better brakes. On a more philosophical level, we should be asking ourselves: will the human role in knowledge work be reduced to verifying that the AI has not made a mistake? And who would want to perform such a role?

Agentic AI Governance: From Static Fences to Dynamic Leashes

Governing an autonomous agent requires a fundamental shift in our approach. Traditional AI governance is often like building a fence: you perform a risk assessment, set your policies, and deploy the model inside those static constraints.

Agentic AI governance is more like walking a very large, very strong, and unpredictable dog. A fence is useless. You need a dynamic leash, a constant connection, and the ability to pull back forcefully at a moment’s notice. This new model of governance relies on three key principles:

1. Continuous Assurance: The idea of a one-time, pre-deployment audit is obsolete. Governance must be a continuous, automated process. This is where the automated red teaming we discussed in the last issue becomes essential.

2. Dynamic Controls & “Tripwires”: You need to embed automated “tripwires” into the agent’s operating environment. For example, an agent designed to manage procurement might have a hard-coded rule: “If any single proposed transaction exceeds €50,000, halt all action immediately and request human approval.” (A minimal sketch of such a rule follows this list.)

3. Auditable Reasoning: As I wrote in Issue #3, forcing an agent to use Chain-of-Thought (CoT) and provide citations via Retrieval-Augmented Generation (RAG) is paramount. For an agent, the audit trail of its reasoning is even more important than the outcome of its actions.
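As promised under point 2, here is a minimal sketch of what such a tripwire can look like in code. The €50,000 threshold comes from the example above; the function names and escalation call are hypothetical, not taken from any particular framework.

```python
from dataclasses import dataclass
from typing import Callable

APPROVAL_THRESHOLD_EUR = 50_000  # the hard-coded limit from the example above

@dataclass
class Transaction:
    supplier: str
    amount_eur: float

def execute_with_tripwire(tx: Transaction,
                          execute: Callable[[Transaction], None],
                          request_human_approval: Callable[[Transaction], None]) -> str:
    """Execute a proposed transaction only if it stays under the threshold."""
    if tx.amount_eur > APPROVAL_THRESHOLD_EUR:
        request_human_approval(tx)   # halt all action and escalate to a person
        return "halted_pending_approval"
    execute(tx)                      # within bounds: the agent may proceed on its own
    return "executed"
```

The important design choice is where this check lives: outside the model, in the agent’s operating environment, so no amount of clever prompting can talk the agent past it.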

The EU AI Act is not yet fully equipped for this dynamism, but its risk-based framework provides a clear signal. An AI agent that can act on a company’s behalf in areas like HR or finance would almost certainly be classified as “high-risk,” automatically subjecting it to the Act’s most stringent requirements for human oversight.

Questions Worth Asking

1. What is the “Blast Radius”? For any proposed AI agent, have we clearly defined and limited its potential “blast radius”? What systems can it access? What is the absolute worst-case scenario if it malfunctions?

2. Where is the “Off-Switch”? Do we have a reliable, immediate, and human-accessible “off-switch” for every agent we deploy? Who has the authority to use it?

3. How Do We Define “Success”? Is the agent’s goal defined purely by efficiency (e.g., “reduce costs”), or have we embedded “guardrail metrics” related to safety, compliance, and customer satisfaction?

4. Are We Training for Competence or Compliance? When we fine-tune an agent on our data, are we only teaching it to be good at its job, or are we also explicitly teaching it the rules it must not break?

Conclusion

The arrival of Agentic AI is not a distant prospect; it is happening now. It promises a future of unprecedented automation, but it also presents a governance challenge of a completely new magnitude.

Building the control systems for these agents—the dynamic leashes, the tripwires, the off-switches—is the most critical engineering and management task in this space for the years to come. It is not about stifling innovation. It is about creating the conditions of safety and trust that will allow true, sustainable innovation to flourish.

All the best,

Krzysztof