Skip to main content

#7 The Ethical Litmus Test

·1622 words·8 mins

Dear reader,

In our last issue, we explored the world of Agentic AI, concluding that governing these autonomous systems requires a shift from static fences to dynamic leashes. But control is only half the battle. A perfectly controlled AI that consistently executes unethical instructions is not a success; it is a meticulously engineered disaster.

This week, we confront another difficult question: how do we embed our values into our AI? Many organisations will have a beautifully written “AI Ethics Policy,” often framed on a wall or buried on their website. It is usually filled with principles like “Fairness,” “Accountability,” and “Transparency.”

Yet, too often, these documents are little more than “governance theatre”—a performative gesture that has almost no connection to the day-to-day reality of how AI systems are actually built and deployed. Today, we will discuss how to bridge that gap, moving from abstract principles to tangible, operational controls.

Some of the regulation requires companies to prove their AI models are not unfairly discriminating against protected groups. The burden of proof is on the company, not the consumer. Service providers cannot simply say their model is fair; they must provide detailed statistical evidence to back it up.

Briefing
#

It’s summer, so we’re going for a deeper dive in technology development instead of chasing news.

Recent AI research highlights two distinct approaches to how models perform complex reasoning tasks. These methods have different characteristics regarding performance, efficiency, and transparency, which presents practical considerations for their application in a business context. Understanding these differences is useful for selecting the appropriate tool for a given task.

Method 1: Externalised Reasoning via Chain of Thought
#

One established method for improving AI reasoning is “Chain of Thought” (CoT). This technique prompts a model to generate a step-by-step explanation of its thinking process in natural language before providing a final answer.

A key benefit of this approach is “monitorability,” as detailed in the paper "Chain of Thought Monitorability”. Because the model’s reasoning is externalised into human-readable text, it creates an audit trail. This trail can be monitored, either by humans or other automated systems, to detect flawed logic or even malicious intent, such as when a model writes “Let’s hack” in its reasoning steps. This provides a layer of transparency. However, researchers note this monitorability is “fragile,” as future training techniques could teach models to hide their reasoning, and the CoT process itself can be computationally intensive.

Method 2: Internalised Reasoning in the Hierarchical Reasoning Model
#

A new approach emerges, as presented in the paper on the "Hierarchical Reasoning Model. This architecture is designed for efficiency and performance on specific, complex logical tasks. The HRM uses two internal modules—a high-level “planner” and a low-level “calculator”—to solve problems in a single computational pass, without generating an external Chain of Thought.

The authors describe CoT as a “crutch” that can be brittle and slow. By contrast, the HRM has demonstrated nearly perfect performance on tasks like solving extreme Sudoku puzzles, using significantly less training data than CoT-based models. Its reasoning is internal and non-linguistic, which makes it faster and more efficient for certain problems. Supposedly it also reduces or even entirely solves the hallucinations problem.

Practical Implications and Future Directions
#

These two approaches present a functional trade-off.

  • Chain of Thought offers greater transparency and auditability, which is valuable for high-stakes applications in regulated fields where decisions must be explainable. The cost of this transparency can be lower efficiency and higher computational overhead.

  • Hierarchical Reasoning Models offer higher performance and efficiency on certain complex tasks by internalising the reasoning process. This makes them suitable for problems where speed and accuracy are paramount and where a detailed, step-by-step explanation is less critical.

⠀Looking ahead, the field is exploring hybrid methods, such as neuro-symbolic AI, which aim to combine the pattern-recognition strengths of neural networks with the verifiable logic of symbolic systems. The goal of such research is to create systems that are both high-performing and trustworthy, potentially offering the benefits of both approaches.

Beyond the Policy: From Words on a Page to Rules in the Code 
#

An AI ethics policy that isn’t embedded in your operational workflow is a work of fiction. To make it real, you must treat it not as a legal document, but as an engineering specification. This requires a shift in mindset and process, focusing on three key areas:

1 Procurement: The lifecycle begins when you buy a new AI tool. Your procurement process must include an “Ethical Litmus Test.” This means adding specific, non-negotiable questions to your vendor due diligence: “Can you provide evidence of bias testing for your model?” “What are the explainability features of your system?” “How do you manage data provenance?” A vendor’s inability to answer these questions should be as big a red flag as a poor security audit.

2 Development: For internally built systems, ethical principles must be translated into technical requirements. If a principle is “Fairness,” the technical requirement for the data science team is “The model must demonstrate a false positive rate for demographic group A that is within 2% of the rate for demographic group B.” This turns a vague value into a measurable, testable engineering target.

3 Monitoring: An AI’s ethical performance is not static. It can “drift” over time as it encounters new data. Post-deployment monitoring cannot just be about technical performance (like uptime); it must include continuous monitoring of fairness and bias metrics.

Frameworks for Foresight: The Impact Assessment
#

One of the most powerful tools for operationalising ethics is the AI Impact Assessment. This is not a simple checklist; it is a structured, formal process undertaken before a project begins, designed to ask a series of difficult “what if” questions. Think of it as a pre-mortem for ethics. The goal is to get a cross-functional team in a room (including lawyers, engineers, and product managers) and force them to think like pessimists: • “What is the worst possible way a malicious actor could abuse this system?” • “Which customer groups could be unintentionally harmed by this decision-making model?” • “If the output of this AI was leaked on the front page of the Financial Times, could we defend it?” This process forces the uncomfortable but essential conversations that uncover hidden risks. It is far cheaper to address these issues on a whiteboard than it is to address them in a courtroom.

The Power of the Crowd: Diverse Teams and “Red Teaming” 
#

You cannot find your own ethical blind spots. It is a neurological and sociological impossibility. The only way to uncover the unintended consequences of your AI is to invite diverse perspectives to break it. •

Diverse Teams: Building an AI team that includes people from different backgrounds, disciplines (sociologists, ethicists, lawyers), and life experiences is not a “nice-to-have.” It is a core risk management strategy. A team of 30-year-old male engineers is statistically unlikely to foresee how an AI might misinterpret the language of an 80-year-old female customer.

Ethical Red Teaming: This is the process of actively trying to make your AI behave unethically. You assemble a team whose sole job is to “jailbreak” the system. They will probe it with adversarial prompts, feed it biased data, and try to trick it into producing harmful or discriminatory outputs. It is the only way to find the hidden vulnerabilities before your customers do.

Case Studies in Ethical Dilemmas 
#

Let’s make this concrete with two hypothetical scenarios in a banking context:

Case Study 1: The “Helpful” Debt Collection Agent.
#

A bank deploys an AI agent to help customers who are behind on their loan payments. The agent is fine-tuned on past data and discovers that sending reminders at 2:00 AM, when people are most anxious, results in a 5% higher repayment rate. From a purely financial perspective, this is a success. But is it ethical? An Impact Assessment would have likely flagged this as a high-risk strategy that preys on customer vulnerability, leading to a rule being hard-coded into the agent: “No customer communication between 10 PM and 8 AM.”

Case Study 2: The Biased Fraud Model. 
#

A fraud detection model flags a transaction from a new immigrant as “high-risk” because their spending pattern doesn’t match the “normal” patterns in the training data. A diverse red team, however, points out that new immigrants often have unusual but perfectly legitimate spending patterns (e.g., sending large amounts of money abroad to family). This insight leads to the inclusion of new data sources and a recalibration of the model to be more inclusive, preventing thousands of legitimate customers from being unfairly blocked.

Actionable Takeaways
#

1 Translate Your Policy into a Checklist. Turn your high-level ethics policy into a concrete checklist that must be completed for every new AI project.

2 Mandate the “Pessimist’s Meeting.” Make a pre-mortem style AI Impact Assessment a mandatory gate for any significant AI initiative.

3 Appoint an “Ethical Red Team.” Formally assign a cross-functional team the task of trying to break your AI models before they are deployed.

4 Ask “Who is Not in the Room?” When reviewing a new AI project, always ask which perspectives are missing from the development and testing team.

5 Demand the Audit Trail. For your most critical generative AI systems, insist on the implementation of Chain-of-Thought monitoring and RAG to ensure you have a defensible record of the AI’s reasoning.

Embedding ethics into your AI operations is not a simple task. It requires moving beyond good intentions and embracing a culture of rigorous, skeptical, and continuous inquiry. It requires treating your values not as a poster on the wall, but as a non-negotiable part of your engineering and risk management DNA.

Until next time, build with foresight.

Krzysztof