If you thought Generative AI was trouble when it could only churn out text, just wait until it starts seeing and hearing. Multimodal AI—systems that blend text, images, audio, and video—has marched into the enterprise, not with a polite knock but with the subtlety of a marching band at midnight. For business leaders, the challenge is no longer spotting the hype, but managing a new category of risks that are difficult to define, measure, and control.
The Collapse of “Seeing is Believing”#
The past year has seen deepfakes graduate from internet mischief to boardroom menace. In a 2024 incident, a finance worker at the Hong Kong office of a multinational was duped into wiring $25 million after a video call with what he thought were his colleagues and CFO. In reality, every face and voice belonged to an AI-generated impostor—a “deepfake whaling” attack, now available as a service to any criminal with a credit card and low morals.
What makes this particularly alarming is the democratisation of these tools. Where once creating a convincing deepfake required significant technical expertise and computing power, today’s services offer turnkey solutions. The barrier to entry has collapsed, and with it, your traditional defences. The finance worker in Hong Kong wasn’t naive or untrained—he was operating in an environment where the fundamental assumption of “seeing is believing” had been quietly undermined.
The lesson is stark: you can no longer trust your eyes or ears in the digital workplace. Video calls, audio confirmations, and even recorded evidence now require verification protocols that would have seemed paranoid just two years ago.
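What does a verification protocol that “would have seemed paranoid” actually look like? Here is a minimal sketch, assuming a directory of record and an illustrative approval threshold; every name, number, and workflow detail below is a placeholder, not a reference implementation. The one principle it hard-codes: the confirmation channel must be sourced independently of the request itself.

```python
import secrets

# Contact details must come from the system of record (HR or ERP),
# never from the call or message being verified.
DIRECTORY_OF_RECORD = {
    "cfo@example.com": "+44 20 7946 0000",  # placeholder entry
}

def verify_out_of_band(requester_email: str, amount: float,
                       approval_threshold: float = 10_000.0) -> bool:
    """Require an independent confirmation before acting on a request."""
    if amount < approval_threshold:
        return True  # low-value requests follow the normal workflow

    phone = DIRECTORY_OF_RECORD.get(requester_email)
    if phone is None:
        return False  # unknown requester: reject outright

    # One-time challenge, confirmed on a call that *you* initiate to the
    # number of record, never on the inbound call or video session.
    challenge = secrets.token_hex(4)
    print(f"Call {phone} yourself and ask for challenge code {challenge}.")
    answer = input("Did the requester read back the correct code? [y/N] ")
    return answer.strip().lower() == "y"
```

Had the Hong Kong firm enforced a check of this shape, the video call itself would have been irrelevant: no callback to a number of record, no transfer.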
The Briefing#
Regulatory Shifts: EU AI Act Implementation Accelerates#
The European Commission has reframed the AI Act rollout, pairing a €3 billion innovation drive with new compliance support. The launch of the “Apply AI Strategy” and “AI in Science Strategy” signals a shift from pure regulation to an integrated compliance-and-growth agenda. The new AI Act Service Desk and Single Information Platform now offer a “Compliance Checker” and “AI Act Explorer,” designed to help businesses interpret and meet their obligations as the Act phases in through 2027. This makes regulatory engagement a lever for competitive advantage, not just a compliance burden. Boards should immediately mobilise teams to make use of these tools and ensure that incident response and reporting protocols align with evolving requirements, as the Commission has also published a new template for reporting serious incidents involving general-purpose AI models.
Is OpenAI “Too Big to Fail”?#
A new wave of analysis warns that OpenAI’s rapid expansion—across products, partnerships, and capital commitments—has created new systemic risks. The company’s $12 billion quarterly loss, deep entanglements with tech giants and government, and eroding enterprise market share (now overtaken by Anthropic in key B2B segments) have led some analysts to argue that OpenAI is deliberately becoming “too big to fail.” The implication is clear: the collapse of such a provider could trigger sector-wide instability, much as the failure of major financial institutions did in 2008. For the C-suite, this means that vendor concentration is now a systemic risk, not just an operational one. Diversification and robust contingency planning are essential, as is ongoing due diligence on the financial and governance health of major AI partners.
High-Profile Governance Failures: The Deloitte Case#
Deloitte Australia’s refund of part of an AU$440,000 government contract—after AI-generated fabrications were detected in a major report—highlights the dangers of scaling AI without mature governance. The failure to detect fictitious citations, coupled with the lack of disclosure around AI use, is symptomatic of a wider industry challenge: rapid AI adoption is outpacing the evolution of internal controls. Senior leaders must treat AI-generated outputs with the same scrutiny as traditional work products and ensure that all use of AI in regulated deliverables is fully disclosed and auditable.
Market Volatility: Scepticism and Short Interest in AI Leaders#
Recent market activity underscores the volatility facing even the most prominent AI firms. Palantir and Nvidia, both leaders in AI infrastructure and software, have come under short-selling pressure from high-profile investors such as Michael Burry. Despite strong earnings, share prices have dipped amid concerns about lofty valuations and the sustainability of AI-driven growth. For enterprise buyers, this is a reminder that sector leadership can shift rapidly, and that financial health and market confidence are as important as technical capability when evaluating long-term partners.
The Compliance Labyrinth#
With AI systems hoovering up video and audio data, the tangled web of GDPR and the EU AI Act becomes even harder to navigate. Algorithmic opacity—the fact that nobody, not even the engineers, can always explain what the AI is really doing—makes it a nightmare to prove your business is handling personal data lawfully. And when regulators come knocking, they expect not only clean hands, but a full forensic trail of every AI decision, especially if it touches anything even vaguely human.
Consider the implications for your HR processes. If your recruitment AI is scanning video interviews to assess candidates, can you explain to a regulator exactly which micro-expressions or vocal patterns led to a rejection? Can you prove the system isn’t inadvertently discriminating based on accent, ethnicity, or disability? The burden of proof has shifted entirely to you, and “the algorithm said so” is not a defence that will stand up in court or in the court of public opinion.
The authenticity of digital communications is now in question. Deloitte’s 2024 Connected Consumer Study found that 68% of those familiar with generative AI worry about being deceived by synthetic content, while more than half admit they struggle to tell the difference between real and AI-generated media. The upshot: businesses must invest in detection and provenance tools, but even the best are fallible. The onus is now on you to prove your evidence is genuine, not a digital forgery.
This erosion of trust extends beyond external fraud. Internal communications face the same crisis of authenticity. How do you know that the audio recording from a disciplinary hearing hasn’t been tampered with? How do you verify that the video evidence from a workplace incident is genuine? The answer is that you’re going to need robust chain-of-custody protocols, digital signatures, and tamper-evident logging—or you’ll find yourself defending the indefensible when disputes escalate to tribunals or litigation.
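For readers who want a feel for what “tamper-evident logging” means in code, here is a minimal sketch of a hash-chained, HMAC-signed log, with the caveat that key management is waved away here and would belong in an HSM or KMS in any real deployment:

```python
import hashlib
import hmac
import json
import time

# Each entry is HMAC-signed over its content plus the previous entry's
# digest, so editing or deleting any earlier entry breaks the chain.
SECRET_KEY = b"replace-with-a-managed-key"  # illustrative only

def append_entry(log: list[dict], event: str) -> dict:
    prev_digest = log[-1]["digest"] if log else "genesis"
    payload = json.dumps(
        {"ts": time.time(), "event": event, "prev": prev_digest},
        sort_keys=True,
    )
    digest = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    entry = {"payload": payload, "digest": digest}
    log.append(entry)
    return entry

def verify_chain(log: list[dict]) -> bool:
    """Recompute every signature and link; any tampering fails the check."""
    prev_digest = "genesis"
    for entry in log:
        expected = hmac.new(
            SECRET_KEY, entry["payload"].encode(), hashlib.sha256
        ).hexdigest()
        if not hmac.compare_digest(expected, entry["digest"]):
            return False
        if json.loads(entry["payload"])["prev"] != prev_digest:
            return False
        prev_digest = entry["digest"]
    return True
```

The design choice worth noting is the chaining: a signature on each entry alone would catch edits, but only the link back to the previous digest catches deletions and reordering.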
When AI Enters the Physical World#
As AI leaps off the screen and into the physical world—controlling robots, vehicles, or factory doors—the risks become tangible. Computer vision and sensor data are now the backbone of quality control in manufacturing, with AI-powered cameras spotting defects invisible to humans and slashing waste and downtime. Yet a misfiring model can halt a production line or, worse, endanger the people working alongside it.
The physical dimension introduces a whole new category of liability. When an AI makes a bad call in a text generation task, you might lose a customer or suffer embarrassment. When an AI makes a bad call in a physical system, people can get hurt, and your insurance premiums will reflect that reality. The legal and regulatory frameworks are still catching up, but early case law suggests that “we trusted the AI” will be about as convincing a defence as “the dog ate my homework.”
Governance frameworks now demand airtight data security, auditability, and, for high-stakes decisions, that all-important human in the loop. These aren’t theoretical requirements—they’re the minimum standard for any organisation that wants to avoid becoming a cautionary tale.
Framework: Multimodal AI Applications and Control Requirements#
The deployment model for multimodal AI must match the risk profile of the application. Here is a practical taxonomy for calibrating your oversight requirements.
| Application Domain | Multimodal AI Capability | Primary Value | Primary Risk | Required Control Level |
|---|---|---|---|---|
| Manufacturing Quality Control | Computer vision detecting micro-defects in components at 99% accuracy. | Reduced waste, faster throughput, consistent quality standards. | Production halt from false positives; safety incidents from false negatives. | Human-on-the-Loop (HOTL) with immediate escalation protocols for anomalies. |
| Financial Approvals | Voice/video authentication for high-value transaction authorisation. | Convenience, speed of approval process. | Deepfake fraud leading to unauthorised fund transfers. | Human-in-Command (HIC) with multi-factor, out-of-band verification for amounts above threshold. |
| Security Surveillance | Real-time video analysis to detect suspicious behaviour or unattended packages. | Reduced false alarms, faster incident response, optimised security staff deployment. | Privacy violations, bias in threat detection, over-reliance on automated alerts. | Human-in-the-Loop (HITL) for any action beyond alert generation; regular bias audits required. |
| Customer Service (Voice AI) | Real-time speech recognition, sentiment analysis, and agent coaching. | Improved first-call resolution, compliance monitoring, agent performance. | Misinterpretation of customer intent, privacy concerns from continuous monitoring. | Human-on-the-Loop (HOTL) with agent override capability; explicit customer consent for recording. |
| Access Control Systems | Facial recognition or voice authentication to grant physical or system access. | Enhanced security, reduced credential sharing, audit trail of access events. | False rejections (operational disruption), false acceptances (security breach), bias against certain demographics. | Human-in-the-Loop (HITL) for high-security zones; fallback authentication methods mandatory. |
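One way to operationalise a taxonomy like this is to encode it as policy rather than leave it in a slide deck. The sketch below mirrors the table’s control levels; the domain strings, the threshold figure, and the enum names are illustrative assumptions, to be replaced with your own risk appetite:

```python
from enum import Enum

class ControlLevel(Enum):
    HOTL = "human-on-the-loop"   # monitor, with escalation paths
    HITL = "human-in-the-loop"   # a human approves each consequential action
    HIC = "human-in-command"     # humans decide; AI only advises

HIGH_VALUE_THRESHOLD = 50_000  # per-transaction limit, organisation-specific

def required_control(domain: str, amount: float = 0.0,
                     high_security_zone: bool = False) -> ControlLevel:
    """Map an application (see the table above) to its minimum control level."""
    if domain == "financial_approvals" and amount >= HIGH_VALUE_THRESHOLD:
        return ControlLevel.HIC   # multi-factor, out-of-band sign-off
    if domain == "access_control" and high_security_zone:
        return ControlLevel.HITL  # AI assists, a human grants access
    if domain == "security_surveillance":
        return ControlLevel.HITL  # AI raises alerts, humans act on them
    return ControlLevel.HOTL      # default for quality control, voice AI
```

The point is less the specific mapping than the habit: once control levels are code, every new AI deployment must declare its domain and inherit an oversight requirement, rather than negotiating one ad hoc.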
Real-World Applications: Where Multimodal AI Delivers Value#
Multimodal AI isn’t just a theoretical headache—it’s already reshaping how enterprises operate, and the early results are impressive where proper controls are in place.
Manufacturing: The 99% Standard#
AI-driven visual inspection systems now scan every widget and whirring part, catching micro-cracks or assembly errors long before they become costly recalls. These tools have pushed defect detection rates to the dizzying heights of 99%, and if that doesn’t make your quality manager smile, nothing will.
The impact on throughput and cost is substantial. Traditional manual inspection is not only slower but inconsistent—human attention wanders, fatigue sets in, and subtle defects slip through. AI systems don’t have bad days, don’t need coffee breaks, and can maintain microscopic attention to detail across millions of units. Early adopters in automotive and electronics manufacturing report defect escape rates dropping by an order of magnitude, with corresponding reductions in warranty claims and brand damage.
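The governance pattern behind those numbers is the human-on-the-loop escalation named in the framework table. A minimal sketch, assuming the vision model emits a defect-free confidence score and using illustrative thresholds:

```python
# Confident verdicts flow automatically; ambiguous items stop for an
# inspector. Thresholds are illustrative and would be tuned to each
# line's false-positive (scrap cost) and false-negative (recall) economics.
PASS_THRESHOLD = 0.99   # auto-accept at or above this confidence
FAIL_THRESHOLD = 0.05   # auto-reject at or below it

def disposition(defect_free_confidence: float) -> str:
    if defect_free_confidence >= PASS_THRESHOLD:
        return "accept"            # ship without human review
    if defect_free_confidence <= FAIL_THRESHOLD:
        return "reject"            # scrap or rework automatically
    return "escalate_to_human"     # the inspector makes the call
```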
Security: From Noise to Signal#
AI-powered surveillance can distinguish between a customer browsing and a would-be thief, or flag an unattended package before it becomes a security incident. The system learns what “normal” looks like in each zone and can alert human operators only when genuine anomalies occur, cutting through the noise that plagues traditional CCTV monitoring.
The retail applications extend beyond loss prevention. Some chains are exploring multimodal AI to analyse aggregate customer movement patterns—where bottlenecks form at peak times, which displays attract attention. However, this is regulatory quicksand. Under GDPR, any system that identifies individuals requires explicit consent and a lawful basis for processing. The EU AI Act adds further constraints: biometric identification systems in publicly accessible spaces face strict prohibitions, and any AI-driven profiling that affects individuals could be classified as high-risk, triggering onerous compliance requirements. The operational reality is that most retailers are restricting these systems to anonymised, aggregate analytics rather than individual tracking, precisely because the legal and reputational risks outweigh the marginal gains.
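What “anonymised, aggregate analytics” means in practice is deliberately boring: keep counts, not people. A minimal sketch, assuming an upstream vision system that emits only a timestamp and a zone label, with no track ID, face crop, or other identifier ever persisted:

```python
from collections import Counter
from datetime import datetime

def aggregate_footfall(detections: list[tuple[datetime, str]]) -> Counter:
    """detections are (timestamp, zone) pairs with no person identifier.

    Returns hourly occupancy counts per zone, which is enough to find
    bottlenecks and popular displays without profiling anyone.
    """
    buckets: Counter = Counter()
    for ts, zone in detections:
        buckets[(zone, ts.strftime("%Y-%m-%d %H:00"))] += 1  # hourly bins
    return buckets
```

Data minimisation done this way is also an engineering convenience: there is no biometric database to secure, breach, or explain to a regulator.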
Customer Service: The Real-Time Coach#
Voice AI listens in on call centre exchanges, transcribes conversations, gauges sentiment, and even nudges agents with real-time coaching. By marrying audio with customer history and text, these systems offer a panoramic view of each interaction, boosting both compliance and customer satisfaction. Agents receive on-screen prompts suggesting relevant product information, empathy cues when a customer is frustrated, or compliance warnings when conversations drift into risky territory.
The coaching dimension is perhaps the most transformative. Instead of quarterly reviews based on a handful of cherry-picked calls, agents now receive continuous feedback on tone, pace, word choice, and outcomes. High performers can be studied and their patterns systematised. Struggling agents can be supported with targeted training. The result is a measurable lift in first-call resolution rates and customer satisfaction scores, with the added bonus of reducing the stress and guesswork that makes call centre work so draining.
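Under the hood, the coaching logic can be surprisingly simple once the speech pipeline does its job. A minimal sketch, assuming a vendor API that supplies a rolling transcript window and a sentiment score in [-1, 1]; both inputs and the phrase list are hypothetical stand-ins for whatever your platform actually provides:

```python
RISKY_PHRASES = {"guaranteed returns", "no risk", "off the record"}

def coach(transcript_window: str, sentiment: float) -> list[str]:
    """Return on-screen prompts for the agent based on the live call."""
    prompts = []
    if sentiment < -0.5:  # illustrative threshold for rising frustration
        prompts.append("Customer frustration rising: acknowledge, then slow down.")
    lowered = transcript_window.lower()
    for phrase in sorted(RISKY_PHRASES):
        if phrase in lowered:
            prompts.append(f"Compliance warning: avoid '{phrase}'.")
    return prompts
```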
Questions for Your Leadership Team#
On Risk Mapping: Have we identified every point in our operations where a convincing deepfake (voice, video, or document) could cause material harm? What controls exist at each point?
On Authentication: For high-value or high-risk transactions, do we still rely solely on voice or video confirmation? What multi-factor, out-of-band verification protocols have we implemented?
On Provenance: Can we prove the authenticity of our digital evidence? Do we have chain-of-custody protocols, cryptographic signatures, and tamper-evident logging for critical communications?
On Physical Systems: Where does AI control or influence physical processes (manufacturing, access control, logistics)? What is our human oversight model for each application, and can we justify it?
On Compliance: For systems processing video or audio, have we documented why less intrusive alternatives (aggregated analytics, text-only data) are insufficient? Can we demonstrate that we’re collecting only the minimum personal data necessary?
Conclusion#
Multimodal AI is not just the next chapter in enterprise technology—it’s a whole new genre, with plot twists aplenty. The leaders who thrive will be those who treat governance not as a compliance chore, but as a strategic shield and a source of competitive advantage. In this new world, seeing (and hearing) is no longer believing. But with the right frameworks, the right controls, and a healthy dose of scepticism, you can keep your business both safe and sharp.
The winners in this transition will be those who move decisively to capture the productivity gains while building robust defences against the new risks. The losers will be those who either freeze in fear, missing the upside, or charge ahead recklessly, assuming their existing controls will suffice. Neither extreme strikes the right balance. The path forward requires clear-eyed assessment of both the capabilities and the vulnerabilities that multimodal AI introduces, paired with the leadership discipline to design systems that exploit the former while defending against the latter.
Until next time, build with foresight.
Krzysztof
