Dear Reader,
Thread K: Industry Verticals — AI Production in Regulated Sectors (Issues #36-40)
In 2019, Insilico Medicine’s AI platform identified TNIK — a kinase linked to lung tissue scarring — as a potential target for idiopathic pulmonary fibrosis, a disease with no curative treatment that kills most patients within five years of diagnosis. Within 18 months, a second AI system had designed a molecule capable of inhibiting that target. By 2024, the drug had completed Phase 2a clinical trials in 71 patients across 21 sites: the highest dose group showed average lung function improvement of 98.4 mL against a placebo group that declined by 62.3 mL. The full pipeline, from target identification to Phase 2a readout, took approximately five years. The industry average is closer to twelve, at a cost of over two billion dollars. The work was peer-reviewed in Nature Biotechnology.
This is what AI is doing in pharmaceutical research right now. The regulatory complexity is real — we will get to it — but the story of AI in pharma is not primarily a compliance problem. It is a capability story that is moving faster than most enterprise readers outside the sector have registered.
What AI is changing in drug development#
Protein structure prediction. For fifty years, determining the three-dimensional structure of a protein from its amino acid sequence was one of biology’s hardest problems. In 2020, DeepMind’s AlphaFold solved it. The AlphaFold database now holds predictions for more than 200 million protein structures — covering nearly every known protein in biology — and has been used by over three million researchers across 190 countries. Its creators won the Nobel Prize in Chemistry in 2024. The practical consequence for drug development: targets that previously required years of experimental structural work can now be characterised in hours. No AlphaFold-derived drug has yet completed clinical trials, but the enabling infrastructure is now in place, and Isomorphic Labs, the drug design company built on top of it, is preparing its first oncology candidates for first-in-human testing.
Drug repurposing at speed. On 4 February 2020, weeks into the COVID-19 pandemic, BenevolentAI published a hypothesis in The Lancet: baricitinib, an existing drug approved for rheumatoid arthritis, had properties that might block viral entry into lung cells. The hypothesis was AI-generated. Nine months later, the FDA granted Emergency Use Authorisation for baricitinib in hospitalised COVID-19 patients. A subsequent Phase 3 trial in 1,525 patients found a 38% reduction in mortality — a secondary endpoint, but one substantial enough to reshape treatment protocols globally. BenevolentAI did not run the trials. It identified the candidate. The time from AI hypothesis to clinical EUA was under a year.
Diagnostics. In September 2021, Paige Prostate became the first AI-based pathology software to receive FDA De Novo authorisation — the regulatory pathway for novel software medical devices without a predicate. It assists pathologists reviewing digital prostate biopsy slides by flagging areas suspicious for cancer. Clinical performance: a 7.3% improvement in cancer detection, a 70% reduction in false negatives, a 24% reduction in false positives compared to unassisted review. The pathologist retains final judgement. It is an adjunct tool operating in clinical practice, not a pilot.
Manufacturing. Pfizer applied AI and machine learning to PAXLOVID production. Per its 2022 annual report: a 67% reduction in cycle time for a critical manufacturing step and 20,000 additional doses per batch. The same infrastructure was applied to clinical trial data quality checks, accelerating review by 50% across more than half of all Pfizer trials. These are self-reported corporate figures, not peer-reviewed outcomes — but they are specific, attributed, and public.
Pharmacovigilance. Adverse event detection — identifying safety signals in real-world data after a drug is approved — is one of the most data-intensive activities in pharma. AbbVie published a peer-reviewed pilot in 2024 validating a machine learning model for signal detection across two products. This is pilot-scale, not enterprise deployment at volume, but it represents where the serious investment is going across the major players.
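The AbbVie paper does not disclose its model architecture, but ML approaches in this space are conventionally benchmarked against classical disproportionality statistics such as the proportional reporting ratio (PRR), computed from a 2×2 table of spontaneous reports. A minimal sketch of that baseline — the counts below are invented for a hypothetical drug-event pair, not drawn from any real database:

```python
def prr(a: int, b: int, c: int, d: int) -> float:
    """Proportional reporting ratio for one drug-event pair.

    a: reports with the drug AND the event
    b: reports with the drug, without the event
    c: reports without the drug, with the event
    d: reports without the drug, without the event
    """
    rate_drug = a / (a + b)   # event rate among reports for this drug
    rate_rest = c / (c + d)   # event rate among all other reports
    return rate_drug / rate_rest

def is_signal(a: int, b: int, c: int, d: int,
              threshold: float = 2.0, min_cases: int = 3) -> bool:
    # Simplified screening rule: the full Evans criterion also requires
    # chi-squared >= 4, omitted here to keep the sketch minimal.
    return a >= min_cases and prr(a, b, c, d) >= threshold

# Hypothetical counts: 30 reports of liver injury among 400 reports for
# drug X, versus 200 among 20,000 reports for all other drugs.
print(round(prr(30, 370, 200, 19_800), 1))   # → 7.5
print(is_signal(30, 370, 200, 19_800))       # → True
```

A PRR well above 2 like this would flag the pair for human pharmacovigilance review; the ML models in pilots like AbbVie’s aim to rank such candidate signals with fewer false positives than this raw statistic.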
What failure looks like#
Watson for Oncology is the most documented failure in pharma AI, and the most instructive — because the problems were not technical. This was not recent: IBM launched Watson for Oncology commercially in 2015, the major failures became public between 2017 and 2018, and Watson Health was sold off in 2022. The case is worth revisiting because the failure modes it exposed are still the failure modes that sink pharma AI projects today.
IBM trained the system on synthetic hypothetical cases generated by a small number of specialists at Memorial Sloan Kettering Cancer Center. Not real patient records. The system was then deployed globally, across hospitals in China, India, Thailand, and South Korea, for conditions where MSKCC’s US-centric protocols had no applicable data. An independent peer-reviewed study at a Chinese hospital, published in 2018, found 12% concordance between Watson’s recommendations and local oncologist practice for gastric cancer — a disease common in China, barely present in Watson’s training set. IBM’s own internal documents from 2017, later obtained by STAT News, described the system’s recommendations as “often inaccurate” and identified specific examples of unsafe treatment suggestions, including recommending a drug contraindicated in patients with the exact condition they presented with.
MD Anderson Cancer Center spent $62 million — $39 million to IBM, $23 million to PwC — on a Watson-based system that never treated a single patient. The project was terminated in September 2016 after roughly four years and a damning government audit. The failure was not that AI cannot support clinical decisions. It is that the system was trained for a population it was never designed for, deployed without external validation, and sold before the fundamental data problems were resolved.
The contrast with Insilico and BenevolentAI is direct: in both successful cases, the AI was given a clearly scoped task — identify a target, identify a candidate — with well-specified data and a defined validation pathway. Watson had none of that structure.
Why pharma is harder than most sectors#
The regulatory environment adds a layer of complexity with no equivalent outside healthcare. A company deploying AI in a European clinical trial in 2026 must satisfy four overlapping frameworks simultaneously: GxP requirements (credibility assessment, audit trail, human oversight), the EU AI Act (high-risk classification for medical AI components in devices, compliance deadline August 2026), GDPR Article 22 (legal prohibition on solely automated decisions with significant individual effects), and MDR/IVDR if the AI is device-adjacent. No integration point exists across these four. Companies must run four compliance workstreams in parallel, against four sets of documentation requirements, with four different regulatory bodies.
FDA’s January 2025 guidance replaced traditional software validation with a “credibility assessment” model: trust in a model’s performance must be proportionate to its context of use, defined before development begins. What “sufficient credibility” means in practice remains company-defined. Every submission is currently setting precedent.
The practical implication: where you start matters more in pharma than in most industries. Manufacturing and pharmacovigilance carry the lowest regulatory burden and the highest data quality. Clinical decision support — the Watson use case — carries the highest regulatory risk and requires the most rigorous validation pathway. This does not mean avoiding it. It means it is the hardest possible entry point, and the Watson evidence shows what happens when you enter there without the foundations.
Poland’s first-mover gap#
Poland is one of Europe’s top five countries for clinical trial volume, with enrolment timelines among the fastest on the continent and costs 15–20% below Western European comparators. More than 60% of the Phase III trials allocated to Central and Eastern Europe run through Polish sites. No Polish pharma company has published an AI validation case study for a regulated trial context. URPL, Poland’s medicines regulator, has issued no AI-specific guidance. The absence of public documentation does not mean absence of activity — Polish companies are private and publish little. What it means is that there is no regulatory precedent, no public benchmark, and no reference implementation for sponsors running AI-assisted work on Polish sites. The organisation that publishes the first validated approach for this context will have a durable commercial advantage in a market that is actively looking for one.
Briefing#
Enterprise AI is still in its experimental era#
A survey of 123 senior operators and executives by Operator Collective, a venture firm focused on enterprise AI, found that 90% of respondents have adopted general-use chatbots, but integration into actual business workflows is moving far slower. Fewer than half answered a question about return on investment, and of those who did, 40% said they have not established ROI metrics. The researchers noted that the absence of a response was itself a response. Time was the most-cited implementation barrier, named by 32% of respondents, who pointed to the pace of change in available tools. The picture: adoption is broad but shallow, and deep workflow integration remains largely confined to AI-native companies.
Deloitte: AI governance readiness is at 30%#
Deloitte’s State of AI 2026 report (released 4 March) puts governance readiness at 30% across surveyed enterprises — below technical infrastructure readiness (43%), data management readiness (40%), and significantly below access to AI tools (60% of employees). Only 25% of organisations have converted 40% or more of their AI pilots into production systems, though more than half expect to cross that threshold within months. The sharpest finding: 74% of organisations plan to deploy autonomous AI agents within the next two years, but only 21% report having adequate governance in place for those systems. The gap between deployment intent and governance readiness is largest exactly where the stakes are highest.
‘Silent failure at scale’ — when AI does exactly what you told it to do#
CNBC documented two enterprise AI failures that did not involve malfunction in any traditional sense. A beverage manufacturer’s AI-driven production system failed to recognise its own products after the company introduced new holiday labels, interpreting the unfamiliar packaging as an error signal and continuously triggering additional production runs — producing several hundred thousand excess cans before anyone noticed. In a separate case, a customer-service AI agent began approving refunds outside policy guidelines after a customer persuaded it to grant one and left a positive review; the system then optimised for positive reviews rather than refund policy. “These systems are doing exactly what you told them to do, not just what you meant,” said CBTS CISO John Bruggeman. Both failures were silent, accumulated over time, and became visible only when the damage was already at scale.
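One common mitigation here is structural rather than model-level: a hard cap on the rate of any automated action, so that runaway behaviour trips an escalation long before it reaches hundreds of thousands of excess cans or out-of-policy refunds. A minimal sketch of such a guard — the class name, thresholds, and refund scenario are hypothetical illustrations, not details from the CNBC reporting:

```python
import time
from collections import deque

class ActionGuard:
    """Hard cap on automated actions within a rolling time window.

    Silent failures accumulate; a guard like this forces escalation to a
    human once an automated system starts acting far outside its normal
    rate, regardless of why the model thinks the actions are justified.
    """
    def __init__(self, max_actions: int, window_seconds: float):
        self.max_actions = max_actions
        self.window = window_seconds
        self._timestamps = deque()  # monotonic times of recent actions

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop actions that have aged out of the rolling window.
        while self._timestamps and now - self._timestamps[0] > self.window:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_actions:
            return False  # caller must escalate to human review
        self._timestamps.append(now)
        return True

# E.g. an agent allowed at most 5 refund approvals per hour:
guard = ActionGuard(max_actions=5, window_seconds=3600)
approved = sum(guard.allow(now=float(i)) for i in range(10))
print(approved)  # → 5
```

The point of the pattern is that the ceiling is enforced outside the model: no amount of customer persuasion or mislabelled packaging can talk the system past it.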
Questions for leadership#
1. Does the August 2026 EU AI Act deadline apply to you — and have you checked? Polish companies operating in healthcare, medical devices, or clinical trials are subject to the same high-risk classification rules as companies in Frankfurt or Amsterdam. URPL has published no AI-specific guidance, which means there is no Polish regulatory shortcut. If your legal team has not completed an AI Act classification analysis for your healthcare AI systems, you are not approaching the deadline — you are already behind it.
2. Were your AI systems trained on Polish patients, or on someone else’s? Watson’s failure was geographic: a model trained on patients from the Upper East Side of Manhattan performed at 12% concordance in Chinese gastric cancer cases. Polish patient populations — demographics, prevalent conditions, NFZ reimbursement protocols, drug availability — differ from Western European or US baselines. If you are deploying an AI system in a Polish clinical or diagnostic context, the question is not just whether it was validated, but where and on whom. A validated model is not a portable model.
3. Is your RODO Article 22 human oversight real or formal? Article 22 of GDPR prohibits solely automated decisions with significant effects on individuals — in healthcare, that means any AI system making or materially influencing clinical decisions requires documented human review. UODO has not yet published enforcement cases in this area, but the legal obligation exists now. A clinician who can override a system but never does is not exercising human oversight. The override must be competent, documented, and capable of real intervention — not a checkbox on a form.
4. When a regulator asks — what will you show them? No Polish pharma or healthcare company has publicly documented an AI validation approach for a regulated context. When URPL inspectors, EU AI Act notified bodies, or a trial sponsor’s audit team eventually asks to see your AI governance documentation, you will either produce it or not. The organisations building that documentation now — methodology, audit trail, validation records — are not just managing compliance risk. They are creating the benchmark that others will be compared against.
To the next issue,
Krzysztof
Sources: Insilico Medicine: Phase 2a Results, INS018_055 (November 2024) · Insilico Medicine: Nature Biotechnology Publication (March 2024) · AlphaFold Database, Google DeepMind (2025) · Nobel Prize in Chemistry 2024 — AlphaFold · BenevolentAI: Baricitinib hypothesis, The Lancet (February 2020) · Paige Prostate FDA De Novo Authorisation DEN200080 (September 2021) · Pfizer 2022 Annual Report — AI in Manufacturing · AbbVie Pharmacovigilance AI Pilot (PMC11133112, 2024) · FDA Draft Guidance: AI for Regulatory Decision-Making (January 2025) · EMA/FDA Joint Guiding Principles (January 2026) · Petrie-Flom Center: EU Medical AI Regulation (5 March 2026) · EFPIA Clinical Trial Ecosystem in Europe (2024) · EU AI Act, Annex III and Article 6 · STAT News: IBM Watson’s Unsafe Treatment Recommendations (July 2018) · Zhou et al.: Watson Concordance Study, The Oncologist (2018) · UT System Audit: MD Anderson Watson Project (2017)
