
Issue #44 — The Shifting Specification


Dear Reader,

Anthropic has made two silent changes to Claude Opus 4.6 this year, each of which reduces compute per query. In early February, it introduced what it calls “adaptive thinking” — a mechanism that lets the model decide for itself how much reasoning to apply per query, replacing a fixed budget the user could set. In the weeks that followed, it also lowered the default effort setting on the same model. Neither change went through regular release-notes channels.

Users noticed. On 2 April, Stella Laurenzo, AMD’s Director of AI, opened a bug report on the Claude Code GitHub repository titled “Claude Code is unusable for complex engineering tasks with the Feb updates” (issue #42796). More than two hundred users signed on. Dimitris Papailiopoulos, Principal Research Manager at Microsoft Research, told Fortune (14 April): “I set effort to max, yet it’s extremely sloppy, ignores instructions, and repeats mistakes.” The user request for maximum reasoning did not produce maximum reasoning. The setting had become advisory, and the default level it referenced had been lowered.

Boris Cherny, the Anthropic executive who leads Claude Code, characterised the adaptive-thinking system to Fortune as one that “allows the model to decide how much reasoning to apply to a given task rather than using a fixed budget.” The rationale was compute cost. Anthropic is rationing, and one mechanism of rationing is giving the model its own discretion over how much reasoning to invest — including when the user has explicitly asked for more.

For enterprises running Claude in production workflows, this is a specific problem. Your AI system was validated against a particular performance envelope. The vendor changed the envelope without notifying you. The system running in production is no longer the system you signed off on, and the deviation shows up first as operational drift — outputs degrading where nothing else in the workflow changed — and second as a compliance exposure under Article 26 of the AI Act. Your contract probably does not catch either, and your audit trail almost certainly does not — both were designed for software where the system’s computational behaviour did not drift between patches.

What Anthropic just demonstrated

Silent capability shifts of this kind have no equivalent in mature enterprise software categories. Database vendors publish release notes. Operating systems version their changes. Even API deprecations come with months of notice and migration guides. With frontier models, the vendor can modify the behaviour of the system the customer is consuming without a version bump, a changelog entry, or a contract-triggered notification.

Neither of Anthropic’s changes was a model upgrade. Opus 4.6 remained Opus 4.6. What shifted was the compute allocation — the mechanism by which the model decides reasoning depth, and the default level of effort applied before the customer’s instruction kicks in. Papailiopoulos’s account is the tell: effort set to maximum, output sloppy. A regulated process that passed validation under the previous architecture is now running under a different one, with the deployer carrying the responsibility for outcomes they no longer fully control.

The reason: money. Anthropic’s CFO stated publicly (Reuters Breakingviews, 11 March) that the company had spent over $10 billion on training models and serving user queries to generate roughly $5 billion of cumulative revenue. Bank of America analysts estimated in March that Anthropic could pay hyperscale cloud providers up to $6.4 billion in 2026 through revenue-share agreements tied to reselling Claude, against $1.9 billion in 2025 (Forbes, 25 March). OpenAI’s chief revenue officer, Denise Dresser, sent an internal memo on 13 April asserting that around $8 billion of Anthropic’s reported $30 billion annualised run-rate is a question of revenue recognition — Anthropic reports the hyperscaler-share figure as its own revenue; OpenAI nets the Microsoft share out before reporting (CNBC, 13 April). Whichever of these numbers survives closer scrutiny, the direction is consistent. The unit economics of frontier AI inference at scale are unresolved, and the vendors are looking for ways to close the gap without provoking customer rebellion.

Raising prices provokes customer rebellion. Rationing visibly — through queues, throttling, or outages — also provokes it. The third option — adjusting capability per unit of usage without declaring it — is the least visible to users and the most commercially sustainable for the vendor. It is also the one with regulatory implications that nobody is discussing.

What this does to your business process

The operational exposure is larger than the regulatory one, and it hits hardest in agentic workflows. An agent makes decisions in sequence. Each step conditions the next. A small reduction in reasoning depth at step two changes the tool selection at step three, which feeds different input into step five. By the time an output reaches a customer or a downstream system, it is the product of a decision chain that no longer matches the one the deployer validated.

The visible symptoms are mundane. A customer-service agent that used to escalate ambiguous refund requests begins auto-approving them — the threshold for “unclear” has shifted, though nothing in the workflow definition changed. A claims-triage agent that previously caught duplicate submissions starts missing them because the heuristic it runs now applies less reasoning before committing. A code-review agent that patched issues in three tool calls now makes eight, burning more tokens to reach the same answer or a worse one. None of these is detectable by typical monitoring, which watches for uptime, not quality drift.

Reproducibility goes with it. If you ask the agent the same question six months apart, the answers will differ — not because the question changed, and not because context drifted, but because the system was reconfigured by the vendor in the interim. Any forensic debugging exercise — why did this output happen, why did this decision go this way — now carries an invisible variable. A/B testing becomes impossible against a moving baseline. Regression detection requires the deployer to maintain their own shadow evaluations on fixed datasets, running continuously, flagging deviations. That is a discipline almost no enterprise outside frontier AI labs has instrumented.
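
What that discipline could look like in code: a minimal sketch, assuming a fixed prompt set stored on disk, a baseline score recorded at validation time, and two placeholder functions, `call_model` and `score_output`, which stand in for your vendor API call and your workflow-specific scorer (both names are hypothetical, not taken from any SDK).

```python
import json
import statistics
from datetime import datetime, timezone

BASELINE_PATH = "shadow_baseline.json"   # mean score recorded at validation time
PROMPTS_PATH = "shadow_prompts.json"     # fixed evaluation set, never modified
DRIFT_THRESHOLD = 0.05                   # tolerated drop in mean score before alerting

def call_model(prompt: str) -> str:
    """Hypothetical wrapper around whatever vendor API you use.
    Send the same request parameters you use in production."""
    raise NotImplementedError

def score_output(prompt: str, output: str) -> float:
    """Workflow-specific scorer (exact match, rubric, unit tests); returns a value in [0, 1]."""
    raise NotImplementedError

def run_shadow_eval() -> None:
    with open(PROMPTS_PATH) as f:
        prompts = json.load(f)
    with open(BASELINE_PATH) as f:
        baseline = json.load(f)                      # e.g. {"mean_score": 0.91}

    scores = [score_output(p, call_model(p)) for p in prompts]
    mean = statistics.mean(scores)
    drift = baseline["mean_score"] - mean

    record = {
        "run_at": datetime.now(timezone.utc).isoformat(),
        "mean_score": mean,
        "baseline_mean": baseline["mean_score"],
        "drift": drift,
    }
    print(json.dumps(record))                        # persist these records over time

    if drift > DRIFT_THRESHOLD:
        # Wire this into whatever alerting you already run for uptime.
        raise RuntimeError(f"Model behaviour drifted by {drift:.3f} against the fixed eval set")

if __name__ == "__main__":
    run_shadow_eval()
```

Run it on a schedule against the same prompts you validated with at go-live. The point is not the scorer; it is the fixed baseline, so that a vendor-side change shows up as a dated, documented deviation rather than an anecdote.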

There is a simpler way to frame this. No operations team would run a production workflow on a database vendor that silently retuned its query optimiser. No finance function would run payroll on a banking API that adjusted its interest calculation without notification. The answer to whether this is acceptable is obvious for every substrate enterprises have relied on for thirty years. For frontier model APIs, it is happening now, and most enterprises have not yet registered that it is happening to them.

And a compliance exposure on top

The AI Act puts this on the deployer. Article 26(1) requires deployers of high-risk systems to use them “in accordance with the instructions for use” supplied by the vendor. Article 26(5) obliges the deployer to monitor operation against those instructions. Article 14(4)(a), on human oversight, requires overseers to detect “unexpected performance” — which presupposes a notion of expected performance. When the vendor modifies what the system does without telling the deployer, the deployer’s reference for expected is out of date, and the ability to detect deviation is structurally compromised.

For institutions under Polish sectoral supervision — banks and insurers under KNF oversight, for example — this gets sharper. Article 27 requires deployers of credit-scoring and life or health insurance risk-pricing systems to conduct a Fundamental Rights Impact Assessment that includes “a description of the deployer’s processes in which the high-risk AI system will be used.” The description rests on a specific model configuration. When that configuration shifts unannounced, the FRIA describes a system that no longer exists.

What your contracts do not cover

Standard SaaS change-notification language covers API versions, endpoint deprecations, and pricing adjustments. It typically does not cover configuration changes to the underlying model, effort or compute-budget settings, routing behaviour between model variants, or fallback logic when capacity is constrained. None of these were contractable events in prior software generations, because the behaviour of a system did not drift between patches.

Your audit trail has the same gap. The logs produced by a typical AI deployment record the request, the response, and basic metadata. They do not record the effort setting, the routing decision, or the specific model weights that produced the response. Your logs capture outputs. They do not capture the configuration that produced those outputs.
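
Closing that gap is mostly a logging decision. A minimal sketch, assuming you control the code that makes the API call; the field names (`effort`, `reported_config`) are illustrative, not taken from any vendor's request or response schema.

```python
import hashlib
import json
from datetime import datetime, timezone

def log_inference(request: dict, response_text: str, response_metadata: dict) -> dict:
    """Build an audit record that captures the configuration sent with the request,
    not just the output. Persist it to append-only storage you control."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Everything you asked for: model id, effort/reasoning settings, sampling parameters.
        "requested_config": {
            "model": request.get("model"),
            "effort": request.get("effort"),          # illustrative field name
            "max_tokens": request.get("max_tokens"),
            "temperature": request.get("temperature"),
        },
        # Everything the vendor reports back about what actually served the request,
        # e.g. model-version headers, if the API exposes any.
        "reported_config": response_metadata,
        "prompt_hash": hashlib.sha256(
            json.dumps(request.get("messages", []), sort_keys=True).encode()
        ).hexdigest(),
        "response_hash": hashlib.sha256(response_text.encode()).hexdigest(),
    }
```

Even this only records what you asked for and whatever the vendor chooses to report back; it cannot see a routing or effort decision the vendor does not expose. It does, however, let you date the moment outputs started diverging under an unchanged requested configuration.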

This is the exposure no standard vendor management framework is designed for. Security review covers data handling. Privacy review covers personal data. Model risk management covers statistical validation. Unannounced capability modification by the provider of a regulated system sits outside all three.

Briefing

Salesforce and Microsoft patch AI-agent data-leak vulnerabilities — and disagree about who owns the fix

Capsule Security published research on 15 April describing two prompt-injection vulnerabilities: “PipeLeak” in Salesforce Agentforce and “ShareLeak” in Microsoft Copilot (CVE-2026-21520). In the Salesforce case, an attacker could embed instructions into a public-facing lead-capture form that the agent treated as trusted — enough to extract the full lead database. Microsoft patched. Salesforce’s position: data exfiltration prevention is a configuration issue, and customers should activate human-in-the-loop oversight. Capsule CEO Naor Paz called that response “embarrassing” — “the whole thing about agents is they do things for you without you babysitting them.” The governance point is specific. Your vendor may ship default configurations that accept untrusted input as trusted instruction, and their remediation model assumes you will reconfigure. For Polish enterprises piloting Agentforce or Copilot Studio, the useful question is whether your deployment settings have been audited against known prompt-injection patterns (Dark Reading, 15 April).

US federal judge rules AI chats not protected by attorney-client privilege

In a February ruling that drew broader attention on 15 April through follow-on Reuters reporting, US District Judge Jed Rakoff ordered former GWG Holdings chair Bradley Heppner to hand over 31 Claude-generated documents prepared as part of his defence in a securities fraud case. Rakoff wrote: “No attorney-client relationship exists, or could exist, between an AI user and a platform such as Claude.” More than a dozen major US law firms have since issued client advisories warning that chatbot conversations can be subpoenaed by prosecutors and civil litigation adversaries. New York firm Sher Tremonte now includes the clause “Disclosure of privileged communications to a third-party AI platform may constitute a waiver of the attorney-client privilege” in new client contracts. The ruling is US jurisdiction; the logic travels. Polish board members and general counsel routinely draft sensitive strategy documents in public chatbots. Under this precedent, doing so strips them of the protections most users assume they retain (Reuters, 15 April).

Four questions for leadership

Which of your production AI processes run against a vendor-hosted frontier model, and when were those processes last re-validated against current model behaviour? If the answer to the second part is “at go-live,” your quality metrics may be out of date.

What does your contract with Anthropic, OpenAI, or Google say about notification of capability-affecting changes that are not model-version upgrades? Check the specific language. The likely answer is that it says nothing.

If a customer complaint or a regulator asked you to reproduce a system output from six months ago, could you? If no, your Article 26 demonstrability is impaired through nothing you did.

What is your contingency budget for the scenario in which 2026 AI prices turn out to be a floor rather than a ceiling? Vendor unit economics below breakeven, compute costs climbing — it will not get cheaper.

The shift underneath

Enterprise AI governance frameworks assume a stable system. The frontier model market may be moving to continuous tuning by the vendor against its own cost structure.

Anthropic’s revenue growth is the headline. The quieter fact — that the fastest-growing AI company in the world has chosen to ration by adjusting capability without announcement — is the one that matters for every organisation running production workloads on its infrastructure. The compliance exposure is a subset of a broader operational problem: the system you deployed is not the one running for your customers today, and you were not told.

Stay balanced,
Krzysztof Goworek