Scout: ClaudeBleed — Trust-Boundary Postmortem and the Execution-Context Authorisation Pattern

Summary

LayerX’s ClaudeBleed disclosure against Anthropic’s Claude for Chrome extension is a textbook example of the confused-deputy problem ported into the agent era. The extension’s externally_connectable manifest entry exposed its message API to the claude.ai origin without validating which extension or script was actually sending the message, so an attacker who can land JavaScript in the claude.ai execution origin (typically via an installed extension’s content script, including in Chrome’s MAIN world) can pass through that channel. In LayerX’s demonstration, a zero-permission attacker extension running a content script in Chrome’s MAIN world inherited the Claude extension’s full agent authority: it could exfiltrate files from Gmail, GitHub, and Google Drive; send mail; share documents; and bypass user confirmations through DOM manipulation. Anthropic shipped a partial fix in version 1.0.70 on May 6 that added approval prompts for “standard” mode but left the underlying trust model intact; LayerX’s principal researcher reported bypassing the patch within three hours by routing through the side-panel initialisation flow that puts the extension into “privileged” mode without notifying the user. This is the second time in five months that researchers have demonstrated agent-level command injection into the same extension through an origin-trust failure; the prior incident (ShadowPrompt, December 2025) exploited a wildcard *.claude.ai allowlist combined with a DOM-XSS in an embedded CAPTCHA. The pattern that connects ClaudeBleed, ShadowPrompt, CometJacking, and the Cursor Git-hook RCE (CVE-2026-26268) is the same: agent products shipping into shared host environments are inheriting the host’s pre-existing trust model and assuming it does authorisation work the host was never designed to do.
The durable answer is an execution-context-aware authorisation pattern — one where every request crossing into the agent’s privileged surface carries an authenticated claim about who is making it, validated cryptographically rather than ambiently.

Key Findings

1. The mechanism: externally_connectable is an allowlist, not an authentication primitive

The Claude for Chrome extension’s manifest.json lists claude.ai (and historically *.claude.ai) under externally_connectable.matches. Chrome’s official documentation is clear that this setting controls which web pages may attempt to send messages to the extension — it is not a sender authentication mechanism. The extension’s message handler still receives a MessageSender object describing the request, and the extension is responsible for inspecting sender.id, sender.url, and sender.origin to decide whether to honour the message.

The Claude extension didn’t. Any script that reached the claude.ai execution origin via the paths enumerated below could invoke chrome.runtime.sendMessage(claudeExtensionId, { type: 'onboarding_task', prompt: '...' }), and the extension’s handler would route the request straight through to the agent, accepting an arbitrary prompt parameter as if it had come from Anthropic’s first-party UI code. A third party has three ways to land JavaScript inside the claude.ai origin:

Inject a content script into claude.ai from another extension’s manifest — requires the attacker extension declare host_permissions for claude.ai. That is a non-trivial permission ask, but Chrome users routinely grant <all_urls> to general-purpose extensions.
Inject a content script in the MAIN world. This is the path LayerX demonstrated. A MAIN-world content script runs inside the page’s own JavaScript execution context — same global object, same prototype chain, no isolated world. Per Chrome’s content-script documentation, “Choosing the MAIN world means the script will share the execution environment with the host page’s JavaScript.” From the Claude extension’s perspective, a MAIN-world script on claude.ai is indistinguishable from Claude’s own first-party code.
Compromise a third-party script that already runs on the page — e.g. a CDN-hosted analytics or CAPTCHA script. This is the ShadowPrompt attack chain Koi disclosed in December 2025 against an earlier version of the same extension: a DOM-XSS in an Arkose Labs CAPTCHA component hosted on a-cdn.claude.ai paired with a then-too-permissive *.claude.ai allowlist let any website embed the vulnerable CAPTCHA in an iframe and send a postMessage payload that turned into a chrome.runtime.sendMessage to the extension.

All three paths lead through the same hole: the extension does not authenticate the requester, only the page origin the requester happens to be inside. Aviad Gispan, LayerX’s principal researcher, characterised the design as “a confused deputy”: the extension carries full agent authority but cannot tell who is actually invoking it.
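The hole can be illustrated with a small model. This is a sketch under stated assumptions, not the extension's actual code: SenderInfo is a hand-written stand-in for the few MessageSender fields used here, originOnlyHandler is a hypothetical handler, and the MAIN-world script is modelled as presenting the same sender description as first-party code, in line with the indistinguishability described above.

```typescript
// Minimal stand-in for the MessageSender fields used here (not the real Chrome type).
interface SenderInfo {
  id?: string;      // present when the sender is an extension context
  origin?: string;  // origin of the page that opened the channel
}

interface AgentMessage {
  type: string;
  prompt: string;
}

// Vulnerable shape: the only gate is the page origin.
function originOnlyHandler(msg: AgentMessage, sender: SenderInfo): "forwarded" | "rejected" {
  if (msg.type !== "onboarding_task") return "rejected";
  return sender.origin === "https://claude.ai" ? "forwarded" : "rejected";
}

const firstPartyCode: SenderInfo = { origin: "https://claude.ai" };
// Assumption: a MAIN-world script shares the page context, so the handler
// is modelled as receiving the same sender description as first-party code.
const mainWorldScript: SenderInfo = { origin: "https://claude.ai" };
const unrelatedSite: SenderInfo = { origin: "https://evil.example" };

const injected: AgentMessage = { type: "onboarding_task", prompt: "attacker-chosen text" };
const outcomes = [firstPartyCode, mainWorldScript, unrelatedSite].map((s) =>
  originOnlyHandler(injected, s),
);
```

In this model the origin check stops the unrelated site but cannot separate the first two callers, which is the confused-deputy condition in miniature.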

This is not a novel class of bug. GitHub Security Lab’s 2024 survey of browser-extension vulnerability patterns flags externally_connectable misuse explicitly: “if the extension does not check the sender, a malicious extension may be able to access any functionality that the onMessageExternal/onConnectExternal function facilitates.” The OWASP Browser Extension Vulnerabilities Cheat Sheet phrases the same rule as the categorical default: “Treat all incoming messages as untrusted input. In Service Workers, always validate sender.id to ensure the message originates from your own extension.” A USENIX Security ’23 study of browser-extension privilege escalation found 59 such message-passing vulnerabilities across 40 extensions in the wild. The class has been on every browser-security checklist for at least a decade. What ClaudeBleed adds is the consequence: when the privileged action behind the unverified message is “run an agent with the user’s logged-in credentials across Gmail, GitHub, and Google Drive,” the blast radius is closer to credential exfiltration than to typical extension XSS.

2. The exploit surface: arbitrary prompts, approval looping, and DOM manipulation

LayerX’s technical writeup names the entry point as the extension’s onboarding_task message handler, which accepts an arbitrary prompt string and forwards it into the agent’s instruction stream. Four exploit paths are demonstrated in the disclosure and reproduced across SecurityWeek’s, CyberScoop’s, CSO’s, hackread’s, and CyberInsider’s coverage:

Private repository code exfiltration via the Claude-GitHub integration — the agent reads a repository’s contents and pastes them into a chat window the attacker monitors.
Google Drive document sharing to attacker-controlled accounts — the agent, instructed to share a document, was directed to act on a Share button whose label had been renamed via DOM manipulation to read “Request feedback,” disguising the action.
Email composition and dispatch via the Gmail integration — the agent sends mail under the user’s identity.
Inbox summarisation followed by exfiltration and evidence deletion — the agent reads the last several messages, emails the summary to an external address, then deletes the sent-mail record.

Two specific bypass techniques warrant naming because they generalise beyond Claude:

Approval looping. When the agent prompts for user confirmation, the attacker script repeatedly injects “Yes, proceed” responses until the confirmation flow accepts one. The flow was designed to gate destructive actions behind explicit user assent; it did not anticipate that the same injection vector that delivered the malicious prompt could also deliver the approval. Per LayerX, this technique satisfies “confirmation flows despite Claude requesting specific structured input.” The pattern is the agent-product equivalent of the DOM-based extension clickjacking Marek Tóth disclosed against eleven major password managers in August 2025, where opacity-zero overlays caused autofill UI to deposit credentials into invisible attacker forms. In both cases the issue is that whatever can place pixels and synthesise events on the privileged extension’s UI surface can satisfy its consent prompts.

Label-spoofed DOM manipulation. The attacker script rewrites button text on the page so that the agent, reasoning over the visual DOM, reads a Share button as “Request feedback” and approves an action it would otherwise have flagged. This is a different failure mode from prompt injection in the strict sense — the malicious content isn’t in the prompt stream, it’s in the DOM the agent treats as ground truth. Anthropic’s own browser-use research disclosure notes that defences must cover “hidden text, manipulated images, deceptive UI elements”; the DOM relabel is the third of those.

3. The partial patch and the “privileged mode” escape hatch

Anthropic shipped version 1.0.70 on May 6, 2026, roughly nine days after LayerX’s initial report on April 27. Per LayerX’s continued analysis carried in SecurityWeek’s coverage, the patch added “internal security checks to prevent extensions running in ‘standard’ mode from executing remote commands.” Standard mode is Anthropic’s term for the default “ask before acting” agent operating mode; privileged mode is the opt-in “act without asking” mode for users running longer autonomous sessions.

The patch’s gap, per LayerX’s follow-up writeup, is that the new approval flows were enforced only inside the standard-mode message path. The side-panel initialisation flow that puts the extension into privileged mode was not gated. An attacker script can drive the same externally_connectable channel to put the extension into privileged mode, then issue commands that bypass the standard-mode approval checks entirely. Per LayerX: “The user is never notified or asked to approve the switch.”
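The gap is easier to see when the mode switch is treated as just another privileged request. The following is a hypothetical model, not a reconstruction of the extension's internals (which are not public): the names AgentMode, ExtRequest, and handleRequest are invented for illustration, and userApproved is assumed to be set only by an approval surface the page cannot reach.

```typescript
type AgentMode = "standard" | "privileged";

interface ExtRequest {
  kind: "run_command" | "set_mode";
  targetMode?: AgentMode;
  // Assumption: only a trusted approval surface can set this to true.
  userApproved: boolean;
}

function handleRequest(
  mode: AgentMode,
  req: ExtRequest,
): { mode: AgentMode; accepted: boolean } {
  if (req.kind === "set_mode") {
    const target = req.targetMode ?? mode;
    // The switch into privileged mode is gated like any other privileged action.
    if (target === "privileged" && !req.userApproved) {
      return { mode, accepted: false };
    }
    return { mode: target, accepted: true };
  }
  // run_command: standard mode needs approval each time; privileged mode does not.
  return { mode, accepted: mode === "privileged" || req.userApproved };
}
```

If the set_mode branch skipped its approval check, an unapproved caller could first switch modes and then run commands unprompted, which is the bypass shape LayerX describes.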

Gispan reported bypassing the patch within roughly three hours of its release, per Cybernews’s coverage and SQ Magazine’s reporting. LayerX’s framing is that the patch addresses the symptom (the standard-mode approval UX) rather than the root cause, which is that the trust check is anchored on execution origin rather than execution context. CyberScoop reported that Anthropic declined to comment on the mitigation gap. Whether Anthropic’s longer-arc plan is to retire the externally_connectable handler entirely (LayerX reports an initial Anthropic acknowledgement suggesting that was the intent, since walked back), to harden the handler with sender authentication, or to accept the residual risk and rely on user-mode discipline, is not in the public record.

A separate but adjacent LayerX disclosure is worth noting: the Claude Desktop Extensions (DXT) RCE report from February 2026 (CVSS 10.0), in which a malicious Google Calendar event can chain through an MCP connector to trigger code execution. Anthropic explicitly declined to fix it, stating the scenario “falls outside our current threat model” because the user must have intentionally installed the connectors. The DXT and Chrome-extension disclosures are technically distinct, but the pattern is consistent: the Anthropic-side response treats the host platform’s existing permission model as load-bearing for agent-level authorisation, which that model was not designed to support. At the time of writing no CVE has been publicly assigned to ClaudeBleed.

4. This is the recurring failure mode, not the one-off bug

ClaudeBleed isn’t the first ambient-authority compromise of an agent product shipping into a shared host. The recent record is dense enough to read as a pattern:

ShadowPrompt (Koi Security, disclosed December 26 2025, patched January 15 2026) — the same Claude Chrome extension, an earlier wildcard *.claude.ai origin allowlist, exploited via a DOM-XSS in an embedded CAPTCHA. Same shape: trust the origin, not the requester.
CometJacking (LayerX, August 2025, patched November 18 2025) — Perplexity’s Comet agentic browser parsed URL query strings as agent instructions; a single weaponised URL could direct the agent to exfiltrate Gmail and Calendar data. LayerX’s framing names the issue as authentication-bypass-by-agent-hijack: “they no longer need the user’s password — they just need to hijack the agent that is already logged in.” Perplexity initially marked the report Not Applicable before later treating it as a P1 issue.
Cursor IDE / CVE-2026-26268 (Anysphere, disclosed February 2026, patched in Cursor 2.5) — autonomous Git execution by the Cursor agent surfaced an embedded bare repository’s pre-commit hook, achieving RCE on the developer’s workstation. Per Novee Security’s writeup, “the step between ‘clone a repository’ and ‘execute attacker-controlled code’ is reduced to a single, unremarkable user action.” Same shape one rung down the stack: the IDE trusts the local filesystem; the agent, acting through the IDE, now runs untrusted-repo code with the IDE’s filesystem authority.
Comet, ChatGPT Atlas, and the broader agentic-browser cohort — independent aimultiple testing reports Atlas blocked roughly 5.8% of malicious pages and Comet roughly 7%. OpenAI itself has publicly stated that prompt injection for browser agents may never be fully solved, treating the threat as a long-running posture rather than a one-time fix.

The connective tissue across all of these is not the specific bug class. It’s the trust model. Each agent product is shipping into a host environment — Chrome, an IDE, a terminal — whose pre-existing security model assumes a different shape of caller. Chrome’s same-origin policy was designed to keep page A’s JavaScript from reading page B’s cookies, not to authenticate which extension is invoking a chat agent’s privileged interface. An IDE’s filesystem permission model was designed for the developer’s intent (“I want this tool to read my code”), not for an agent’s recursive autonomous tooling. The agent product layers on top, inheriting the host’s check primitives, and the host’s check primitives don’t carry an authenticated claim about who is asking.

The Anthropic-side counterargument is visible in Claude Code Auto Mode, shipped at the Code w/ Claude 2026 event the same week as ClaudeBleed. Auto Mode implements the harness pattern explicitly: a two-layer permission system with an input layer that inspects tool outputs for prompt injection before they enter context, and an execution layer that gates each proposed action with a fast-path safe-action filter and a slow-path classifier for ambiguous cases. The visual red-spinner approval signal is the consent UX. That architecture, applied at the Chrome-extension surface, would have prevented ClaudeBleed at the execution layer — the agent would have evaluated the proposed action before honouring it, regardless of what the message handler trusted. The gap that ClaudeBleed makes visible is that Anthropic’s harness model is implemented in some agent products and not in others; the Chrome extension’s authorisation surface predates the Auto Mode pattern Anthropic is now describing as the default.

5. What execution-context-aware authorisation actually means

The corrective for the ClaudeBleed class of bug is straightforward to describe and structurally difficult to retrofit. Three layers, all of which need to be present:

Layer 1: Authenticate the requester, not the surrounding page. Chrome’s MessageSender object carries sender.id, sender.url, and sender.origin. The hardened message handler validates sender.id against an allowlist of extension IDs that are entitled to invoke privileged operations. For an Anthropic-shipped extension, that allowlist is typically empty (no third-party extension should be entitled), and the handler should reject any external sender by default. Where messages legitimately need to come from the page’s own first-party JavaScript on claude.ai, the handler validates sender.origin against an exact-match allowlist and treats the message as suspicious whenever sender.id indicates the request originated in a third-party extension. A third-party content script that runs in the MAIN world to colocate itself with the site’s own code is the harder case: as described in Finding 1, it is indistinguishable from first-party code from the extension’s perspective, so sender checks alone should not be expected to flag it. The MAIN-vs-isolated-world distinction applies only to extension content scripts and is not a proxy for first-party identity (a site’s own first-party JavaScript necessarily executes in the page’s main world, because there is no other world for it); the practitioner rule is to ship your own extension’s content scripts in the isolated world by default and to never treat “ran in MAIN world” or “ran on the right origin” as substitutes for sender authentication. The Chromium extension team has acknowledged that MessageSender.id and MessageSender.url can be spoofed by a compromised renderer, so the validation should treat both as inputs to a defence-in-depth chain rather than as standalone proofs.
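A minimal sketch of the Layer 1 check might look like the following. It is an illustration under assumptions rather than a definitive implementation: SenderInfo is a local stand-in for the MessageSender fields named above, checkSender and the two allowlist constants are hypothetical names, and a real extension would call such a function from its chrome.runtime.onMessageExternal listener before doing anything else.

```typescript
// Local stand-in for the MessageSender fields discussed above.
interface SenderInfo {
  id?: string;
  url?: string;
  origin?: string;
}

// Empty by default: no third-party extension is entitled to privileged calls.
const ALLOWED_EXTENSION_IDS: ReadonlySet<string> = new Set<string>();
// Exact-match origins only; no wildcards or suffix matching.
const ALLOWED_ORIGINS: ReadonlySet<string> = new Set<string>(["https://claude.ai"]);

type Verdict = { ok: true } | { ok: false; reason: string };

function checkSender(sender: SenderInfo): Verdict {
  // A sender that identifies as an extension must be explicitly allowlisted.
  if (sender.id !== undefined) {
    return ALLOWED_EXTENSION_IDS.has(sender.id)
      ? { ok: true }
      : { ok: false, reason: "extension not allowlisted" };
  }
  // Otherwise the sender is a page: require an exact origin match.
  if (sender.origin === undefined || !ALLOWED_ORIGINS.has(sender.origin)) {
    return { ok: false, reason: "origin not allowlisted" };
  }
  return { ok: true };
}
```

Exact matching is the point of the second set: a subdomain such as a-cdn.claude.ai fails the check, which is the property the earlier wildcard allowlist lacked. A passing verdict here is only one input to the chain, since it says nothing about which script on the page sent the message.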

Layer 2: Authenticate the request itself, not just the channel. The structural problem with origin-based trust is that the channel can be ambient: any code with execution rights on the origin gets access. The harder primitive is to require that every privileged request carry a credential that only legitimate first-party code can produce. Concretely, this looks like a per-session nonce or signed request token issued by the extension’s service worker to its own UI code at initialisation time, included in every subsequent message, and validated by the service worker against the issuance record. An attacker script in MAIN world has access to the page’s DOM and globals but not to the extension’s service-worker state where the nonce lives, unless the extension itself leaks it. This is the same primitive that CSRF tokens give server-rendered web apps; it has not been the default in extension messaging because the threat model was historically “untrusted webpage,” not “second extension co-located on the same origin.”
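The issuance-and-validation half of Layer 2 can be sketched in a few lines. This is a hypothetical helper, not an API of Chrome or of the Claude extension: the class name SessionTokens and its methods are invented for illustration, and the random source is injected so the sketch stays self-contained. In a service worker it would be backed by a cryptographically secure generator such as crypto.getRandomValues.

```typescript
// Tracks tokens the privileged side has handed to its own UI code.
class SessionTokens {
  private readonly issued = new Set<string>();
  private readonly randomToken: () => string;

  // randomToken is an injected source of unguessable strings (assumption:
  // production code would derive it from a CSPRNG).
  constructor(randomToken: () => string) {
    this.randomToken = randomToken;
  }

  // Called once per UI session at initialisation time.
  issue(): string {
    const token = this.randomToken();
    this.issued.add(token);
    return token;
  }

  // Every privileged message must present a token this instance issued.
  isValid(candidate: unknown): boolean {
    return typeof candidate === "string" && this.issued.has(candidate);
  }

  // Called when the UI session ends so stale tokens stop working.
  revoke(token: string): void {
    this.issued.delete(token);
  }
}
```

The sketch covers only bookkeeping. How the token reaches first-party UI code without becoming visible to co-located page scripts is the hard part of the design, and it is deliberately left out here.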

Layer 3: Gate every privileged action behind an authorisation check that runs after the message is parsed. Per-message authentication still leaves the question of whether this request from this authenticated requester is currently allowed. The harness pattern Anthropic is shipping in Auto Mode is one shape of this — input-side and execution-side classifiers that evaluate each action against safety policy. A simpler shape is a static capability gate: this message handler is allowed to issue prompts but not to write to local storage; this one can request page-context information but not invoke tools; this one can prompt the user but not auto-execute on the user’s behalf. The static capability gate is cheaper to implement than the classifier-driven harness and addresses the ClaudeBleed shape directly — the onboarding_task handler should never have been permitted to forward arbitrary prompt content into the agent’s instruction stream regardless of who sent it.
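The static capability gate described above can be sketched as a lookup table consulted at the handler boundary. The capability names and the page_context handler are hypothetical; only onboarding_task comes from the disclosure, and the capabilities granted to it here are an assumption made for illustration.

```typescript
type Capability =
  | "prompt_user"
  | "read_page_context"
  | "invoke_tools"
  | "submit_agent_prompt";

// Each handler is granted a fixed set of capabilities at build time.
const HANDLER_CAPABILITIES: ReadonlyMap<string, ReadonlySet<Capability>> = new Map<
  string,
  ReadonlySet<Capability>
>([
  ["onboarding_task", new Set<Capability>(["prompt_user"])],
  ["page_context", new Set<Capability>(["read_page_context"])],
]);

// Deny by default: unknown handlers and ungranted capabilities both fail.
function isPermitted(handler: string, needed: Capability): boolean {
  const granted = HANDLER_CAPABILITIES.get(handler);
  return granted !== undefined && granted.has(needed);
}
```

Under this table a request that arrives on onboarding_task and tries to push text into the agent’s instruction stream is refused regardless of how well the sender authenticated, which is the property the paragraph above asks for.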

The three layers compound. Layer 1 alone is what most extension teams ship and is what OWASP and Chrome’s docs recommend. Layer 2 is the harder ask and the one that catches the renderer-spoof case, the compromised-third-party-script case (ShadowPrompt), and the MAIN-world content-script case (ClaudeBleed). Layer 3 is the layer that survives even when Layers 1 and 2 fail — the request authenticates correctly but should still not be honoured because the privileged action is out of scope. Auto Mode is the closest production implementation of Layer 3 in the agent-product ecosystem today.

The retrofit cost is real and worth naming. Adding Layer 1 to an existing extension is hours of work. Adding Layer 2 is days to weeks because the protocol between the extension’s UI code and its service worker has to change, and any in-flight session has to migrate. Adding Layer 3 is a multi-quarter effort because it requires building or integrating the classifier infrastructure, training data, and approval-UX surface that Auto Mode took Anthropic over a year to ship. For a team shipping an agent product into a host environment today, the realistic ask is Layer 1 immediately, Layer 2 within a release cycle, and a roadmap to Layer 3.

6. The shared-host environments are not just browsers

The same authorisation gap shows up across every shared host an agent product can ship into. The list is worth naming explicitly because the failure mode generalises:

Browser extensions and agentic browsers. ClaudeBleed, ShadowPrompt, CometJacking are the recent examples. The Brave Software team’s analysis of Comet extends the threat model to any case where the agent treats page content as instruction input.
Agent IDEs. Cursor’s CVE-2026-26268 is the on-record case. The Git-hook escape is structurally identical to the browser-extension case: the IDE trusts the local filesystem, the agent’s authority within the IDE is ambient, and a malicious repository smuggles execution rights through the gap between “agent reads file” and “filesystem executes hook.”
Terminals and shells. Any agent that issues shell commands inherits the shell’s full authority over the user’s account. Anthropic’s Auto Mode permission gates are the most-developed answer here, but Auto Mode is a runtime layer above the shell, not a property of the shell itself. The shell’s threat model (“the user types commands”) does not anticipate “the user’s agent types commands the user did not see.”
Operating-system-level agent surfaces. The Claude Desktop Extensions disclosure is the on-record example; an MCP server running without sandboxing inherits whatever filesystem and credential access the user’s account has. Anthropic’s published position is that this surface is the user’s responsibility to lock down. That position will hold until the first wide-deployment incident in which it doesn’t.

In each case the agent product is doing what the host environment told it that it could do, and the host environment’s authorisation primitives are not capturing the property the agent’s threat model needs: who, specifically, is causing this action, with what authentic claim of intent. The host’s “this script can execute here” is not the same as the agent’s “this request is from the user.” The durable architectural answer is for the agent product to layer its own authentication and authorisation on top of the host’s, not to assume the host’s primitives substitute for them.

Practical Implications

For teams shipping agent products into browser extensions

Audit externally_connectable immediately. If your extension’s manifest lists any external origins, your message handler is the attack surface. Validate sender.id against an explicit allowlist (empty by default), validate sender.origin separately, and never accept user-controlled instruction text from a message handler routed through externally_connectable without an additional per-request authentication primitive. The default position should be that no external sender is entitled to invoke a privileged message handler; explicit exceptions only.
Treat MAIN-world content scripts as adversarial. Even your own first-party code should run in the isolated world by default. If first-party UI needs to interact with the page’s JavaScript context, do it through postMessage with explicit origin validation, and do not let MAIN-world script messages cross into the privileged extension surface without re-authentication. Chrome’s own documentation warns that the MAIN world shares the host page’s globals; any other extension can join that party.
Bind a request token to every privileged message. The extension’s service worker mints a session-scoped nonce, hands it to the first-party UI code at initialisation, and validates it on every privileged inbound message. The mechanism is simple, although retrofitting it into an existing messaging protocol is days to weeks of work, and it closes the class of attack where a co-located script imitates first-party traffic. It does not close the case where the first-party UI itself is XSS’d (ShadowPrompt’s shape), but it blocks the ClaudeBleed shape as long as the token never leaks into page-visible state.
Gate destructive actions behind authorisation that is independent of the request channel. Auto Mode’s two-layer permission system is the high-end answer; the low-end answer is a static capability allowlist enforced at the handler boundary. The onboarding_task handler should never have been able to forward arbitrary prompt content into the agent’s instruction stream. The static gate would have caught that regardless of who sent the message.
Assume approval looping and DOM relabelling are part of the threat model. Your consent UX has to be robust to a co-located adversary that can synthesise events and rewrite DOM. The realistic posture: confirmation prompts read state the user has explicitly typed (not state populated from the page), confirmation actions require an out-of-DOM signal (extension-managed keyboard chord, side-panel-only mouse click on a coordinate the page cannot influence), and any action that mutates state across the user’s trusted account boundaries gets a confirmation prompt the page cannot pre-fill.
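One way to make a confirmation flow resistant to approval looping is to bind it to a one-time challenge and a small attempt budget. The sketch below is illustrative only: PendingConsent, ConsentSource, and tryApprove are hypothetical names, and the source field assumes the extension can reliably tell input arriving from its own side panel apart from input arriving via the page, which is itself the hard problem described above.

```typescript
interface PendingConsent {
  actionId: string;
  challenge: string;    // displayed only in an extension-controlled surface
  attemptsLeft: number; // small budget so repeated injection cannot brute-force
}

type ConsentSource = "side_panel" | "page";

function tryApprove(
  pending: PendingConsent,
  source: ConsentSource,
  typed: string,
): "approved" | "retry" | "cancelled" {
  if (pending.attemptsLeft <= 0) return "cancelled";
  pending.attemptsLeft -= 1;
  // Approval requires both the trusted source and the exact challenge text.
  if (source === "side_panel" && typed === pending.challenge) return "approved";
  return pending.attemptsLeft > 0 ? "retry" : "cancelled";
}
```

A script that keeps injecting “Yes, proceed” from the page exhausts the budget and cancels the action instead of eventually satisfying the prompt.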

For teams shipping agent products into IDEs, terminals, or OS shells

Inventory every ambient authority the agent inherits from the host. Filesystem access, shell access, network egress, credential cache access, sibling-process IPC. Each of those is a primitive the agent did not explicitly request — it inherits them from the host environment’s permission model. Each is a potential ClaudeBleed-shape surface. The Cursor CVE-2026-26268 case is the on-record example for agent IDEs; the same audit applied to terminal-resident agents will surface analogous gaps.
Distinguish “agent operating on the user’s authority” from “agent acting at the user’s direction.” Those are not the same thing. The agent has the user’s authority; the user has not necessarily directed every action the agent takes. The harness pattern’s value is making the difference visible at the point of action. Without a harness layer, the agent’s authority is ambient and every action it takes is implicitly treated as user-directed, which is the assumption ClaudeBleed exploits.
Make the consent surface non-ambient. A confirmation prompt that lives in the same DOM the agent is reasoning over is structurally compromisable. The robust shape is consent rendered in a host-controlled surface — Chrome’s native permission dialog, an OS-level notification with cryptographic provenance, an IDE’s privileged sidebar that the agent cannot direct events to. Anthropic’s Auto Mode visual red spinner is in the right direction; whether the side-panel-rendered approval surface is genuinely non-ambient with respect to the page’s DOM is one of the open questions ClaudeBleed makes explicit.

For teams integrating with agent products built by someone else

Treat the agent product’s threat model as a stated artifact, not as an implementation guarantee. Anthropic publishes a threat model for the Claude for Chrome extension; LayerX’s disclosure demonstrates that the threat model and the implementation are not the same surface. The realistic posture for an integrator is: read the published threat model, then assume there’s a delta between it and what the product actually enforces, then design the integration so a breach of the delta is recoverable rather than catastrophic. For a Gmail-Claude integration, that means scoped OAuth tokens with revocation, audit logging of agent-initiated actions, and an out-of-band recovery path that doesn’t depend on the agent product’s own consent UX being honest.
Plan for the second-order disclosure pattern. ClaudeBleed is the second time in five months the same agent surface has been compromised through an origin-trust gap. The pattern of “patch the immediate bug, leave the root cause” suggests the next disclosure is months, not years, away. Any integration architecture should be auditable on the assumption that the agent product’s authentication surface will fail again before it is durably hardened.

Open Questions

Will Anthropic retire the externally_connectable handler, harden it with sender authentication, or accept the residual risk? LayerX’s reading of Anthropic’s initial response suggested the handler would be removed; the May 6 patch did not remove it. Reporting on Anthropic’s longer-arc remediation plan remains thin.
Will the Auto Mode harness pattern apply to the Chrome extension’s surface? Auto Mode is shipping in Claude Code today. Whether the input-layer and execution-layer permission checks extend to the browser-extension execution path or remain Claude Code-only is not in the public record.
What does a non-ambient consent UX look like for agent products? Chrome’s native permission dialogs are robust because they’re rendered by the browser chrome, not by page-controlled JavaScript. The equivalent primitive for agent products inside browsers, IDEs, and terminals is not a solved engineering problem. The Anthropic side-panel approval UX is a partial answer; the full answer needs the consent surface to be unreachable by the page or extension being approved.
Will the agent-product ecosystem converge on a shared standard for execution-context-aware authorisation, or will every vendor solve it independently? MCP, A2A, and the rest of the protocol stack do not currently specify a sender-authentication primitive at the agent-product layer. The shape of an answer might look like signed agent action requests in the protocol body, but no vendor has shipped one.
How widely is the MAIN-world content-script vector understood as adversarial? Chrome’s docs warn about it, OWASP and GitHub Security Lab’s 2024 survey catalogue the underlying sender-validation failure, and the USENIX Security ’23 study quantified message-passing vulnerabilities in the wild. ClaudeBleed is the first widely-reported case in the agent-product cohort. Whether the rest of the cohort (OpenAI’s ChatGPT browser extensions, Google’s Gemini surfaces, Perplexity’s Comet) is hardened against the same vector is an open question for any team auditing the cohort against the ClaudeBleed shape.