Scout Reports — Page 3 | The Artificer's Grimoire

May 31, 2026 24 sources

Scout: The Agent Substrate Attack Surface — GitHub, Grafana, and the Shai-Hulud Return

The May 2026 supply-chain wave (Nx Console / TeamPCP, TanStack, AntV Shai-Hulud) and the practitioner audit it forces for coding-agent infrastructure

Coding-agent harnesses concentrate exactly the credentials and credential-bearing surfaces this campaign harvests (IDE extensions, npm install-time scripts, OIDC publish tokens, and now ~/.claude config files), so the defensible hardening posture has moved since the last supply-chain reckoning.

May 31, 2026 21 sources

Scout: The Coding-Agent Methodology Curriculum — Failure Modes, Sensors, Attention Architecture, and the Harness Moves That Connect Them

The mid-2026 methodology layer for teams operating coding-agent workflows: Fowler's stabilized Vibe Coding definition, Böckeler's guides/sensors instrumentation, Osmani's orchestration-tax attention argument, the anti-vibe practitioner critique cluster, and the harness-engineering moves that turn the critique into operational practice

Coding-agent productivity is now bottlenecked on methodology, not capability. The named failure modes, the sensors that catch them, and the attention-architecture patterns are converging into a teachable curriculum, and teams that haven't assembled it are paying the orchestration tax blind.

May 31, 2026 18 sources

Scout: Three MCP Perimeter Architectures Arrive Within Weeks — IAM Tagging vs. Egress Tunnel vs. Six-Layer Platform

The three MCP gateway/perimeter architectures that landed in Q2 2026 — AWS's IAM-context-key tagging, Anthropic's outbound-only MCP Tunnels, and Cloudflare's six-layer agent platform — and how a team choosing among them should weigh identity model, audit surface, egress posture, MCP-spec coverage, and lock-in

The MCP gateway question stopped being 'do we need one?' (answered last quarter) and became 'which perimeter model do we standardize on?' Three vendors now ship architecturally incompatible answers — two at GA, one in research preview. Every team running remote MCP servers against sensitive resources in mid-2026 makes this call before its first agent reaches a regulated workload.

May 31, 2026 21 sources

Scout: Silent Vendor Patching as the New Normal — Disclosure Cadence Across Claude Code, Codex, Gemini CLI, Cursor, and Copilot

How coding-agent vendors actually disclose sandbox, harness, and network-allowlist vulnerabilities — CVE assignment, release-note transparency, and customer notification across Claude Code, Codex CLI, Gemini CLI, Cursor, GitHub Copilot, and OpenClaw — plus the independent-researcher channels practitioners should treat as primary advisory feeds.

When a harness vendor patches a sandbox or allowlist bypass without a CVE, an advisory, or a release-note flag, the entire downstream vulnerability-management apparatus enterprises rely on — scanner updates, SBOM flags, version-pinning alerts — goes blind. Teams running coding agents in production cannot rotate credentials or audit egress against a vulnerability they were never told existed. The disclosure cadence is now a procurement input on par with the isolation primitive itself.

May 18, 2026 22 sources

Scout: Agent-as-Principal — Cloudflare + Stripe and the Commercial-Identity Layer

Cloudflare/Stripe's hybrid agent-commerce primitive (Stripe attests to the user's identity, the customer is billed, but the provider issues credentials directly to the agent), how it moves a meaningful distance past the delegate model without reaching full agent-as-principal, and the governance gap it opens

Teams designing 2026 H2 production agent deployments in which the agent needs to act commercially (provision accounts, register domains, start subscriptions, deploy code) now have a vendor primitive for it. The audit, incident-response, and policy patterns the primitive doesn't yet supply are what practitioners have to build.

May 18, 2026 21 sources

Scout: Agent-Economics Post-Meter — What Programmatic Claude Actually Costs After June 15

The economics of programmatic Claude usage after Anthropic's 2026-06-15 split of Agent SDK, headless Claude Code, GitHub Actions, and third-party agent tools onto a dedicated monthly credit pool billed at API rates — and what that does to the optimisation playbook, switching-cost math, and build-vs-buy posture for teams running coding agents at scale

Effective 2026-06-15, the cross-subsidy that made flat-rate Claude subscriptions cheap for programmatic agentic workloads ends. Every Agent SDK pipeline, headless `claude -p` invocation, Claude Code GitHub Action, and third-party agent tool now competes for a non-rolling $20–$200 monthly credit metered at API list price; Routines remain a structural exception, drawing down subscription usage rather than the new pool. The optimisation playbook that was a nice-to-have for chat workloads — prompt caching, batch APIs, model routing — becomes operationally critical, and the switching-cost calculation against OpenAI's two-months-free Codex offer is the planning event of the next thirty days.

May 18, 2026 15 sources

Scout: Harness-Escape Patterns — ExploitGym, Ona, Cymulate, and Antigravity Compared

Four coding-agent harness-escape disclosures published over the past three months — ExploitGym (UC Berkeley + Anthropic + OpenAI + Google), Ona's Claude Code denylist-and-bubblewrap escape, Cymulate's unpatched Gemini CLI filesystem-isolation and OAuth-theft findings, and Pillar Security's Antigravity sandbox-escape RCE — read together as one threat model. What's shared, what's harness-specific, where container-style defenses fail to transfer, and what mitigations exist today.

Four independent disclosures from four research groups, against four different vendor products, all describe the same structural failure: the harness layer where the coding agent runs treats the agent like a deterministic workload it can contain, and the agent is in fact a general-purpose reasoner that solves containment as one obstacle among many. Every team running coding agents in production needs a working threat model that accommodates this pattern, and the vendor primitives that ship by default don't yet do that.

May 18, 2026 16 sources

Scout: Scheduled and Background Coding Agents Compared — Routines, Symphony, Copilot, ADK

Vendor comparison of scheduled / background coding-agent platforms for 2026 H2

Four vendors shipped or productised background-agent capabilities within a single fortnight. Teams choosing where to host scheduled agent workflows in 2026 H2 need a map of orchestration, governance, pricing, and ecosystem that distinguishes the four offers without taking vendor framing at face value.

May 10, 2026 23 sources

Scout: Validating Agent-Authored Code in CI When There Is No Oracle

Patterns for continuous integration on agent-authored code when the reference output is non-deterministic — what to measure, how to weight semantic versus structural validators, how to fail builds without producing flaky-test fatigue, and how to keep the eval signal calibrated as both the codebase and the agent drift.

Agent-authored pull requests are now the majority of CI traffic at several large vendors. Standard CI was built around deterministic graders against fixed reference outputs; the workload it now has to gate doesn't fit that shape. Teams shipping agent-authored code at meaningful volume need a working eval rubric before the pipeline either rubber-stamps everything or gets disabled out of fatigue.

May 10, 2026 26 sources

Scout: ClaudeBleed — Trust-Boundary Postmortem and the Execution-Context Authorisation Pattern

LayerX's ClaudeBleed disclosure against Anthropic's Claude Chrome extension — the execution-origin-vs-execution-context failure mode, the partial patch, and what an execution-context-aware authorisation pattern looks like for agent products shipping into shared host environments

ClaudeBleed is the first widely-reported takeover-class vulnerability in an Anthropic-shipped consumer agent surface. The failure mode — trusting where code runs rather than who is running it — is the trust-boundary mistake every team shipping agent UI into browsers, IDEs, terminals, or OS shells inherits whether they realise it or not. The Chrome-extension specifics matter less than the authorisation pattern they reveal.

May 10, 2026 26 sources

Scout: Operationalising AI-Assisted Vulnerability Discovery — What Mozilla's Mythos Pipeline Actually Requires

The operational pipeline behind Mozilla's 271-vulnerability Firefox 150 release — agentic harness, sanitizer-driven validation, ephemeral-VM parallelism, deduplication and triage integration, and the remediation-pipeline capacity question for security teams trying to replicate it

Capability is no longer the bottleneck — operations is. Mozilla's pipeline turns a frontier model into a working defensive primitive, and the operational shape it requires (sanitizer-build success signal, ephemeral-VM parallelism, second-stage grader model, deduplicated bug lifecycle integration, two-engineer-per-patch remediation discipline) is the actual blueprint other security teams will be asked to reproduce in 2026-Q3 and beyond.

May 10, 2026 26 sources

Scout: Sandbox-Per-Task Primitives Compared — GKE Agent Sandbox, Cloudflare Dynamic Workflows, Claude Code Auto Mode

Three sandbox-per-task primitives shipped the same week at three different layers — Google's GKE Agent Sandbox (kernel-isolated pods), Cloudflare's Dynamic Workflows (per-tenant durable code in V8 isolates), and Anthropic's Claude Code Auto Mode (per-action permission classifier). Comparing what each actually isolates, where the failure modes live, and which primitive to reach for under which constraints.

Sandbox-per-task is now the dominant production pattern for running untrusted agent code, and three of the four credible vendor implementations landed in seven days. The build-vs-buy calculus and the layering decision (which combination of these primitives stacks?) are the load-bearing platform-architecture choices for any team running agents at scale in mid-2026.