Scout: Sandbox-Per-Task Primitives Compared — GKE Agent Sandbox, Cloudflare Dynamic Workflows, Claude Code Auto Mode

Summary

Three vendor primitives for sandbox-per-task agent execution shipped between May 1 and May 8, 2026, and they sit at three completely different layers of the stack. Google’s GKE Agent Sandbox is a Kubernetes-native isolation primitive built on gVisor, hitting 300 sandboxes per second at sub-second latency via warm pools — and crucially, the underlying CRD set is the upstream Apache-2.0 kubernetes-sigs/agent-sandbox subproject, not a GKE-only product. Cloudflare’s Dynamic Workflows is a ~300-line MIT-licensed TypeScript library on top of Dynamic Workers (V8 isolates) that lets durable workflow code be loaded per-tenant at runtime — durable execution that follows the tenant rather than being baked into the deployment. Anthropic’s Claude Code Auto Mode, shipped a second time at Code w/ Claude 2026, sandboxes per-action rather than per-process: a Sonnet 4.6 transcript classifier guards every tool call, with a 17% false-negative rate on real overeager actions and a 0.4% false-positive rate after the full pipeline. The three primitives are not substitutes. GKE Agent Sandbox isolates the process running the agent’s code, Dynamic Workflows isolates the durable plan the agent wrote, and Auto Mode isolates the individual action the agent wants to take. For practitioner teams building agent infrastructure in mid-2026, the right question isn’t which one of the three to pick — it’s which combination of them to stack, and Auto Mode is the layer that composes with either of the other two rather than competing with them.

Key Findings

1. The three primitives sandbox at three different layers — they compose, they don’t substitute

The cleanest practitioner read is what each primitive treats as the unit of isolation:

Vendor	Unit of isolation	Mechanism	Cold-start latency	Pricing model
GKE Agent Sandbox	A pod running the agent’s untrusted code	gVisor kernel isolation, optional Kata Containers, on Kubernetes	Sub-second via warm pools; ~1s without	Free add-on on GKE; pay for underlying compute
Cloudflare Dynamic Workflows	A durable workflow definition that differs per tenant	V8 isolates via Worker Loader; ~MB memory, single-digit ms boot	Single-digit milliseconds	Active-CPU pricing on Containers/Workers tier
Anthropic Claude Code Auto Mode	A single tool call before it runs	Sonnet 4.6 transcript classifier; two-stage filter; user-message + tool-call payload only	One classifier call (~hundreds of ms in practice)	Bundled with Claude Team/Enterprise plans

GKE Agent Sandbox is the layer where you decide what process the untrusted code runs in. Dynamic Workflows is the layer where you decide whose durable plan executes next. Auto Mode is the layer where you decide whether this specific tool call gets to run at all. The three answer different questions, and a team running adversarial-input agents at production scale on Claude needs to answer all three — host isolation, tenant isolation, and per-action permission — rather than picking one.

The illustrative stack: GKE Agent Sandbox pods running per-session containers, each container running Dynamic Workflows-style per-tenant code, each tenant’s agent calling tools through Auto Mode classifiers. That’s the full defense-in-depth shape that no single vendor ships end-to-end. The week’s announcements amount to three different layers of the same stack arriving simultaneously, not three competing solutions to the same problem.

2. GKE Agent Sandbox is the upstream Kubernetes primitive — Google’s version is the managed offering on top

The detail that doesn’t survive the headline is that the kubernetes-sigs/agent-sandbox subproject is an Apache-2.0-licensed Kubernetes SIG Apps project, launched at KubeCon Atlanta in November 2025. The four CRDs — Sandbox, SandboxTemplate, SandboxClaim, and SandboxWarmPool — install on any conformant Kubernetes cluster via standard manifests, and the project documentation explicitly emphasises vendor neutrality: “Interoperability is core to Agent Sandbox with a standardized Kubernetes API that fully decouples the execution layer from the underlying isolation technology.”

What Google ships at Cloud Next ‘26 is the managed version on GKE, with three GKE-specific accelerants on top of the upstream primitive:

Managed gVisor on GKE Sandbox — Google’s gVisor is the same kernel-isolation technology securing Gemini, operationally mature at Google-scale.
Pod Snapshots — a GKE-exclusive Preview feature in 1.34.1+ that captures both memory state and filesystem, taking pod startup “from minutes down to seconds” per Google’s framing. Pod Snapshots is not in the upstream project.
Axion price-performance — Google claims up to 30% better price-performance on Axion compared to other hyperscale clouds for sandbox workloads.

The headline performance numbers attach to the managed offering, not the upstream code. The 300 sandboxes per second at sub-second latency figure is achievable on GKE with warm pools and Axion; a self-hosted Agent Sandbox install on a non-Google Kubernetes distribution gets the same CRDs and the same gVisor isolation, but the cold-start and price-performance numbers are unlikely to match without comparable engineering investment. The verbatim Lovable production-use quote from Fabian Hedin via InfoQ is specifically about the GKE flavour: “GKE’s cutting-edge sandboxing capabilities allow us to reliably scale to hundreds of secure sandboxes per second, ensuring we can seamlessly empower builders, even during massive, unpredictable demand.” Lovable’s production workload runs 200,000+ AI-generated projects daily on it.

The practitioner implication is real. If a team is committed to Kubernetes-native infrastructure but not committed to Google Cloud, Agent Sandbox is a credible upstream-OSS choice that installs on EKS, AKS, or on-premises clusters — with the understanding that Pod Snapshots, the warm-pool-on-Axion price-performance, and the production-tuning that produces 300 isolates per second are GKE-flavour accelerants the team has to either replicate or do without. If the team is already on GKE, the managed offering is the path with the least operational work. If the team isn’t on GKE and isn’t on Kubernetes at all, neither version of Agent Sandbox is the right primitive — and that’s where Cloudflare’s offering becomes the answer.

3. Cloudflare Dynamic Workflows is the smallest interesting OSS primitive of the three — but it sandboxes a different thing

Dynamic Workflows is @cloudflare/dynamic-workflows on npm, MIT-licensed, roughly 300 lines of TypeScript, released May 1. The architectural insight it bakes in is that durable execution can be decoupled from deployment — the workflow code doesn’t have to be the same code at deploy time and at execution time. Cloudflare’s framing for what this enables, verbatim per InfoQ’s coverage: “Say you’re building an app platform where the AI writes TypeScript for every tenant. Say you’re running a CI/CD product where each repository has its own pipeline. Say you’re using an agents SDK where each agent writes its own durable plan.”

The mechanism is straightforward. A Worker Loader sits between the Workflows engine and tenant code. When a tenant calls env.WORKFLOWS.create(...), the call goes through the Worker Loader, which tags the new Workflow instance with metadata identifying which Dynamic Worker to load. The Workflows engine persists the metadata-bearing payload, and when the engine wakes up to run the next step, it reloads the right tenant’s Dynamic Worker through the loader. The compact framing from the Cloudflare blog post: tenants write standard Workflows code without awareness of dynamic dispatch; the wrapping happens at the binding layer, not inside the tenant code.

What this primitive is not is a kernel-level isolation boundary in the GKE Agent Sandbox sense. Dynamic Workers run as V8 isolates — the same JavaScript engine underneath Chrome — with single-digit-millisecond boot times and a few megabytes of memory per isolate, around 100x faster and 10x-100x more memory efficient than a typical container. Cloudflare explicitly acknowledges the security tradeoff and the defence-in-depth response: “security bugs in V8 are more common than security bugs in typical hypervisors,” per the Dynamic Workers post. That’s why Cloudflare layers V8 isolates with a custom second-layer sandbox, hardware-MPK protection, and per-isolate process boundaries. For genuinely adversarial agent code — code that may try to escape its execution environment — V8 isolates are a different threat-model bet than gVisor or Firecracker. For per-tenant code where the threat is mostly cross-tenant data leakage rather than kernel escape, V8 isolates plus Cloudflare’s defence-in-depth is the lightest sandbox the practitioner has access to.

Heavy workloads route through Cloudflare Sandboxes (container-based, full OS, snapshot-based wake) which composes naturally with Dynamic Workflows: the lightweight per-tenant code lives in an isolate, and when a step needs docker build or a Postgres dependency, the workflow hops out to a Sandbox for that step. Sandboxes themselves run on active-CPU pricing — billing in 10-millisecond increments at $0.000020 per vCPU-second and $0.0000025 per GiB-second, with idle time effectively free.

For practitioner teams: Dynamic Workflows is the primitive that sandboxes the durable plan, not the process. If the agent’s threat model is “tenant A’s agent must not be able to read or modify tenant B’s workflow state,” the metadata-tagged Worker Loader pattern is purpose-built for that. If the threat is “the agent’s generated code could try to escape its execution environment,” Dynamic Workflows by itself isn’t the answer — pair it with Sandboxes, or pick GKE Agent Sandbox for the underlying execution layer.

4. Auto Mode sandboxes per-action — and its honest 17% miss rate is the most-criticised number of the three

Anthropic’s Claude Code Auto Mode, first shipped as a research preview in March 2026 on Claude Team and now rolling to Enterprise and API, is the layer of the stack the other two don’t address. Where Agent Sandbox isolates the host and Dynamic Workflows isolates the tenant, Auto Mode isolates the individual action. Before any tool call runs, a transcript classifier on Sonnet 4.6 evaluates the proposed call against approximately twenty default block categories — destroy-and-exfiltrate operations, security-degradation moves, trust-boundary violations, bypass-and-shared-impact actions — and either permits, blocks, or escalates to a human.

The architecture has two notable design decisions worth pulling apart. First, the classifier deliberately strips assistant messages and tool outputs before evaluation: it sees only user messages and the executable tool-call payload itself. Per Anthropic’s engineering post, this prevents the agent from generating “persuasive rationalizations” that would justify dangerous actions in its visible reasoning. Second, the classifier runs a two-stage filter — a fast single-token yes/no first stage, with chain-of-thought reasoning only on actions Stage 1 flags as uncertain. The optimisation is for cost and latency: most actions clear Stage 1 cheaply, and only the borderline cases pay the CoT cost.

The numbers Anthropic publishes are unusual in their transparency. After the full two-stage pipeline: a 0.4% false-positive rate and a 17% false-negative rate on 52 curated real overeager actions, with a 5.7% false-negative rate on synthetic exfiltration prompts (n=1000). Anthropic’s own framing on whether the 17% is acceptable, verbatim: “Whether 17% is acceptable depends on what you’re comparing against. If you are running --dangerously-skip-permissions, this is a substantial improvement.” That hedge is doing real work — Auto Mode isn’t positioned as a hardened security boundary, it’s positioned as a middle path between manual approval (where users approve 93% of prompts, enough that the approval mechanism degrades into rubber-stamping) and the no-prompts mode that offers no protection at all. The pragmatic framing in Zvi Mowshowitz’s coverage reads similarly: the 17% number is acceptable if you compare against users who’d otherwise be running --dangerously-skip-permissions and unacceptable if you compare against careful human review on critical infrastructure.

The three-tier permission framework around the classifier is the operationally important structure: Tier 1 is a built-in allowlist of read-only tools (file reads, code navigation) that bypass the classifier entirely; Tier 2 is in-project file operations that proceed without classifier involvement because edits are “reviewable via version control”; Tier 3 is shell commands, web fetches, external integrations, and out-of-project filesystem operations — these are the actions that reach the classifier. The denial backstop is symmetric: three consecutive denials or twenty total denials within a session and the agent halts and escalates to a human; in headless mode, the process terminates.

The criticism Auto Mode has drawn is sharp and worth surfacing. Backslash Security’s analysis frames the architectural concern around the classifier’s deliberately-restricted visibility: the guard sees only what the architecture exposes, and that exposure has been narrowed by design to prevent the agent’s reasoning from being a persuasion surface. Sophisticated prompt injection that manipulates the primary model into emitting clean-looking tool calls could pass the classifier — the deliberate reasoning-blindness that prevents persuasion is the same property that prevents detection of upstream manipulation. Backslash’s overall framing is that one model guarding another is preferable to a fatigued human but that probabilistic security needs deterministic controls around it — consistent with Anthropic’s own positioning of Auto Mode as one layer in a defence-in-depth stack, not a standalone security boundary.

Practitioner read: Auto Mode is the primitive that composes with the other two rather than competing. If the agent’s executing code in a GKE Agent Sandbox pod or a Cloudflare Sandbox, Auto Mode is still the right primitive for “should this rm -rf actually run inside the sandbox?” The 17% FNR is the right number to plan around — not “trust the classifier to catch adversarial inputs” but “deploy the classifier alongside isolation that limits the blast radius when the classifier misses one in six.”

5. The pricing models reveal who each vendor thinks the buyer is

The unit each primitive prices in is the most direct signal of who the vendor expects to deploy it.

GKE Agent Sandbox prices at $0 — the add-on itself is free, and you pay for the underlying GKE resources (vCPU-hour, memory-hour, etc.) the sandbox pods consume. The implicit buyer is the platform team that already buys GKE capacity and treats sandbox-as-a-feature rather than sandbox-as-a-line-item. Comparable workloads on the AWS hyperscaler stack — AgentCore Runtime at $0.0895/vCPU-hour and $0.00945/GB-hour, with Firecracker microVM per session and an eight-hour max compute lifetime — price the sandbox as a separately-billable line item. AWS’s pricing structure makes the sandbox cost legible; GKE’s makes it invisible inside the cluster bill.
Cloudflare’s active-CPU pricing is the most agent-shaped unit of the three primitive families. $0.000020 per vCPU-second, with idle time effectively free, matches the agent’s actual work pattern: wake briefly, do a step, sleep. For agents that idle frequently — which describes most production agent workloads, where sessions spend 30-70% of time waiting on LLM responses or tool calls — Cloudflare’s TCO can come in dramatically below per-session or per-vCPU-hour pricing. AWS’s AgentCore Runtime also charges only for active CPU within the session, but the session itself has overhead the Cloudflare model doesn’t.
Anthropic bundles Auto Mode with the plan — there’s no separate line item; Auto Mode is a feature of Claude Team, Enterprise, and API plans. The implicit buyer is the developer team already paying for Claude Code seats. The token cost of running the classifier itself isn’t separately billed (it falls within the user’s existing Claude usage), which means the classifier’s two-stage filter design is doing real work on Anthropic’s cost side rather than the customer’s.

The cross-vendor billing-unit comparison is starker now that all three have shipped. A team running an idle-heavy multi-tenant agent platform pays effectively zero for runtime on Cloudflare while sessions sleep, pays for the warm-pool capacity it provisions on GKE Agent Sandbox even when idle, and pays Anthropic for the Claude Code plan regardless of how much Auto Mode is exercised. None of these is a strictly better unit; they target different deployment shapes.

6. The week’s announcements collectively normalise the four-pillar harness pattern

What the three primitives have in common at the architectural level matters more than what they differ on. All three implement some version of the four-pillar architecture GitHub formalised in its defense-in-depth piece the same week: Isolation, Constrained Execution, Controlled Outputs, Observability. GKE Agent Sandbox provides Isolation as a Kubernetes-native primitive. Dynamic Workflows provides Constrained Execution as a per-tenant code-loading mechanism. Auto Mode provides Constrained Execution and Controlled Outputs at the action-call level. None of the three ships the Observability piece as a first-class primitive — GKE relies on Cloud Observability with OTEL export, Cloudflare on Workers logging and AI Gateway telemetry, Anthropic on the existing Claude Cowork OTEL integration.

When the same architectural decomposition ships at three different layers from three vendors in the same seven days, it is no longer a research pattern. The platform layer has converged on sandbox-per-task as the default; the convergence is on the shape, not on a specific implementation. The architectural primitives are now portable enough that a team can mix and match — GKE Agent Sandbox underneath, Dynamic Workflows as the per-tenant code layer (if the team is willing to bridge between Kubernetes and Cloudflare’s Workers tier, which is non-trivial), Auto Mode at the action layer if the agent is Claude Code. The full-stack composition is real, even if no single vendor sells it end-to-end.

The Code w/ Claude 2026 context is worth naming briefly. The Anthropic-SpaceX compute deal announced the same week — 300+ megawatts of new capacity, over 220,000 NVIDIA GPUs, within the month — doubled Claude Code’s five-hour rate limits across Pro, Max, Team, and Enterprise tiers. That’s not a primitive in the sandbox-per-task sense, but it’s the enabling capacity story for Auto Mode. A classifier-gated long-running coding session that was rate-limited out after an hour on the old caps is now feasible; the practitioner pattern of “Auto Mode running for hours on a large task” is contingent on the compute capacity being there. Whether the deal’s dollar value lives at the $5B/year figure circulating in trade-press analyst commentary — Latent Space explicitly hedges this number as derived from secondary Twitter commentary rather than Anthropic’s own announcement — is unsettled. What’s confirmed is the megawattage and GPU count.

Practical Implications

Pick by layer, then compose

The decision shape is different from the registry-and-runtime comparison earlier this spring. There, the offerings were largely substitutes — pick one runtime, one registry. Here, the offerings stack:

Process isolation — pick from GKE Agent Sandbox (Kubernetes-native, gVisor, open-source upstream), AWS AgentCore Runtime (Firecracker microVM, paid per vCPU-hour, hyperscaler-bundled), or Cloudflare Sandboxes (container-based, active-CPU pricing). Tradeoff is along the kernel-isolation strength axis on one end and the idle-cost axis on the other.
Per-tenant code — Dynamic Workflows is the closest thing in the OSS landscape to a purpose-built primitive for “different agent, different durable plan, same runtime substrate.” If the team is on Workers, it’s a 300-line vendor-in choice. If the team is on Kubernetes, the equivalent pattern requires more bespoke wiring on top of Agent Sandbox.
Per-action permission — Auto Mode is the only first-party shipped option for Claude Code. For teams running non-Claude agents, the equivalent pattern (separately-modelled action classifier with low FNR on overeager actions) is something to build, not buy.

When to reach for which primitive

You’re already on GKE, you run untrusted code (LLM-generated or agent-authored), and you need kernel-level isolation: GKE Agent Sandbox is the right call. The managed gVisor + Pod Snapshots + warm pool combination is the lightest-weight path to sub-second sandbox creation at production scale. The Lovable production deployment at 200,000+ projects daily is the strongest reference for what this primitive looks like at scale.

You’re on Kubernetes but not on GKE, and you want vendor-neutral OSS: Install the kubernetes-sigs/agent-sandbox upstream project. You get the same CRDs and gVisor isolation. You don’t get Pod Snapshots, and the warm-pool performance numbers will depend on your underlying compute. The vendor neutrality is real but the production-tuning work is yours.

You’re running multi-tenant agent code where each tenant’s plan should be isolated from the others, and idle cost matters: Cloudflare Dynamic Workflows is the lightest primitive on offer, period. The MIT licence and ~300-line footprint make this small enough to vendor into your own platform if you want the pattern without the Cloudflare dependency, though the underlying Worker Loader machinery is Cloudflare-specific. If your threat model is “tenant A’s agent must not see tenant B’s plan state,” Dynamic Workflows is purpose-built; if your threat model includes “the tenant’s agent might try to escape its execution sandbox,” pair it with Cloudflare Sandboxes or with Agent Sandbox on Kubernetes.

You’re shipping Claude Code at scale and approval fatigue is your bottleneck: Turn on Auto Mode. The 93%-approval-rate finding makes the manual-prompt mechanism degrade into rubber-stamping anyway; Auto Mode’s classifier catches the 7% that were genuinely worth blocking and a meaningful (though not all) fraction of the overeager actions the human approver wouldn’t have caught in fatigue mode. Don’t treat Auto Mode as a security boundary on its own — the 17% FNR makes that framing unsound. Treat it as a fatigue-reduction layer that composes with sandbox-level isolation for blast-radius control.

You’re running adversarial-input agents in production (untrusted tenant code, agent-authored shell commands on customer data, autonomous coding sessions on production repos): Stack the layers. GKE Agent Sandbox or AWS AgentCore Runtime for process isolation. Per-tenant code routing for tenant separation. Auto Mode (or equivalent) for per-action permission. The point of the three primitives shipping the same week is that the full-stack composition is now achievable with off-the-shelf pieces; don’t ship adversarial-input agents with only one layer.

The build-vs-buy calculus has shifted, but not entirely

A platform team in mid-2026 that’s been building its own per-task sandbox primitive for the last twelve months now has three credible buy-side answers across three different platform layers, and an Apache-2.0 upstream OSS option for the process-isolation layer that didn’t exist last quarter. The math has changed enough that “we’re building this ourselves” needs a real defence — what does the bespoke version do that the off-the-shelf primitives don’t?

Three plausible answers survive: a non-Kubernetes runtime substrate (the SaaS team running on bare metal or on Nomad), a model-coupling that none of the three vendors target (a specialised inference engine that needs a specific harness shape), or a regulated-industry context where the off-the-shelf primitives don’t meet a compliance constraint. Outside those, the bespoke-build defence is now harder to mount than it was in Q1.

Open Questions

Does Auto Mode’s 17% FNR hold up under sustained adversarial pressure? The published number is from 52 curated real overeager actions and 1,000 synthetic exfiltration prompts. The published Permission Gate paper that Edition 10 covered showed FNR degrading by roughly five times under pressure, with a third of state-changing actions falling outside the classifier’s evaluation scope. That paper used a similar architecture but was tested on a synthetic adversarial workload. Whether the production Auto Mode classifier shipped on Sonnet 4.6 closes those scope gaps — particularly for in-project file edits (Tier 2 in the current framework, which bypasses the classifier) — is the most operationally consequential open question. A patient red-team campaign over a quarter is the right way to test it; the absence of such a campaign in public reporting yet is itself a data point.
Will the kubernetes-sigs/agent-sandbox project sustain non-Google contributions? The project is upstream Apache-2.0 and vendor-neutral on paper. In practice, the maintainer footprint and most of the design momentum appear to come from Google. Whether other hyperscalers (AWS, Azure) build Agent Sandbox conformance into their managed Kubernetes offerings, or whether they continue to ship their own primitives (AgentCore Runtime on AWS, no first-party Azure equivalent yet), determines whether Agent Sandbox becomes the Kubernetes-native standard or the Google-flavour standard. The next two quarters of EKS and AKS release notes are the right place to watch.
Is Dynamic Workflows’ Worker Loader pattern portable enough to vendor outside Cloudflare? The 300-line MIT-licensed library lives on top of Cloudflare-specific primitives — Dynamic Workers, the Worker Loader binding, the Workflows engine. The architectural pattern (per-tenant code loaded at runtime through a metadata-routing layer) is portable in principle. Whether a team running on AWS Lambda or on Kubernetes can lift the pattern without rebuilding the entire Worker Loader substrate is unclear. The first non-Cloudflare implementation of “Dynamic Workflows-style per-tenant durable execution” is the right signal to watch.
What does observability look like across the stack? Each of the three primitives ships some telemetry but none of them ships a unified observability story for the full stack. A team that runs GKE Agent Sandbox + Dynamic Workflows + Auto Mode needs to ship traces from all three into a single backend, and the OTEL bridge work for that isn’t yet productised. The opportunity for a third-party observability vendor (Datadog, Honeycomb, Grafana Cloud) to ship the cross-primitive view first is real, but no one has done it as of mid-May 2026.
How does the per-action permission pattern generalise outside Claude Code? Auto Mode is Claude-specific — the classifier runs on Sonnet 4.6, the integration is with Claude Code’s tool-call surface, the rate-limit dependencies are Anthropic’s. A team running Gemini-based or OpenAI-based agents doesn’t have a turnkey equivalent. The pattern is reproducible — separate guard model, two-stage filter, restricted-context classifier — but the engineering work to ship it is non-trivial. Whether a vendor-neutral OSS implementation of the per-action permission pattern emerges (something like an MCP-layer guard) is an open architectural question for non-Claude shops.
Where do the AWS and Azure equivalents land? AWS AgentCore Runtime ships per-session Firecracker microVMs and is the most architecturally complete hyperscaler offering for process isolation, but it doesn’t have a public-OSS upstream equivalent to Agent Sandbox. Microsoft has no first-party Azure equivalent shipped yet for the sandbox-per-task primitive (Agent 365 is governance, not runtime). The competitive shape of the next two quarters depends on whether AWS contributes to Agent Sandbox upstream, ships its own competing Kubernetes-CRD primitive, or doubles down on the AgentCore Runtime managed-service model. Microsoft’s positioning is the most open of the four hyperscalers’ agent strategies.

Sources

Cloudflare Ships Dynamic Workflows, Bringing Durable Execution to Per-Tenant and Per-Agent Code — InfoQ, 2026-05-09. Source for the ~300 LOC TypeScript figure, MIT licensing, Worker Loader mechanism, and the verbatim “Say you’re building an app platform…” use-case framing attributed to Cloudflare engineering.
Introducing Dynamic Workflows: durable execution that follows the tenant — Cloudflare Blog, 2026-05-01. First-party announcement. Source for the Worker Loader routing pattern, RPC-into-Worker-Loader mechanism, and the composition-with-Sandboxes pattern.
Run Workflows inside Dynamic Workers with the @cloudflare/dynamic-workflows library · Changelog — Cloudflare Developer Changelog. Source for the May 1 2026 release date and npm package name.
Sandboxing AI agents, 100x faster — Cloudflare Blog. Source for the V8 isolate boot time (single-digit ms, ~MB memory), the 100x-faster-than-containers claim, and Cloudflare’s own framing on V8 security tradeoffs (“security bugs in V8 are more common than security bugs in typical hypervisors”).
Agents have their own computers with Sandboxes GA — Cloudflare Blog. Source for Sandboxes’ container-based isolation, the 30s-from-scratch / 2s-from-snapshot performance numbers, and the active-CPU pricing model.
Pricing · Cloudflare Containers docs — Cloudflare Developer Docs. Source for the $0.000020/vCPU-second, $0.0000025/GiB-second, and $0.00000007/GB-second rates on Cloudflare Containers (which Sandboxes inherits).
Google Announces GKE Agent Sandbox and Hypercluster at Next ‘26, Positioning Kubernetes as AI Agent — InfoQ, 2026-05-07. Source for the 300 sandboxes/sec at sub-second latency claim, the 30% better price-performance on Axion claim, the Drew Bradstock / Gari Singh “Kubernetes has rapidly become the operating system for the AI era” quote, the Lovable / Fabian Hedin production-use quote, and the Alex Gkiouros analyst quote on hypercluster blast radius.
Agentic AI on Kubernetes and GKE — Google Cloud Blog. First-party source for the gVisor + Kata Containers isolation framing, the managed gVisor on GKE Sandbox detail, the pre-warmed pools enabling sub-second latency, and the Pod Snapshots feature.
GitHub — kubernetes-sigs/agent-sandbox — Kubernetes SIGs. Source for the Apache-2.0 license, the four CRDs (Sandbox, SandboxTemplate, SandboxClaim, SandboxWarmPool), and the vendor-neutral / cross-Kubernetes-distribution portability claim.
Agent Sandbox — project home — Kubernetes SIG Apps. Source for the verbatim interoperability claim (“Interoperability is core to Agent Sandbox with a standardized Kubernetes API that fully decouples the execution layer from the underlying isolation technology”) and the gVisor/Kata backend support framing.
Running Agents on Kubernetes with Agent Sandbox — Kubernetes Blog. Source for the SandboxWarmPool cold-start framing (“Starting a new pod adds about a second of overhead”), the four-CRD enumeration, and the standardised-Kubernetes-API portability claim.
Unleashing autonomous AI agents: Why Kubernetes needs a new standard for agent execution — Google Open Source Blog, November 2025. Source for the KubeCon Atlanta November 2025 launch as a SIG Apps subproject.
About GKE Agent Sandbox | Google Cloud Documentation — Google Cloud Docs. Source for the official GKE Agent Sandbox definition, the “no extra charge in GKE, with GKE pricing applying to the resources” pricing model, and the Python SDK / ADK / LangChain integration framing.
Save and restore Agent Sandbox environments with Pod snapshots | Google Cloud Documentation — Google Cloud Docs. Source for Pod Snapshots being a GKE-exclusive Preview feature requiring 1.34.1-gke.3084001+, and the memory-plus-filesystem state capture mechanism.
What’s new in GKE at Next 26 — Google Cloud Blog. Source for the GA status of GKE Agent Sandbox and the Lovable “200,000+ new projects daily” production reference. Also source for the hypercluster details (256,000 nodes / 1M chips / Titanium Intelligence Enclave / Private GA).
Claude Code auto mode: a safer way to skip permissions — Anthropic Engineering. First-party source for the 17% FNR / 0.4% FPR / 93% manual-approval-rate numbers, the two-stage classifier architecture on Sonnet 4.6, the user-message + tool-call-payload-only restricted-context decision, the three-tier permission framework (Tier 1/2/3), the denial backstop (3 consecutive / 20 total), and the verbatim “Whether 17% is acceptable depends on what you’re comparing against” hedge.
Auto mode for Claude Code — Anthropic / Claude.com. Source for the Team / Enterprise / API availability rollout, the Sonnet 4.6 + Opus 4.6 model requirement, and the verbatim “middle path that lets you run longer tasks with fewer interruptions” positioning.
Inside Claude Code Auto Mode: Anthropic’s Autonomous Coding System with Human Approval Gates — InfoQ, 2026-05-05. Source for the input/execution layer framing, the subagent outbound/return check mechanism, and the verbatim red-spinner quote attributed to Ankit Kalluraya.
Higher usage limits for Claude and a compute deal with SpaceX — Anthropic, 2026-05-06. Source for the 300+ MW / 220,000+ NVIDIA GPUs / Colossus 1 deal terms, the doubled five-hour rate limits across Pro / Max / Team / Enterprise, and the removal of the peak-hours reduction for Pro and Max.
Claude Code Auto Mode: Can One Model Guard Another? — Backslash Security. Source for the practitioner critique of Auto Mode’s reasoning-blindness tradeoff and the broader framing that probabilistic security needs deterministic controls layered around it.
Claude Code, Codex and Agentic Coding #7: Auto Mode — Zvi Mowshowitz Substack. Source for the framing that Auto Mode is suitable for users who would otherwise run --dangerously-skip-permissions, and the practitioner takeaway that file writes within project directories bypass the classifier.
Anthropic-SpaceXai’s 300MW/$5B/yr deal for Colossus I — Latent Space, 2026-05-07. Source for the explicit hedging that the $5B/year figure is “some estimate” derived from secondary Twitter analyst commentary rather than Anthropic’s own announcement, and the framing of Anthropic productising parts of the harness.
Firecracker – The Virtualization Technology Behind AWS Lambda and Bedrock AgentCore Runtime — Dev.to. Source for the AgentCore Runtime per-session microVM detail and the 30-70% session-idle figure used in the active-CPU pricing comparison.
Amazon Bedrock AgentCore Pricing — AWS. Source for the $0.0895/vCPU-hour and $0.00945/GB-hour rates used in the cross-vendor pricing comparison.
Amazon Bedrock AgentCore — AWS. Source for the eight-hour max-session-compute-lifetime detail in the Firecracker-microVM-per-session framing.
Managed Agent Runtimes — Anthropic, Cloudflare, AWS, Microsoft, Google Compared — Grimoire (prior scout). Comparative-runtime context for the substitution-versus-composition framing.