Scout: Managed Agent Runtimes — Anthropic, Cloudflare, AWS, Microsoft, Google Compared

Summary

The April 19 scout on agent registries mapped the catalog layer; the runtime layer underneath it now has five serious offerings, all shipped or repriced inside the same April 2026 window. Anthropic Managed Agents (public beta on the Claude Platform, $0.08/session-hour on top of token costs) sells a model-coupled runtime where the harness, sandbox, and observability ride along with Claude inference. Cloudflare Sandboxes and the enterprise MCP reference architecture hit GA together as the neutral-runtime story: bring-your-own-model via Workers AI and AI Gateway, credential injection by egress proxy, and active-CPU pricing instead of session-hours. AWS Bedrock AgentCore Runtime is the most architecturally complete of the bundles (microVM-per-session, eight-hour max compute lifetime, separately priced Identity / Gateway / Memory / Observability). Google’s Gemini Enterprise Agent Platform pairs an Agent Engine runtime at $0.0864/vCPU-hour with gVisor-isolated Agent Sandbox. Microsoft Agent 365 ($15/user/month or bundled in M365 E7 at $99, GA May 1) explicitly positions itself as a governance plane that does not run agents; Copilot Studio and Microsoft Foundry handle execution and bill separately. The decision shape is sharper than for registries: identity is still the lock-in floor, but the unbundling of runtime, harness, and sandbox is much further along, and “neutral runtime + neutral gateway + your-choice model” is now a credible production pattern for the first time.

Key Findings

1. Five Offerings, Three Architectural Bets

After normalizing the marketing, three distinct architectural positions on what a managed runtime is show up across the five offerings:

Vendor	Architectural bet	Sandbox isolation	Model coupling	Pricing primitive
Anthropic Managed Agents	The model vendor sells the harness	Per-session sandbox; credentials held outside (Anthropic Engineering)	Claude-only (Opus 4.6, Sonnet, Mythos when GA)	$0.08 per session-hour + token costs (Finout)
AWS Bedrock AgentCore	Hyperscaler sells the full stack as composable parts	Firecracker microVM per session, ~8-hour max (AWS Docs)	Model-agnostic (Bedrock, OpenAI, Gemini, Claude on Bedrock, BYO) (AWS FAQs)	$0.0895/vCPU-hour + $0.00945/GB-hour, billed per second
Google Gemini Enterprise Agent Platform	Hyperscaler sells the platform as the unified surface	gVisor-isolated Agent Sandbox on GKE (Google Cloud Blog)	Gemini-first; Claude and open weights via Model Garden	$0.0864/vCPU-hour + $0.0090/GB-hour + $0.25/1k events
Microsoft Agent 365	Identity provider sells governance, not runtime	None — Agent 365 doesn’t execute agents (SAMexpert)	Whatever Copilot Studio or Foundry runs	$15/user/month per agent-using human; build/run separately metered
Cloudflare Sandboxes + MCP Reference Arch	Neutral middle tier sells runtime, not model	Per-name persistent sandbox; sleeps when idle (Cloudflare Blog)	Bring-your-own via Workers AI + AI Gateway (70+ models, 12+ providers)	$0.00002 per active vCPU-second; pay only for cycles consumed

The model-coupled bet (Anthropic) and the neutral-runtime bet (Cloudflare) are the cleanest endpoints. The three hyperscalers occupy a middle position — model-agnostic in theory, gravity-locked to their identity and billing systems in practice. Only Microsoft is upfront about not selling a runtime: Agent 365 is the registry-and-governance piece and assumes the actual execution lives in Copilot Studio (capacity packs, $200/month per 25,000 credits per the Copilot Studio billing docs) or Microsoft Foundry (Azure-metered consumption).

2. Sandbox Isolation Is Now Table Stakes — but the Implementations Diverge

Every offering except Microsoft Agent 365 ships a per-session isolation primitive. The Mythos sandbox-escape disclosure of April 24, 2026 means isolation is no longer optional, and the field is converging fast, but on three different disclosed mechanisms:

AWS uses Firecracker microVMs for AgentCore Runtime, AgentCore Code Interpreter, and AgentCore Browser. The strict one-session-one-microVM model terminates the VM at session end and layers KVM virtualization with seccomp filters, cgroups, namespaces, and privilege dropping. Each session has an eight-hour default maximum compute lifetime. This is the most defensible isolation story of the five for adversarial-agent threat models: Firecracker is the same primitive AWS uses under Lambda, and the operational maturity shows.
Google uses gVisor-isolated Agent Sandbox on GKE. Google’s positioning emphasizes “up to 300 sandboxes per second per cluster” and Axion N4A price-performance. gVisor’s user-space syscall interception is a different security tradeoff than Firecracker’s hardware virtualization — historically slightly weaker against kernel-level escape research, though the practical attack surface for an agent-driven adversary is more about credential exfiltration than kernel escape.
Cloudflare runs Sandboxes as long-lived containers addressed by name, sleeping when idle and waking on request. The snapshot-based session recovery (“30 seconds from scratch but only 2 seconds from snapshot”) is the most developer-friendly of the five — agents can resume across days rather than restart. The isolation primitive is V8 isolates plus container boundaries, lighter than Firecracker but battle-tested by Cloudflare Workers’ production scale.
Anthropic Managed Agents virtualizes the whole brain/hands split. The Anthropic engineering writeup describes containers as “cattle, not pets” with the harness leaving the container and credentials held outside it. The actual isolation mechanism isn’t disclosed in operational detail, which is a gap — practitioners running long-horizon Claude agents on internal data should ask about the mechanism explicitly before committing.
Microsoft Agent 365 ships no execution sandbox. That’s by design; the agent runs wherever Copilot Studio or Foundry put it, and Agent 365’s job — per Microsoft’s own positioning — is governance via Entra and Defender. For Microsoft-shop teams, this means the sandbox question gets answered by whichever Foundry-on-Azure or Copilot Studio runtime they pick, not by Agent 365 itself.

The practitioner read: Firecracker microVM (AWS) and gVisor (Google) are the two most defensible isolation primitives for agents that may have adversarial properties, and both ship from operators who run them at hyperscaler scale. Cloudflare’s container-plus-isolate model is the most developer-friendly and cheapest at idle, with isolation tradeoffs that are reasonable for non-adversarial workloads. Anthropic’s model is the least transparent on isolation specifics. The April 2026 arXiv containment paper is the right companion read here: it argues for explicit threat modeling against the agent itself, not just against its outputs.

3. Identity Is Where Lock-In Lives, but the Five Offerings Decouple Differently

As with registries, identity is the deepest lock-in axis, but its shape differs at the runtime layer:

AWS AgentCore Identity is the most complete: workload identities for agents, OAuth brokering against downstream services, and explicit pricing ($0.010 per 1,000 token/API-key requests for non-AWS resources, no surcharge when used through Runtime or Gateway). Identity ties the runtime to AWS IAM as the floor.
Microsoft’s lock-in is the deepest precisely because identity is the product. Agent 365 charges $15/user/month for the right to manage agents as Entra identities alongside humans. Per SAMexpert’s licensing analysis, the GA release covers “agents acting on behalf of licensed users using delegated permissions.” Autonomous agents with independent identities remain in preview, and licensing for them is undetermined — a non-trivial pricing risk for any team designing toward a large autonomous-agent population.
Google couples to Google Cloud IAM plus per-agent OAuth, with VPC Service Controls, Model Armor, and CMEK adding perimeter controls. Tenancy is the GCP project. The Agent Identity feature in Gemini Enterprise issues a “unique cryptographic ID” per agent, which is portable in principle, though practitioner reports on whether that ID survives a project move are still thin.
Anthropic Managed Agents handles auth differently — the engineering writeup describes credentials kept outside the sandbox, with the harness “never made aware of any credentials” and OAuth tokens stored in a secure vault accessed via proxy. The MindStudio walkthrough confirms the platform handles “token exchange, refresh cycles, and user consent flows” through built-in OAuth support. This is good security architecture but doesn’t answer the identity-provider question — which IdP authorizes the agent’s access to the customer’s downstream systems is left to the customer’s MCP and OAuth setup.
Cloudflare’s identity story rides on Cloudflare Access for authentication (SSO, MFA, device posture per the enterprise MCP architecture). This is a real identity layer if Cloudflare Access is already the customer’s reverse proxy; for teams whose IdP is Okta or Entra, Access is an additional integration rather than a savings.

The lock-in math: AWS AgentCore Identity and Microsoft Agent 365 are the stickiest. Cloudflare’s identity coupling is light (you can rip out Cloudflare Access without losing your sandboxes). Anthropic punts the question to the customer’s existing identity infrastructure, which is honest but shifts the work back. Google sits in the middle.
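The credentials-outside-the-sandbox pattern described above for Anthropic Managed Agents, and the egress-proxy injection attributed to Cloudflare, can be illustrated in a few lines. This is a minimal sketch under stated assumptions, not either vendor's implementation: `Vault`, `OutboundRequest`, and `inject_credentials` are hypothetical names, and the vault is an in-memory dict standing in for a real secret store.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class OutboundRequest:
    """A request as the sandboxed agent emits it: no credentials attached."""
    method: str
    url: str
    headers: dict = field(default_factory=dict)


class Vault:
    """Hypothetical token store that lives outside the sandbox."""

    def __init__(self, tokens_by_host):
        self._tokens_by_host = dict(tokens_by_host)

    def token_for(self, host):
        return self._tokens_by_host.get(host)


def inject_credentials(request, vault):
    """Return a copy of the request with a bearer token added by the proxy.

    Hosts with no vault entry are rejected, so the sandbox cannot reach
    destinations nobody has authorized.
    """
    host = urlsplit(request.url).hostname
    token = vault.token_for(host)
    if token is None:
        raise PermissionError(f"no credential registered for host {host!r}")
    # Drop any Authorization header the sandbox set itself; the proxy is the
    # only component allowed to attach credentials.
    headers = {k: v for k, v in request.headers.items()
               if k.lower() != "authorization"}
    headers["Authorization"] = f"Bearer {token}"
    return OutboundRequest(request.method, request.url, headers)


vault = Vault({"api.example.com": "tok-123"})
unsigned = OutboundRequest("GET", "https://api.example.com/v1/items")
signed = inject_credentials(unsigned, vault)
```

The agent-side object (`unsigned`) never holds the token; only the copy produced on the proxy side does. Which identity provider issues the token in the first place is exactly the question this pattern leaves open.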

4. Observability Has Effectively Standardized on OpenTelemetry — but the Surface Where Traces Live Diverges

The most surprising convergence in the data is observability. Four of the five offerings emit OTLP-compatible telemetry by default:

AWS AgentCore Runtime auto-instruments via OTEL — sessions, traces, spans for every tool call and LLM invocation, surfaced through CloudWatch GenAI Observability, exportable to Datadog, New Relic, and other OTLP-aware backends.
Anthropic Managed Agents emits SSE streams capturable by OTEL collectors, with Claude Cowork’s OpenTelemetry support extending to MCP invocations (server name, tool name, parameters, success/failure, execution time).
Google’s Gemini Enterprise Agent Engine surfaces traces through Google Cloud Observability with OTEL export.
Cloudflare’s AI Gateway provides per-user token monitoring and cost controls; the broader Sandboxes telemetry uses standard Workers logging plus the Cloudflare Gateway shadow-MCP detection for unauthorized server discovery.

The lone exception is Microsoft, where observability splits across Defender for Cloud Apps, Purview audit, and whichever Azure Monitor namespace the underlying Foundry or Copilot Studio runtime emits to. Practical consequence: a multi-cloud agent deployment can ship a single OTEL pipeline to one backend (Datadog, Honeycomb, Grafana Cloud) and get coherent traces across AWS, Google, Anthropic, and Cloudflare runtimes; integrating Microsoft Agent 365’s telemetry into that same pipeline takes meaningful additional work.

This is the most underappreciated story in the runtime landscape. The tooling for understanding what the agent did has standardized faster than the tooling for running the agent in the first place. For any team running agents in more than one of these runtimes, OTEL is the lingua franca that makes the multi-runtime story tractable.
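To make the single-pipeline idea concrete, the sketch below builds an OTLP/HTTP JSON trace payload for one finished span and tags it with the runtime that produced it. It uses only the standard library and follows the OTLP JSON encoding (hex-encoded trace and span IDs, nanosecond timestamps as strings). The `agent.runtime` resource attribute, the scope name, and the `otlp_span_payload` helper are assumptions made for this illustration; none of the five vendors is known to emit exactly this shape.

```python
import json
import secrets
import time


def otlp_span_payload(runtime, service_name, span_name, duration_ms, attributes=None):
    """Build one OTLP/HTTP JSON trace payload containing a single finished span."""
    end_ns = time.time_ns()
    start_ns = end_ns - int(duration_ms * 1_000_000)

    def attr(key, value):
        # OTLP JSON wraps every attribute value in a typed container.
        return {"key": key, "value": {"stringValue": str(value)}}

    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                attr("service.name", service_name),
                # Hypothetical key: lets one backend group spans by runtime.
                attr("agent.runtime", runtime),
            ]},
            "scopeSpans": [{
                "scope": {"name": "agent-sketch"},
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 32 hex characters
                    "spanId": secrets.token_hex(8),    # 16 hex characters
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                    "attributes": [attr(k, v) for k, v in (attributes or {}).items()],
                }],
            }],
        }]
    }


payloads = [
    otlp_span_payload("aws-agentcore", "billing-agent", "tool_call", 120),
    otlp_span_payload("cloudflare-sandbox", "billing-agent", "tool_call", 95),
]
bodies = [json.dumps(p) for p in payloads]
```

Each body could then be sent with `Content-Type: application/json` to a collector's OTLP/HTTP traces endpoint (path `/v1/traces`, default port 4318), so that spans from different runtimes land in the same backend and differ only in their resource attributes.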

5. Billing Granularity Tells You Who the Buyer Is

The five offerings price in five distinguishable units, and each price unit reveals the intended customer:

Anthropic’s session-hour ($0.08, billed to the millisecond with idle time free) targets developers and product teams who think about agent activity in terms of conversations or tasks. The session-hour is a more natural billing unit than vCPU-hour for agentic UX.
AWS’s vCPU-hour + GB-hour ($0.0895 + $0.00945, with separate Memory at $0.25 per 1,000 events and Gateway at $0.005 per 1,000 invocations) targets the platform team that already thinks in EC2/Fargate terms. AgentCore reads as Lambda for agents, and the pricing reads accordingly — composable, predictable, but requiring effort to model.
Google’s vCPU-hour + GB-hour + events ($0.0864 + $0.0090 + $0.25 per 1,000 events) sits very close to AWS, slightly cheaper on compute, with the same composable shape.
Microsoft’s per-user-per-month ($15 standalone, $99 in M365 E7) targets the enterprise license manager, not the platform team. Per-seat pricing aligns with Microsoft’s existing M365 economics but raises a structural question about agent populations: a large enterprise will plausibly run roughly an order of magnitude more agents than humans, and “user” in $15/user/month means the human granted permission to use Agent 365’s governance, not the agents themselves.
Cloudflare’s active-vCPU-second ($0.00002, idle free) is the most agent-shaped pricing unit of the five — it directly matches the agent’s actual work pattern, where sessions wake briefly, do a tool call, and sleep. For agents that idle frequently, Cloudflare’s TCO can come in dramatically cheaper than session-hour or vCPU-hour pricing; for agents running tight loops, the pricing models converge.

The Finout cost analysis emphasizes that for most production workloads “tokens represent the dominant cost driver”, so runtime charges are typically a fraction of the inference bill. That said, runtime charges are the line item most sensitive to architecture choices: a poorly designed agent that holds a session open while doing little work can run up session-hour or vCPU-hour charges, depending on how each meter treats idle time, but not Cloudflare’s active-CPU-second charges.
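A short worked comparison shows how much the billing unit matters for an idle-heavy session. The rates are the ones quoted in this scout; everything else is an assumption for illustration: a session sized at 1 vCPU and 2 GB, one hour of wall-clock time, and a `billed_hours` input that says how long the session-hour and vCPU-hour meters actually run. Whether a given meter keeps running while the agent waits depends on each vendor's billing rules, so that value is left explicit. Token costs, Google's per-event charge, AWS's Memory and Gateway charges, and any Cloudflare charges beyond active vCPU are left out.

```python
# Rates as quoted in this scout (USD); treat them as beta-era list prices.
ANTHROPIC_PER_SESSION_HOUR = 0.08
AWS_PER_VCPU_HOUR, AWS_PER_GB_HOUR = 0.0895, 0.00945
GOOGLE_PER_VCPU_HOUR, GOOGLE_PER_GB_HOUR = 0.0864, 0.0090
CLOUDFLARE_PER_ACTIVE_VCPU_SECOND = 0.00002


def runtime_cost(billed_hours, active_hours, vcpus=1.0, memory_gb=2.0):
    """Runtime-only cost per vendor for one session, excluding tokens.

    billed_hours: hours the session-hour and vCPU-hour meters run
                  (assumption: equal to wall-clock time in the worst case).
    active_hours: hours of actual CPU work, the only thing the
                  active-vCPU-second meter counts.
    """
    return {
        "anthropic": ANTHROPIC_PER_SESSION_HOUR * billed_hours,
        "aws": (AWS_PER_VCPU_HOUR * vcpus + AWS_PER_GB_HOUR * memory_gb) * billed_hours,
        "google": (GOOGLE_PER_VCPU_HOUR * vcpus + GOOGLE_PER_GB_HOUR * memory_gb) * billed_hours,
        "cloudflare": CLOUDFLARE_PER_ACTIVE_VCPU_SECOND * vcpus * active_hours * 3600,
    }


idle_heavy = runtime_cost(billed_hours=1.0, active_hours=0.1)
tight_loop = runtime_cost(billed_hours=1.0, active_hours=1.0)
for vendor in idle_heavy:
    print(f"{vendor:10s} idle-heavy ${idle_heavy[vendor]:.4f}  tight-loop ${tight_loop[vendor]:.4f}")
```

Under these assumptions the idle-heavy hour costs about $0.080 on the session-hour meter, $0.108 on AWS, $0.104 on Google, and $0.007 on Cloudflare. When the CPU is busy for the whole hour, the Cloudflare figure rises to $0.072 and the four numbers land in the same range, which is the convergence described above for tight loops.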

6. MCP Is the Common Tool Plane, but Policy Around It Differs

All five offerings support MCP, but the policy posture around MCP servers varies significantly:

AWS AgentCore Gateway functions as an MCP transformer — APIs and Lambda functions become MCP-compatible tools, and external MCP servers are connected through the gateway. The gateway uses semantic search to surface relevant tools rather than loading all tool definitions upfront, which is a practical fix for the token-bloat problem in large MCP catalogs.
Cloudflare’s enterprise MCP architecture centralizes governance through MCP server portals and Code Mode (which Cloudflare reports cuts MCP-tool-list token usage by up to 99.9% by collapsing tool interfaces into dynamic entry points). Shadow MCP detection by Cloudflare Gateway addresses the unauthorized-server problem directly.
Anthropic Managed Agents supports MCP for custom tools with OAuth tokens stored in a secure vault accessed via proxy. Native fit with MCP since Anthropic authored the protocol; less explicit governance tooling than Cloudflare’s portal architecture.
Google’s MCP support routes through Vertex AI’s Cloud API Registry (governance) and Agent Engine (execution). The seam between the two is the same one flagged in the registry scout — least polished of the five.
Microsoft Agent 365’s MCP story is “Work IQ” servers (Microsoft Learn) plus third-party MCP servers registered to Entra. The governance integration with Entra ID Governance and Defender is the deepest of the five, but the runtime that consumes those MCP servers lives in Copilot Studio or Foundry, not in Agent 365 proper.

The convergence on MCP is genuine — every vendor recognizes it as the tool-plane standard. The divergence is in governance maturity: Cloudflare and Microsoft are the two with the most explicit “MCP needs an enterprise control plane” architecture; AWS, Anthropic, and Google ship the protocol support without the governance opinionation, leaving teams to layer their own policy.
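The token-bloat fix mentioned above, surfacing only the tools relevant to a task instead of loading every definition, can be shown with a toy ranking function. This sketch uses plain word overlap as a stand-in for the semantic search AgentCore Gateway is described as using; it is not AWS's algorithm, and the catalog contents and the `select_tools` name are hypothetical.

```python
import re


def _words(text):
    """Lowercased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_tools(task, catalog, limit=2):
    """Return the names of up to `limit` tools whose descriptions best match the task.

    Word overlap stands in for embedding similarity; tools with no
    overlap at all are never returned.
    """
    task_words = _words(task)
    scored = []
    for name, description in catalog.items():
        overlap = len(task_words & _words(description))
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]


catalog = {
    "create_invoice": "create a new invoice for a customer",
    "refund_payment": "refund a customer payment",
    "search_docs": "search internal documentation pages",
    "restart_service": "restart a deployed service",
}
selected = select_tools("refund the last payment for this customer", catalog)
print(selected)  # ['refund_payment', 'create_invoice']
```

Only the selected definitions would be placed in the model's context; the other two never consume tokens. A production gateway would replace the overlap score with embeddings and apply access policy before ranking.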

Practical Implications

A Decision Framework by Organizational Profile

Single-cloud-committed enterprise (AWS, Azure, GCP): Default to your hyperscaler’s runtime. AWS AgentCore is the most architecturally complete. Gemini Enterprise Agent Platform is the most opinionated about the unified-surface pattern; SiliconANGLE’s coverage frames it as Google consolidating the full lifecycle (building, scaling, governing, optimizing) into a single platform. Microsoft Agent 365, alone among the three, doesn’t run agents, so you’ll need Copilot Studio or Foundry underneath. For all three, the gravitational pull is real, and fighting it for “neutrality” is rarely worth the marginal cost for a single-cloud team.

Model-loyal team (Claude-first, OpenAI-first, Gemini-first): If the model choice is locked in, pick the runtime that matches. Claude → Anthropic Managed Agents (managed runtime stays in step with model updates and Claude-native tools like code execution come bundled). OpenAI/GPT-5.5 → there’s no first-party managed runtime in this scope yet; default to Cloudflare or AWS. Gemini → Gemini Enterprise Agent Platform.

Multi-cloud enterprise running agents in more than one place: Cloudflare Sandboxes plus the enterprise MCP reference architecture is now the credible neutral-middle-tier answer — the same shape as AGNTCY’s role for registries. The pairing is: Cloudflare Sandboxes for execution, Cloudflare AI Gateway for inference fan-out across providers, OTEL pipeline for observability, and your existing IdP for identity. Expect to invest in operations work the hyperscaler offerings absorb for you, and keep the hyperscaler runtimes as per-cloud options for workloads where their identity coupling is a feature rather than a tax.

Identity-as-product enterprise (Microsoft-shop): Agent 365 becomes the floor whether you want it to or not, because Entra is already the floor. The question is what runs underneath Agent 365: Copilot Studio for low-code lifecycle, Foundry for code-authored agents. Negotiate the licensing carefully: $15/user/month makes sense for human-supervised agents, but autonomous-agent licensing remains undefined per SAMexpert, and a large autonomous-agent deployment could face a materially different cost model once that licensing reaches GA.

Pre-production / small scale / fewer than 10 agents: Skip the managed runtime entirely. Local sandboxes (Daytona, E2B), direct API calls, and a thin harness handle this scale at lower TCO than any of the five. The managed-runtime value lives in the operations work — observability, sandbox provisioning, credential brokering — that small-scale deployments can absorb manually.

The Decision Axes That Actually Matter

After normalizing the marketing, four axes survive:

Identity floor gravity — same axis as registries. Match your IdP.
Model coupling tolerance — Anthropic Managed Agents bets you’ll stay on Claude. Cloudflare and the hyperscalers don’t. If your model strategy includes “we’ll swap models as the frontier moves,” the model-coupled offering is a structural mismatch.
Sandbox-threat-model commitment — if your agents have any adversarial-input exposure (untrusted content in long-horizon loops, code execution on customer-supplied snippets, cross-tenant data access), Firecracker microVM (AWS) or gVisor (Google) is the right floor; Cloudflare’s container-plus-isolate model is appropriate for non-adversarial workloads. Anthropic’s isolation specifics should be requested explicitly.
Billing-unit fit — match the price unit to your agent’s actual behavior. Idle-heavy → Cloudflare. Tight inference loops → any. Per-seat governance → Microsoft.
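The four axes can be read as a filter applied in order. The sketch below encodes them that way, as one possible reading of this section and not a definitive selection tool: the `Profile` fields, the IdP labels, and the `shortlist` function are all hypothetical, and Microsoft Agent 365 appears only as a note because it does not execute agents.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    idp: str                  # "aws-iam", "entra", "gcp-iam", or anything else
    will_swap_models: bool    # model strategy includes switching vendors
    adversarial_inputs: bool  # untrusted content, customer code, cross-tenant data
    idle_heavy: bool          # sessions mostly wait between tool calls


def shortlist(profile):
    """Apply the four axes in order and return (candidates, notes)."""
    candidates = {"Anthropic Managed Agents", "AWS AgentCore",
                  "Gemini Enterprise Agent Platform", "Cloudflare Sandboxes"}
    notes = []

    # Axis 1: identity floor gravity.
    by_idp = {"aws-iam": "AWS AgentCore", "gcp-iam": "Gemini Enterprise Agent Platform"}
    if profile.idp in by_idp:
        notes.append(f"IdP match favors {by_idp[profile.idp]}")
    elif profile.idp == "entra":
        notes.append("Agent 365 for governance; execution in Copilot Studio or Foundry")

    # Axis 2: model coupling tolerance.
    if profile.will_swap_models:
        candidates.discard("Anthropic Managed Agents")
        notes.append("model swapping rules out the Claude-only runtime")

    # Axis 3: sandbox threat model.
    if profile.adversarial_inputs:
        candidates.discard("Cloudflare Sandboxes")
        if "Anthropic Managed Agents" in candidates:
            notes.append("request Anthropic's isolation specifics before committing")

    # Axis 4: billing-unit fit.
    if profile.idle_heavy and "Cloudflare Sandboxes" in candidates:
        notes.append("idle-heavy workload favors Cloudflare's active-CPU pricing")

    return sorted(candidates), notes


candidates, notes = shortlist(Profile("aws-iam", will_swap_models=True,
                                      adversarial_inputs=True, idle_heavy=False))
print(candidates)  # ['AWS AgentCore', 'Gemini Enterprise Agent Platform']
```

The ordering is a design choice: identity only adds a preference, while model coupling and threat model remove candidates outright, which mirrors how the axes are weighted in the text.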

What Pairs With What

Following the registry scout’s pairing logic, runtime is the slot below registry in the same middle tier. The pairings that look sane in April 2026:

AWS AgentCore Runtime + AWS Agent Registry + AgentCore Gateway + CloudWatch GenAI Observability — the AWS-first stack. Tightest integration, deepest IAM tax.
Gemini Enterprise Agent Engine + Gemini Enterprise Agent Gallery + Vertex AI Cloud API Registry + Google Cloud Observability — the GCP-first stack. Best regulated-industry data-residency story (CMEK + VPC-SC + Model Armor).
Microsoft Agent 365 + Copilot Studio or Foundry + Microsoft mcp-gateway + Defender/Purview/Azure Monitor — the Entra-committed stack. Strongest governance integration, deepest licensing complexity.
Anthropic Managed Agents + Anthropic-hosted MCP + Cowork OTEL pipeline — the Claude-loyal stack. Simplest setup, model-coupled.
Cloudflare Sandboxes + Cloudflare AI Gateway + Cloudflare Access + AGNTCY (registry) + OTEL pipeline + your-choice IdP — the neutral stack. Most operational work, most flexibility, cheapest at idle.

Open Questions

Does Anthropic’s brain/hands decoupling framing survive Mythos containment requirements? The Mythos sandbox-escape disclosure happened in the same news cycle as Managed Agents going beta. If the containment-architecture paper is the new floor for sandbox requirements, Anthropic will need to publish more isolation detail than it has so far, or Cloudflare and AWS gain ground on the credible-runtime-for-adversarial-workloads axis.
What does Microsoft’s autonomous-agent licensing look like at GA? Per SAMexpert, the May 1 GA covers delegated-permission agents only. If Microsoft prices autonomous agents per-agent-month at the same $15 unit, a customer running 1,000 autonomous agents pays $180,000/year for governance alone — before any execution costs. If it’s a separate, lower unit, the math changes. The answer reshapes the cost model for Microsoft-shop teams.
Will Anthropic’s session-hour pricing prove durable? $0.08/session-hour is cheap enough that it’s hard to imagine it being the deciding factor — but it’s also cheap enough that it could be an introductory price. The Finout analysis flags that all current numbers are beta-era. Watch for a Q3 2026 repricing.
Does Cloudflare’s BYOM story actually compete on inference quality? Cloudflare runs Workers AI for inference, with 70+ models across 12+ providers routable through AI Gateway. For frontier-quality coding agents, “any model” still means Anthropic, OpenAI, or Google in practice — and routing those inferences through Cloudflare adds latency for non-trivial requests. Whether the routing tax is acceptable for production workloads is an open data point.
How do the runtime offerings interact with each vendor’s coding-agent harness story? Anthropic Managed Agents, AWS AgentCore Runtime, and Cloudflare Sandboxes all overlap with what teams currently solve with Claude Code, Codex CLI, or local harnesses. The question for any team running coding agents at scale is whether the managed-runtime substrate replaces or complements the coding-harness layer. Coverage on the substitution math has been thin so far.
Will OpenAI ship a comparable managed-runtime offering? Of the four large model vendors, OpenAI is the conspicuous absentee from the runtime landscape. With the GPT-5.5 unification of Codex into the main model line, the architectural premise (model surface = agent surface) is in place; the operational layer isn’t. Expect this gap to close inside 2026.
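The autonomous-agent licensing question earlier in this list reduces to simple arithmetic, worked here so the sensitivity to the unit price is visible. The $15 figure is the standalone Agent 365 price quoted in this scout; the $3 per-agent price is purely hypothetical, since no per-agent unit has been published.

```python
MONTHS_PER_YEAR = 12
AGENT_365_PER_USER_MONTH = 15  # USD, standalone price quoted in this scout


def annual_cost(units, price_per_unit_month):
    """Yearly licence cost for `units` seats or agents at a monthly unit price."""
    return units * price_per_unit_month * MONTHS_PER_YEAR


# Scenario from the text: 1,000 autonomous agents priced like human seats.
same_unit = annual_cost(1_000, AGENT_365_PER_USER_MONTH)

# Hypothetical lower per-agent price, for contrast only.
lower_unit = annual_cost(1_000, 3)

print(same_unit, lower_unit)  # 180000 36000
```

The first figure reproduces the $180,000/year governance cost cited above; the second shows how far the total moves if agents are priced on a cheaper unit than humans.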

Sources

Scaling Managed Agents — Anthropic Engineering
Anthropic Introduces Managed Agents to Simplify AI Agent Deployment — InfoQ
Anthropic Just Launched Managed Agents. Let’s Talk About How We’re Going to Pay for This — Finout
Anthropic Managed Agents: A Hosted Runtime for Claude + MCP — MindStudio
Monitor Claude Cowork activity with OpenTelemetry — Anthropic Support
Anthropic Monitoring & Observability with OpenTelemetry — SigNoz Docs
Cloudflare Sandboxes Reach General Availability — InfoQ
Agents have their own computers with Sandboxes GA — Cloudflare Blog
Cloudflare Outlines MCP Architecture as Enterprises Confront Security and Governance Risks — InfoQ
Scaling MCP adoption: Our reference architecture for simpler, safer and cheaper enterprise deployments of MCP — Cloudflare Blog
Cloudflare’s AI Platform: an inference layer designed for agents — Cloudflare Blog
Cloudflare Launches Code Mode MCP Server to Optimize Token Usage for AI Agents — InfoQ
Amazon Bedrock AgentCore Pricing — AWS
Amazon Bedrock AgentCore FAQs — AWS
How AgentCore Tools session isolation works — AWS Docs
Observe your agent applications on Amazon Bedrock AgentCore Observability — AWS Docs
Microsoft Agent 365: The Control Plane for Agents — Microsoft
Agent 365 Licensing: What It Covers and Costs — SAMexpert
Billing rates and management - Microsoft Copilot Studio — Microsoft Learn
Work IQ MCP overview (preview) — Microsoft Learn
Introducing Gemini Enterprise Agent Platform — Google Cloud Blog
Gemini Enterprise Agent Platform pricing — Google Cloud
With Gemini Enterprise Agent Platform, Google brings agentic development and control under one roof — SiliconANGLE
When the Agent Is the Adversary: Architectural Requirements for Agentic AI Containment After the April 2026 Frontier Model Escape — arXiv
GPT 5.5 and OpenAI Codex Superapp — Latent Space
Agent Registries: AWS vs. Microsoft vs. Google vs. AGNTCY — Grimoire (prior scout)