Scout: Agent-Economics Post-Meter — What Programmatic Claude Actually Costs After June 15

Summary

On 2026-05-14, Anthropic announced that beginning 2026-06-15, programmatic Claude usage — the Agent SDK, the claude -p headless command, Claude Code GitHub Actions, and third-party tools including OpenClaw — moves off the subscription rate-limit pool onto a dedicated monthly credit pool billed at API list prices. Pro gets $20, Max 5x gets $100, Max 20x gets $200, with Team and Enterprise seat-based variants reported in similar tier ranges; unused credits do not roll over (The Register; InfoWorld). At nominal Sonnet 4.6 input-token rates, $200 corresponds to roughly 67 million input tokens before accounting for output tokens, cache writes, extended-thinking overhead, retries, and tool-call replay — the real planning number is materially lower, and practitioner write-ups converge on a single autonomous Opus-heavy agent loop being able to drain a Max 20x credit in days, not weeks (dikrana.dev). The cap binds first on parallel-subagent reviewer harnesses and headless Opus loops; modest claude -p cron jobs survive on Pro. The API optimisation playbook — prompt caching at a 10% read rate, batch processing at a 50% discount, model routing through gateways like LiteLLM or Bifrost — moves from nice-to-have to mandatory at scale. The same afternoon, Sam Altman offered new enterprise customers two months of free Codex via a one-click migration tool within a 30-day window (Axios; OpenAI) — a switching offer whose appeal depends entirely on how much CLAUDE.md, Skills directory, and MCP-server investment a team is prepared to rewrite into Codex’s parallel-but-not-portable AGENTS.md, skill-directory, and hook conventions (Blake Crosley).

Key Findings

1. The meter, exactly: what’s on the new pool and what isn’t

Anthropic’s verbatim framing of the split is the cleanest summary available: “We’ve heard your questions about SDK and claude -p usage sharing your subscription rate limits with Claude Code and chat. Starting June 15, programmatic usage gets its own dedicated budget instead. Your subscription limits don’t change, they’re now reserved for interactive use.” (The Register’s quotation of Anthropic’s social-media post).

The Register’s coverage confirms only the Pro tier amount directly; the full tier ladder ($20 Pro / $100 Max 5x / $200 Max 20x) appears in InfoWorld’s reporting and is corroborated by The New Stack, the-decoder, and Tygart Media’s roundup, which also reports per-seat amounts for Team Standard ($20/seat) and Team Premium ($100/seat). Several outlets report Enterprise Standard seats receive no credit allocation by default, with the framing — per Tygart Media’s coverage — that teams running shared production automation are pointed toward the Claude Developer Platform with an API key for predictable pay-as-you-go billing rather than toward the metered credit pool (Tygart Media).

What’s on the meter:

The Claude Agent SDK (Python and TypeScript)
The claude -p headless command
Claude Code GitHub Actions
Third-party apps built on the Agent SDK, including OpenClaw and ACP-protocol integrations like Zed (Zed Blog)

What stays on subscription limits: interactive Claude Code in terminal and IDE, Claude Cowork, and the chat web app. Routines — Anthropic’s scheduled / API-triggered / GitHub-event-triggered Claude Code automations — “draw down subscription usage the same way interactive sessions do” and add a daily run cap per account (Claude Code Routines docs). This is a meaningful structural exception: a routine triggered from a GitHub webhook to run a PR review burns subscription tokens, while the same workflow executed via the claude -p headless command in a self-hosted CI runner burns programmatic credits. The trigger mechanism is the line, not the workload shape.

The non-rolling policy is explicit in the reporting: “It’s worth noting that, if the programmatic credit is not exhausted, it doesn’t roll over. It gets lost, or you might say, Anthropic reclaims it.” (The Register). Credits reset every billing cycle (the-decoder). Overage handling reportedly works two ways: enable extra usage and continue at pay-as-you-go API rates, or leave it off and have SDK calls fail until next cycle (dev.to/vainamoinen).

2. Where the cap actually binds: realistic workload economics

The headline tier amounts are misleading without the per-token math beneath them. Anthropic’s public API list rates (Anthropic pricing docs) are the conversion factor:

Claude Opus 4.7: $5 input / $25 output per million tokens
Claude Sonnet 4.6: $3 input / $15 output per million tokens
Claude Haiku 4.5: $1 input / $5 output per million tokens

At Sonnet 4.6 input rates, a Max 20x $200 credit nominally covers around 67M input tokens or roughly 13M Opus 4.7 input tokens per month — but this is the raw-input headline only. Real agentic workloads also pay for output tokens, cache writes, extended-thinking overhead, retries, and tool-call replay, so the practitioner-relevant number is lower in every case, and how much lower is the variable teams need to measure for themselves (dikrana.dev). Practitioner write-ups converge on three workload classes with dramatically different burn profiles:

Indie / single-developer headless workloads. A “PR review per push, four pushes a day” routine running on Sonnet 4.6 — roughly 50K input plus 4K output tokens per review, around $0.21 per review — burns approximately $25 per month across 120 reviews, exhausting the $20 Pro credit by week three (Build This Now). Modest claude -p cron jobs (nightly tidy-up, weekly docs-drift checks) reportedly land in the $0–$10/month range on the new credit pool (dikrana.dev).

CI-integrated Agent SDK pipelines on the median engineering team. GitHub Agentic Workflows running PR review and auto-triage at typical rates (around 6.8 runs per day for an auto-triage workflow, with optimisation passes yielding around 62% token savings) reportedly land near $200/month — enough to consume the full Pro credit on a single workflow before any other automation runs (GitHub Blog on token efficiency in Agentic Workflows).

Parallel-subagent reviewer harnesses on Opus. The harness pattern that ate the most subscription cross-subsidy under the old model — six-agent parallel review, autonomous Opus-heavy loops with extended thinking, scheduled background routines invoked via API trigger — is where the cap binds first. An autonomous Opus 4.7 loop with extended thinking on can reportedly drain a Max 20x $200 credit in 36 to 72 hours of unattended execution; parallel subagent workloads absorb roughly 30 to 40 reviews per month before the pool depletes (dikrana.dev). Theo Browne’s reaction, quoted via The New Stack and tracked through downstream coverage: “I now have to make the Claude Code experience on T3 Code significantly worse” (Build This Now’s quote attribution).

Zed’s own assessment of the impact on its ACP-protocol Claude integration is the bluntest published number from a vendor with skin in the game: “a major cost increase” for heavy agent users, with the pre-change subsidy estimated at roughly 15–30× compared to API pricing (Zed). Community estimates of effective per-workload price increases range widely depending on workload shape — anywhere from low-double-digit to triple-digit multiples — and the specific numbers circulating in the days after announcement are not independently substantiated outside community write-ups, so they should be treated as load-bearing for direction but not for precise planning math.

The available evidence — Zed’s vendor-side estimate of a 15–30× pre-change subsidy, the consistent framing across ecosystem reporting, and Anthropic’s own positioning of the split as separating programmatic from interactive — strongly suggests that flat-rate Claude subscriptions were carrying API-equivalent programmatic workloads at substantially subsidised rates, particularly for heavy agentic patterns. Anthropic has not published its own subsidy ratios or workload economics, and Zed has a commercial interest in the framing; the direction is well-supported, the precise multiple isn’t (Implicator’s coverage; Zed’s own framing).

3. The optimisation playbook moves from optional to operationally critical

For workloads that stay on subscription limits — interactive Claude Code, Claude Cowork, Routines — the optimisation playbook is still a nice-to-have. For anything on the programmatic meter, three primitives are now economically decisive.

Prompt caching at a 10% read rate. Anthropic’s own pricing docs document the math: cache reads cost 0.1× base input price, 5-minute cache writes cost 1.25× base input, 1-hour writes cost 2× (Anthropic pricing). Break-even is one hit on the 5-minute cache, two on the 1-hour. For agentic harnesses that re-feed the same system prompt, tool definitions, and CLAUDE.md / AGENTS.md on every turn, this is the largest single lever. As one practitioner write-up puts it, well-tuned cache-aware prompting can flip “a 60-percent-rate workload into a 5-percent-rate one” (dikrana.dev).

Batch API at 50% off. The Batch API is 50% off base rates and stacks with prompt caching, in principle compounding to ~95% reductions for the workloads where it applies. The catch is that batch is asynchronous (results within 24 hours, often within one), which excludes most interactive agentic work but is well-suited to overnight log-triage routines, mass-PR-analysis sweeps, or scheduled cleanup jobs — precisely the workload class the Routines product is positioned for, though Routines themselves still bill against the subscription pool rather than the programmatic meter.

Model routing through gateways. LiteLLM and Bifrost are the two most-discussed gateway primitives (Maxim AI’s comparison); Bifrost is reportedly several orders of magnitude faster at high RPS, while LiteLLM has the broader ecosystem with around 33,000 GitHub stars. Both let teams route easy completions to Haiku ($1 input / $5 output per million) and reserve Opus 4.7 ($5 / $25) for genuine reasoning. Under the new meter, this is no longer a nice-to-have; it is the difference between a credit pool that lasts the month and one that fails on day 20.

The compositional point: caching is a multiplier on what you already send, batch is a discount on what doesn’t need to be live, routing is a redirect to a cheaper model when the task allows. Stack all three and the effective per-workload rate drops meaningfully — though the variance findings from prior coverage of token economics (single-task spread of up to 30× across runs of the same problem on the same model) imply teams should expect heavy-tailed cost distributions and budget accordingly.

4. The Codex switching window: a 30-day offer against a multi-quarter migration

OpenAI’s response landed the same afternoon. Axios’s reporting summarises it as a Sam Altman X post offering new business customers two months of free Codex usage; OpenAI’s enterprise promo form operationalises it as a 30-day signup window with a one-click migration tool. Less than an hour later, Anthropic posted a counter-move: a 50% bump in Claude Code weekly subscription limits through July 13 (Yahoo Finance / Axios syndication).

The Codex offer is real, but the switching-cost math is the practitioner-relevant question, and it isn’t simple.

Codex’s pricing structure (OpenAI Developers — Codex pricing) is tier-comparable to Anthropic: Plus at $20/month, Pro at $100/month (with a 2× usage bump through May 31, 2026), 5-hour rolling windows for message limits, and pay-as-you-go credit consumption for Business and Enterprise plans. The headline parity with Claude’s Pro / Max ladder is intentional. The friction is in the harness — the configuration files, hooks, skills directory layout, and MCP integration that a team running Claude Code at scale has accumulated.

Practitioner migration guides converge on a similar inventory of what ports and what doesn’t (Blake Crosley; Code on Grass):

Configuration files: CLAUDE.md becomes AGENTS.md. Both formats exist; Codex reads AGENTS.md hierarchically from ~/.codex/AGENTS.md, then repository-level, with nearer files overriding. Claude Code is single-file by default. The line items port; the prose philosophy and human documentation don’t carry over cleanly without rewriting as executable rules.
Skills: flat files become directories. Claude Code’s .claude/commands/ slash commands are flat markdown; Codex skills are directories with SKILL.md frontmatter plus scripts, references, and assets alongside. Migration is per-skill and involves filesystem restructuring, not just renaming.
Hooks: similar event names, different coverage. PreToolUse, PostToolUse, SessionStart, Stop exist in both, but Codex’s interception reportedly “does not cover every shell path and does not intercept web search or other non-shell, non-MCP tool calls” (Blake Crosley). Sequential hook dispatching from Claude Code becomes concurrent matching in Codex, breaking any workflow that depended on ordering guarantees.
MCP: ports directly, in principle. Both platforms support MCP. Codex uses codex mcp add and [mcp_servers.<name>] tables in config.toml; existing MCP configurations migrate without protocol-level rewrite.
Conversation history: doesn’t transfer. Long-running multi-day threads in Claude Code are session-bound and don’t migrate. Codex’s design centres state in the repo (docs, plans, AGENTS.md) rather than in session transcripts — a different memory model, not a portable one (Code on Grass).

The switching cost is not the model-quality question (both are competitive on benchmarks at this point); it’s the harness-rewriting question. For a team that’s invested in a single CLAUDE.md and a handful of slash commands, two months free of Codex is a reasonable test window. For a team with global+repo AGENTS.md hierarchies, dozens of skill directories, custom dispatcher-ordered hook lattices, and tuned MCP server bindings, the two-month free window is the right trial but it is not enough runway to complete the migration; the rewrite is a quarter of work, not a fortnight.

5. Build-vs-buy implications: self-hosted inference and routed API patterns

The post-meter pricing changes the threshold at which self-hosted inference and gateway-routed API patterns start to win on TCO. Two thresholds worth carrying into a planning meeting.

Self-hosted breakeven moves down. Pre-meter, the rough planning number for self-hosted breakeven against Anthropic API pricing was on the order of 5–10 million tokens per month for premium-API workloads and several hundred million tokens per month for budget-API alternatives (DevTk.AI’s TCO analysis). With self-hosting reportedly costing 3–5× the raw GPU price after operational overhead, and idle-time penalties of 60–70% for non-24/7 workloads, DevTk.AI’s modelling puts the breakeven for code-agent workloads on open-weight models like Llama 4 Maverick or DeepSeek V4 near 600M tokens per month for code and around 1.2B tokens per month for chat (DevTk.AI) — but these are illustrative planning estimates, not generalised thresholds, and they shift materially with GPU generation, utilisation rate, quantisation, batch size, context length, latency targets, and the engineering overhead a team is realistically prepared to absorb. For a team running scheduled Routines and CI-integrated agent pipelines under the new meter, the math compresses but does not collapse: the credit pool buys roughly 67M Sonnet input tokens on Max 20x; a self-hosted Llama 4 cluster only beats that on aggregate cost above the hundreds-of-millions-of-tokens-per-month threshold, with the exact crossover sensitive to all of the above. The break-even isn’t where the headline-credit-pool number suggests.

Routed-API patterns get sharper. What the meter does change is the relative attractiveness of routing programmatic workloads through a gateway (LiteLLM, Bifrost) onto direct API keys at pay-as-you-go list rates, with per-developer spend visibility built in. The credit pool is effectively a forced trial of API-rate economics on a fixed monthly cap; teams that want the same economics with predictable visibility, spend-by-key tracking, and the ability to apply prompt-caching and batch discounts deterministically can move to direct API billing through a gateway and skip the subscription tier entirely. The framing surfaced in secondary reporting — that teams running shared production automation are directed toward the Claude Developer Platform with an API key for predictable pay-as-you-go billing — is, taken at face value, a pointer to exactly this pattern (Tygart Media).

The build-vs-buy framing for 2026 H2 is therefore less about whether to self-host and more about whether to subscription-meter or direct-API-meter. For teams whose programmatic workload regularly exceeds the credit-pool cap, the answer is straightforwardly: direct API with caching and routing in place. For teams below the cap, the credit pool is fine — but it’s a forced introduction to the API-rate cost surface, and the discipline it teaches (per-workload visibility, prompt-cache audit, model-routing rules) is the same discipline a direct-API deployment would have needed anyway.

Practical Implications

Pre-June-15 checklist for any team running programmatic Claude

The thirty-day window between announcement and effective date is a planning event. Concrete actions worth completing before the cap binds:

Audit what’s programmatic vs interactive. Anthropic’s line is sharp: Agent SDK, claude -p, Claude Code GitHub Actions, and third-party SDKs go to the meter; interactive Claude Code in terminal/IDE, Cowork, and Routines stay on subscription. If a workflow can be re-architected from claude -p cron to a scheduled Routine, it stays on the cheaper subscription pool. (One-off Routine runs do not count against the daily Routine cap, per Anthropic’s docs.)
Run a dry-run on actual token consumption. Claude Code GitHub Actions writes a token-usage.jsonl artifact per workflow with input, output, cache-read, and cache-write counts (GitHub Blog); the Agent SDK exposes equivalent per-call usage data. Aggregate this across a representative week before the new meter starts. Multiply by API list rates. That number is the May baseline; June will be at least that, and probably higher once optimisation gains have to be re-extracted.
Turn on prompt caching for every stable system prompt. 5-minute caches on interactive sessions, 1-hour caches on batch and scheduled workloads. The economics are documented and unambiguous; the gap between “knowing prompt caching exists” and “having it consistently configured across every Agent SDK call” is where teams will discover their actual exposure on June 15.
Put model routing in place via a gateway. Either LiteLLM (broadest ecosystem, Python-native, prototype-friendly) or Bifrost (faster at production load, Go-native). Route easy completions to Haiku 4.5 ($1 input / $5 output per million), reserve Opus 4.7 ($5 / $25) for genuine reasoning. Cursor’s Auto-mode is the same idea implemented inside a single editor — gateway-routed API is the same idea implemented at the team level.
Decide on the overage stance. With extra usage on, programmatic work continues at pay-as-you-go API rates past the credit pool. With extra usage off, calls fail until next cycle. The choice is between a soft cap that creates billing surprises and a hard cap that creates outage surprises. Either is defensible; pretending the choice doesn’t exist is not.

The Codex two-month window: when it’s worth using and when it isn’t

The decision tree on the Codex migration offer is narrower than the marketing suggests.

Use the trial when: the team has minimal CLAUDE.md / Skills / hook investment, primarily runs claude -p or Agent SDK calls in CI, and the workflow is portable to AGENTS.md and Codex skills with rewrite-not-migration effort. Two months is a reasonable evaluation horizon for this profile.

Treat the trial as a hedge when: the team has substantial harness investment (multi-file AGENTS.md hierarchy equivalents, ordered hook dispatchers, dozens of slash commands or skill directories, custom MCP server bindings) but has not committed to a single vendor for 2026 H2. Use the free window to validate that Codex’s behaviour on representative tasks holds up; do not use it as the trigger to migrate full production workflows.

Skip the trial when: the team is committed enough to Claude that the migration cost outweighs two months of free Codex. The CLAUDE.md/Skills/MCP investment is the moat — Anthropic priced the meter at a level it believes that moat survives, and for many teams that calculation will be right.

Early ecosystem reaction suggests the third bucket is the largest in practice — mature Claude Code teams with deep harness investment tend toward absorbing the new costs over immediate migration — but no published telemetry confirms the distribution, and the loud community reaction is unrepresentative of teams that have quietly built deep harness customisation. For those teams, the meter is a budget event, not a migration event.

When the cap binds: realistic triggers

For practitioner planning, the cap-binding patterns to watch:

Multi-agent or parallel-subagent workloads on Opus. These hit the credit ceiling first. Budget assuming a Max 20x credit lasts roughly a week of heavy Opus parallel work.
Headless claude -p loops with extended thinking enabled. Practitioner estimates suggest a single autonomous loop could plausibly exhaust Max 20x within days of sustained unattended execution — the dikrana.dev 36–72 hour figure is a directional estimate from a community write-up, not a benchmarked range, and actual depletion depends heavily on workload shape and thinking-token cap. Cap MAX_THINKING_TOKENS aggressively or move thinking-heavy tasks to interactive sessions that stay on subscription.
GitHub Actions auto-triage or PR-review on busy repos. Several runs per day per workflow compounds. The GitHub-blog token-efficiency optimisations (around 62% savings on representative workloads) are the difference between “fits in Pro credit” and “requires Max 5x or overage” for a single auto-triage workflow at moderate volume.
Third-party tool fleets. OpenClaw, Zed agent integrations via ACP, Cline, and similar third-party harnesses now all draw from the same per-account credit pool. Teams running multiple third-party tools against the same Claude account share one budget across them — a structural constraint worth modelling explicitly before June 15.

Open Questions

How does the daily Routines run cap interact with the credit pool over time? Routines remain on subscription limits as of the research preview (Claude Code Routines docs), but reporting on whether Anthropic will eventually move scheduled and event-triggered Routine runs onto the programmatic meter remains thin. The current trigger-mechanism boundary (Routine = subscription, claude -p = meter) is unstable as a long-term position, especially as the Routines product matures from research preview.
What does an enterprise contract priced against the new meter actually look like? Public reporting covers the prosumer tiers; enterprise pricing under the new meter steers teams toward the Claude Developer Platform with direct API-key billing rather than the metered credit pool. How that interacts with negotiated enterprise contracts and existing seat-based agreements is not addressed in publicly available reporting, and teams with enterprise terms should expect this to be the first material conversation with their Anthropic account team.
Does OpenAI’s two-month offer succeed in moving meaningful enterprise workloads? The 30-day signup window means the answer is observable by mid-June. If Codex captures meaningful migrations, expect a structural shift in vendor power; if Anthropic’s CLAUDE.md / Skills moat holds and Codex’s free trial fails to convert, expect Anthropic to push the meter harder on the next iteration. The Anthropic 50%-Claude-Code-weekly-bump-through-July-13 counter-move is a tell that the company expected the offer to bite at least partially.
Where does the prompt-caching effective-discount math actually land in practice? The documented mechanics (cache reads at 0.1× input rate, 5-min writes at 1.25×) suggest near-90% reductions on cache-heavy workloads, but realised savings depend on workload structure — long stable system prompts cache well, dynamic per-turn context does not. Practitioner write-ups citing 60% to 5% conversions are plausible but not portable; teams should measure their own cache hit rates in the dry-run, not budget against the headline percentages.
Will third-party gateway tooling (LiteLLM, Bifrost) add Anthropic-credit-pool-aware billing primitives? Currently both gateways route to direct API keys. As more teams shift programmatic workloads off the subscription credit pool to direct API for predictable economics, expect gateway vendors to add credit-pool-aware tracking. None of the gateway reviews available at this writing surface this as an existing feature.
Does the meter accelerate the move to self-hosted open-weight inference for high-volume agentic teams? The breakeven math hasn’t moved enough on its own — DevTk.AI’s planning-estimate threshold still sits around 600M+ tokens/month for code workloads, with the actual crossover sensitive to GPU generation, utilisation, and engineering overhead (DevTk.AI’s analysis). What may matter more is that the operational discipline a team has to build to live within the credit pool (per-workload spend visibility, cache-and-route hygiene, batch routing for non-interactive work) is the same discipline required to operate a self-hosted cluster. Reporting on whether 2026 H2 sees a step-change in self-hosted adoption among heavy agentic teams will be the data point worth tracking through autumn.