Artificer Digital
The Artificer's Grimoire

Scout: Post-IDE Agent Orchestration — What Primitives Are Converging

Summary

In the first two weeks of April 2026, four shipments crystallized what “post-IDE” means as a product category: Cursor 3’s Agents Window with worktree-backed parallelism and local-to-cloud handoff [1][2][3]; Claude Code’s managed multi-agent PR review dispatching 5–9 specialized subagents per pull request [4][5][6]; Gemini CLI’s markdown-defined subagents with isolated tool and MCP scope [7][8][9]; and Stage, a human-first review surface that re-segments AI-generated diffs into narrative “chapters” [10][11]. Read together, four primitives have clearly converged across 2+ vendors and are safe to build on: markdown-defined subagent definitions with frontmatter metadata, git worktrees as the unit of parallel-agent isolation, per-subagent tool/MCP scoping, and review-as-a-first-class surface separate from editing. Three primitives remain vendor-specific and carry lock-in risk: local-to-cloud session handoff semantics, cost accounting and budget controls, and plugin/marketplace delivery. The practitioner map: write your domain logic against the portable primitives (subagent markdown, AGENTS.md, git-worktree isolation, MCP tool surface) and treat the orchestration UI, cloud sandboxes, and marketplaces as swappable vendor shells.

Key Findings

1. The Agents Window is now the default UX primitive, not the editor

Cursor 3 shipped April 2 with an agent-first interface the co-founders frame as “a unified workspace for building software with agents” [1]. The structural change: agents — not files — are the top-level entity. All local and cloud agents appear in a single sidebar; runs can be spawned from mobile, web, Slack, GitHub, or Linear and all converge in the same pane [1][2]. The IDE view is still one click away but has been demoted from “home” to “one of several surfaces.”

This matches what The New Stack identified in its “IDEcline” piece as the broader pattern: across Claude Code, OpenAI Codex, Imbue’s Sculptor, and now Cursor 3, the IDE has been “demoted from the orchestration layer” and “repositioned as a verification and review surface rather than where primary work occurs” [12]. The control plane sits above the editor, not inside it. The primitive that replaced the editor as the home screen is the agent list — parallel, named, cross-repo, each with its own status and artifacts.

Addy Osmani’s taxonomy [13] formalizes the shift: the conductor model (one agent, synchronous, context-limited) is being replaced by the orchestrator model (multiple specialized agents, asynchronous, coordinated). Cursor 3’s sidebar is the orchestrator made visible.

Practitioner implication: if your internal agent tooling still assumes “one editor with a chat panel,” you’re building against a model that three of the four major vendors just stopped shipping. Build instead against an agent-list primitive with per-agent status, artifacts, and spawn-point metadata.

2. Git worktrees won the parallel-isolation layer

Cursor 3’s worktree integration makes /worktree and /best-of-n first-class commands, spawning isolated git worktrees per agent with /apply-worktree as the merge-back path [3][14]. Claude Code’s subagent docs and Osmani’s taxonomy both identify git-worktree isolation as the stable pattern for parallel agent execution [13][14]. The Cursor 3 community consensus — including critics — agrees the worktree primitive isn’t novel: Liran Baba’s analysis notes Cursor 2 already exposed worktree.json configuration and Claude Code supports the same pattern at flat monthly cost, making worktree-based parallelism effectively vendor-agnostic infrastructure [14].

Why this converged: worktrees give you filesystem isolation without process isolation, branch-merge semantics for free, and compatibility with any git-aware CI system. No vendor owns the primitive. The HN critique on Cursor 3 identifies a remaining cross-vendor gap — “Agent A doesn’t know what Agent B is doing” when both are working from the same base commit [14] — but this is a category-wide problem, not a differentiator.

Practitioner implication: build your parallel-agent orchestration on git worktree add, not on any vendor’s session-container abstraction. Worktree + branch + PR is the portable bundle. Any vendor that tries to replace it with a proprietary session container is a lock-in signal.
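
The portable bundle is plain git. A minimal sketch of the isolation step, driven from Python (the repository setup and agent names are stand-ins; the underlying git worktree add is the same command the vendor /worktree features wrap):

```python
import os
import subprocess
import tempfile

def git(args, cwd):
    """Run one git command in cwd, raising on failure."""
    subprocess.run(["git", *args], cwd=cwd, check=True, capture_output=True)

# Stand-in repository; in practice this is your existing checkout.
repo = tempfile.mkdtemp(prefix="demo-repo-")
git(["init", "-b", "main"], repo)
git(["config", "user.email", "agent@example.com"], repo)
git(["config", "user.name", "agent"], repo)
with open(os.path.join(repo, "README.md"), "w") as fh:
    fh.write("base\n")
git(["add", "."], repo)
git(["commit", "-m", "init"], repo)

# One isolated worktree per parallel agent, each on its own branch cut
# from the same base commit: filesystem isolation, branch-merge semantics,
# and compatibility with any git-aware CI, for free.
worktrees = {}
for agent in ("security", "perf"):
    path = os.path.join(repo, "..", f"wt-{agent}-{os.path.basename(repo)}")
    git(["worktree", "add", "-b", f"agent/{agent}", path, "main"], repo)
    worktrees[agent] = path

# Merge-back path: each agent commits on agent/<name>, opens a PR, and
# `git worktree remove` cleans up after merge.
```

Nothing here is vendor-specific, which is exactly the point: any orchestrator built on this survives a harness switch.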

3. Markdown-defined subagents with frontmatter are the portable spawn primitive

Gemini CLI’s April subagent launch defines subagents as Markdown files (.md) with YAML frontmatter in ~/.gemini/agents (personal) or .gemini/agents (project/team) [7][8][9]. Required frontmatter: name (slug) and description (used by the primary agent to decide when to delegate); optional fields cover tool allowlists with wildcard expansion (*, mcp_*, mcp_my-server_*), model selection, and inline MCP server definitions [8]. The body of the file becomes the subagent’s system prompt. Delegation is automatic (description-matched by the orchestrator) or explicit via @agent-name syntax in the user prompt [7].
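
Concretely, a subagent definition in this shape might look like the following sketch. Only name and description are documented as required; the agent itself, the optional field names (tools, model), and the prompt body are illustrative:

```markdown
---
name: changelog-writer
description: Drafts CHANGELOG entries from recent commits. Delegate here
  when the user asks for release notes or changelog updates.
tools:
  - read_file
  - mcp_git_*        # wildcard: every tool exposed by an MCP server named "git"
model: gemini-2.5-pro
---
You write terse, conventional-changelog entries. Read the diff, group
changes by type (feat/fix/chore), and never invent changes that are not
in the commits you were shown.
```

The body below the second --- becomes the subagent's system prompt; the frontmatter is the portable permission and routing surface.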

Claude Code’s subagent format is structurally identical: markdown files in .claude/agents/ with frontmatter for name, description, tool allowlist, and model selection [15]. The PR-review skill (255,208 installs on the Anthropic-verified plugin) defines nine specialized subagents — security, quality, test coverage, performance, dependency safety, etc. — each a markdown file with its own tool scope and confidence threshold [4][5][6][15].

The format convergence is not accidental. Both formats share the same seven-section conventional structure that AGENTS.md is standardizing on [16]. The Agentic AI Foundation (Linux Foundation, December 2025) now houses AGENTS.md alongside MCP with OpenAI, Anthropic, Google, AWS, Bloomberg, and Cloudflare as members [16]. Over 60,000 GitHub repositories already include an AGENTS.md. Gemini CLI reads GEMINI.md, Claude Code reads CLAUDE.md, Codex reads AGENTS.md — the filenames differ but the content format is converging, and cross-compatibility (Claude Code reading AGENTS.md natively) is the explicit near-term direction [16].

Practitioner implication: write subagent definitions as markdown + YAML frontmatter, not as vendor SDK calls. The same file should load into Claude Code, Gemini CLI, and Codex with at most a filename rename. Per-subagent tool allowlists and inline MCP server declarations are the portable permission surface. If your internal platform requires proprietary agent-definition formats, you’re building a migration tax into your own system.

4. Review is now a separate surface, not a file diff

Three of the four launches made review a first-class, distinct product — not a pane inside the editor:

  • Claude Code Review: dispatches 5–9 specialized agents per PR (security, quality/style, test coverage, performance, dependency safety, CLAUDE.md compliance, git history context, previous-comment review, comment verification), scores each finding on a 0–100 confidence scale with a default threshold of 80, ranks by severity, and posts inline comments with full SHA and line ranges [4][5][6][15]. Average review time is ~20 minutes. Anthropic’s internal adoption raised the substantive-comment rate on PRs from 16% to 54%; 84% of 1000+ line PRs generated findings, averaging 7.5 issues; engineers marked <1% of findings as incorrect [4][5].
  • Stage: explicitly not a bot. Stage re-segments PR diffs into sequential “chapters” (thematic clusters of related changes) with per-chapter focus items, positioning itself against CodeRabbit and Greptile by keeping humans as the decision layer [10][11]. The tagline from the Show HN: “Teams are moving faster than ever with AI, but more and more engineers are merging changes that they don’t really understand. The bottleneck isn’t writing code anymore, it’s reviewing it” [11]. Stage is the complement to Claude Code Review — where Anthropic automates the findings, Stage restructures the human reading path.
  • Cursor 3: cloud agents generate “demos and screenshots of their work for you to verify” before integration; a new diffs view lets you stage, commit, and manage PRs directly in the workspace [1][3].

The convergent pattern: review is a dedicated post-generation step with its own UI, not a diff panel next to code. The editor is where verification happens, not where review lives. This matches the IDEcline argument that IDEs are becoming verification surfaces, not orchestration ones [12].

Practitioner implication: assume review is a separable product in your internal stack. Build your PR pipeline so the review layer (automated findings + human narrative) can be swapped between Claude Code Review, Stage, CodeRabbit, Greptile, or an internal system without rewriting the generation pipeline. The Claude Code Review primitive — parallel specialized subagents with confidence thresholds and severity ranking — is the portable pattern; implement it yourself on top of the subagent markdown format if you want vendor-neutrality.
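
If you do implement the pattern yourself, its core is small. A sketch of the triage step, assuming the 0–100 confidence score and default threshold of 80 described above (the Finding fields and the severity labels are invented for illustration):

```python
from dataclasses import dataclass

# Severity labels and their ordering are assumptions, not vendor constants.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}

@dataclass
class Finding:
    agent: str        # which specialized subagent produced it, e.g. "security"
    severity: str
    confidence: int   # 0-100, as in the scoring described above
    message: str

def triage(findings: list[Finding], threshold: int = 80) -> list[Finding]:
    """Drop findings below the confidence threshold, rank the rest by severity."""
    kept = [f for f in findings if f.confidence >= threshold]
    return sorted(kept, key=lambda f: SEVERITY_RANK[f.severity])
```

Each specialized subagent emits Finding records; the threshold is the knob that trades noise for recall, which is why it belongs in your pipeline rather than inside any one vendor's reviewer.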

5. Local-to-cloud handoff is the #1 vendor-specific lock-in surface

Cursor 3 markets bidirectional local↔cloud handoff as a headline feature: a session moves between environments preserving state, cloud runs on Cursor-managed Ubuntu VMs with Composer 2, local uses your machine [1][3]. VS Code’s copilot/agents docs describe similar handoff semantics with session-type dropdowns and context-carrying between local and cloud sessions [17]. But the semantics are each vendor’s own: Cursor’s cloud session state, Microsoft’s cloud agent sessions, and Gemini CLI’s remote-agents mode [7] are not interoperable. No standard exists for “here is a paused agent session with filesystem, chat history, and tool-call log — resume it anywhere.”

This is where the lock-in is accumulating. Once an engineer’s active work sits in a vendor-specific cloud sandbox with pending artifacts, switching vendors means either finishing or discarding that work. Combined with the cost dynamics below, this is the highest-friction exit.

Practitioner implication: treat vendor cloud sandboxes as ephemeral compute, not as session homes. Require that every session’s durable state live in git (branch + worktree + AGENTS.md notes) so that any cloud sandbox is reconstructible from source control. Do not build internal tooling that relies on a specific vendor’s session-resume semantics.

6. Cost visibility is the second lock-in surface — and it’s regressing

The HN reaction to Cursor 3’s cloud agents was dominated by cost surprise: users reported $2,000+ in two days, “$2k/week with premium models” before switching to Claude Code Max at “1/10th the price,” and monthly spend ranges of $200–$1,800+ without advance visibility [2][14]. The pricing page does not disclose cloud-agent resource costs; per-token billing means users don’t know the bill until it arrives [14]. Claude Code Review is estimated at “$15–25 per pull request” with no public fixed pricing [4].

Gemini CLI offers nominally cheaper per-token pricing on Gemini 2.5 Pro with a 1M-token context, but cost visibility is equally opaque at the subagent-fleet level. There is no cross-vendor standard for real-time budget enforcement, per-subagent cost attribution, or run-ahead cost estimation. Claude Code’s permission model — asking approval per tool call, with an “auto mode” classifier for risky actions [18] — is about safety, not cost; no equivalent exists for spend control.

Practitioner implication: build your own cost accounting layer. Instrument every subagent spawn with a budget cap and a hard-stop; log token spend per worktree; expose per-engineer and per-repo cost dashboards. Do not rely on any vendor’s built-in cost visibility — it is consistently the weakest surface across all four launches.
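
A minimal sketch of the hard-stop half of that layer (the class, its field names, and the per-1k-token rate are invented for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class BudgetGuard:
    """Hard-stop spend cap for one subagent run (all names illustrative)."""
    cap_usd: float
    spent_usd: float = 0.0
    ledger: list = field(default_factory=list)  # (worktree, tokens, usd) rows

    def charge(self, worktree: str, tokens: int, usd_per_1k: float) -> None:
        """Record spend for a tool call; refuse it if the cap would be blown."""
        cost = tokens / 1000 * usd_per_1k
        if self.spent_usd + cost > self.cap_usd:
            raise RuntimeError(
                f"budget cap ${self.cap_usd:.2f} would be exceeded in {worktree}"
            )
        self.spent_usd += cost
        self.ledger.append((worktree, tokens, cost))
```

Per-engineer and per-repo dashboards then aggregate the ledgers. The design point is that the stop fires before the spend, not on the invoice.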

7. Plugin/marketplace delivery is the third lock-in surface — but tool scope is portable

Cursor 3 shipped a plugin marketplace bundling Skills, Subagents, MCP servers, Hooks, and Rules with one-click install plus private team marketplaces [1][3]. Claude Code has the plugin directory with Anthropic-Verified marks (Code Review plugin: 255,208 installs) [15]. Gemini CLI ships extensions directories [7]. Each is proprietary.

But the underlying tool surface is portable: MCP is the cross-vendor standard, subagent markdown is converging, and AGENTS.md is under the Linux Foundation [16]. The distribution packaging is vendor-specific; the distributed artifact is not.

Practitioner implication: publish your internal skills and subagents as plain markdown + MCP server definitions in a git repo. Distribute via git clone into .claude/agents/, .gemini/agents/, .cursor/plugins/, etc. Skip the marketplaces for internal tooling — the marketplaces are optimized for public discovery, not governed enterprise distribution. Private marketplaces are a feature for enterprises buying a single vendor; if you’re multi-vendor (and the model/harness-matters thesis [18] says you should be), a git-native distribution skips the lock-in.

8. The harness still matters more than the model

Michael Tuszynski’s “The model doesn’t matter, the harness does” [18] documents a 9.5-point spread on SWE-bench Pro (45.9% to 55.4%) from running identical Claude Opus 4.5 through different harnesses, and a Sonnet 4.5 with better scaffolding beating Opus 4.5 on Anthropic’s own framework. Model upgrades yield ~1 benchmark point; scaffolding yields 20+. The harness primitives that matter:

  • Tool orchestration (how agents discover and compose tools)
  • Context management (compaction + persistent structured state)
  • Error recovery (distinguishing prevention from recovery)
  • Deterministic feedback (linters + structural tests per change)
  • Planning-execution separation (distinct agents for decision vs. implementation)

All five are things you build; none are things a vendor owns. This is the frame for the whole scout: the UX primitives above (agents window, worktrees, markdown subagents, review surface) are the visible layer of the harness. The invisible harness layer — context compaction policy, inner-loop latency budget, feedback-loop composition — remains where most of the leverage lives [18], and it aligns with the April 12 scout on harness engineering at production scale, which found that OpenAI Frontier’s Symphony and Cognition’s Devin both invest primarily in harness scaffolding, not in model selection.

Practical Implications

Build on (stable across 2+ vendors):

  1. Git worktree + branch + PR as the parallel-agent isolation bundle. This is your filesystem primitive.
  2. Markdown + YAML-frontmatter subagent definitions stored in .claude/agents/, .gemini/agents/, or equivalent. Target format compatibility with AGENTS.md.
  3. MCP as the tool surface — write new tools as MCP servers, not vendor-specific tool APIs.
  4. AGENTS.md as the project context file — Linux Foundation AAIF standard, 60,000+ repos, multi-vendor adoption track [16].
  5. Parallel specialized subagents for review — the Claude Code Review pattern (security / quality / performance / coverage / dependency safety) is replicable on any subagent-capable harness.
  6. Per-subagent tool allowlists with wildcard scoping — portable across Claude Code and Gemini CLI, and the right permission primitive regardless.
  7. Review as a separable surface from generation. Stage-style human narrative review and Claude Code Review-style automated finding generation are complements, not substitutes.

Do not bet on (vendor-specific, high lock-in):

  1. Cloud-session resume semantics. Keep all durable state in git. Cloud sandboxes are compute, not homes.
  2. Vendor cost dashboards. Build your own token/budget accounting. Token spend is your business metric, not theirs.
  3. Proprietary plugin marketplaces for internal tooling. Distribute your own skills as git repositories.
  4. Vendor-specific agent-session formats. Any harness state that can’t be reconstructed from git + markdown + MCP config is a migration tax.
  5. Editor-embedded agent UIs as the primary surface. The agent list, not the editor, is the home screen. If your internal platform forces users through an IDE tab for everything, you’ve built against the wrong model.
  6. Cursor’s “Design Mode,” Claude’s private sandbox networking, Gemini’s Google-Cloud-only remote agents — any feature whose deployment model is one vendor’s cloud.

Patterns to implement internally now:

  • A thin agents/ directory with markdown subagent definitions that work in all three major harnesses, with a small CI check that validates frontmatter against all three schemas.
  • A worktree-run.sh wrapper that creates a worktree, spawns an agent with a budget cap, records token spend to a log, and writes a PR-ready branch on completion — the primitive that replaces the editor’s “run button.”
  • A review pipeline that separates the Claude Code Review-style automated findings step from a Stage-style human narrative chapter step, connected by a simple event contract (PR opened → findings produced → chapters generated → human reviews).
  • An internal AGENTS.md template per repo encoding repo conventions, test commands, and domain context — vendor-neutral and future-proof against the CLAUDE.md/GEMINI.md/AGENTS.md unification.
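
The CI check in the first bullet can start very small. A sketch that validates only the two fields required by both the Claude Code and Gemini CLI formats, with no YAML dependency (a real check would validate each vendor's full schema):

```python
import re

# "name" and "description" are the fields both vendor formats require;
# extend this tuple with per-vendor fields as needed.
REQUIRED = ("name", "description")

def check_frontmatter(text: str) -> list[str]:
    """Return problems found in one subagent markdown file (minimal sketch)."""
    match = re.match(r"\A---\n(.*?)\n---\n", text, re.S)
    if not match:
        return ["missing YAML frontmatter block"]
    # Collect top-level keys only; skip indented continuations and comments.
    keys = {
        line.split(":", 1)[0].strip()
        for line in match.group(1).splitlines()
        if ":" in line and not line.startswith((" ", "\t", "#"))
    }
    return [f"missing required field: {k}" for k in REQUIRED if k not in keys]
```

Run it over agents/*.md in CI and fail the build on any non-empty result; the same files then load unchanged into each harness's agents directory.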

Open Questions

  • Session-state portability: will a standard emerge for “paused agent session” (filesystem + chat history + tool-call log) that lets a user move work between Cursor cloud, Claude Code, and Gemini CLI mid-task? Or will vendors deliberately resist standardization here because this is the primary lock-in surface?
  • Cross-agent context sharing inside a single multi-agent run: the “Agent A doesn’t know what Agent B is doing” problem [14] affects every worktree-based parallelism pattern. Will it be solved by a shared-task-list primitive (Osmani’s proposal [13]), by event-sourced state (LangGraph-style), or remain unsolved?
  • Cost visibility standards: is there any plausible path to a cross-vendor token-budget protocol, or will every shop end up building its own accounting layer?
  • Review-surface commoditization: if Claude Code Review’s “parallel specialized subagents with confidence thresholds” pattern becomes table stakes, does Stage’s human-narrative positioning hold, or does it get absorbed into the review surface of the big harnesses?
  • AGENTS.md vs vendor-specific project files: the Foundation-housed standard exists; Claude Code still reads CLAUDE.md, Gemini CLI still reads GEMINI.md. Will 2026 see full convergence on AGENTS.md, or will the native filenames persist as “brand signals” with AGENTS.md as fallback?
  • Marketplace governance: will enterprise buyers force vendors to support private marketplaces with audit trails, or will they route around by using git-native distribution? The Cursor 3 private marketplace feature is an early bet on the former; the weight of internal-platform practice seems to be moving toward the latter.

Sources

  1. Cursor 3 Introduces Agent-First Interface, Moving beyond the IDE Model — InfoQ
  2. Meet the new Cursor — Cursor blog
  3. What Is Cursor 3? Agents, Worktrees, and What’s New — DataCamp
  4. Claude Code’s Agent-Based PR Review — InfoQ
  5. Anthropic launches a multi-agent code review tool for Claude Code — The New Stack
  6. Code Review — Claude Code Docs
  7. Subagents have arrived in Gemini CLI — Google Developers Blog
  8. Subagents — Gemini CLI Docs
  9. Mastering Gemini CLI Subagents: Context Management & Tool Isolation — Google Cloud Community
  10. Stage — stagereview.app
  11. Show HN: Stage – Putting humans back in control of code review
  12. IDEcline: How the world’s most powerful coding tools became second-class citizens overnight — The New Stack
  13. The Code Agent Orchestra — Addy Osmani
  14. Cursor 3 shipped parallel agents, but is any of it new? — Liran Baba
  15. Code Review — Claude Plugin
  16. Agent Skills as an Open Standard: How Claude, OpenAI, and Google Adopted the Same Format — MindStudio
  17. Cloud agents in Visual Studio Code — VS Code docs
  18. The model doesn’t matter, the harness does — Michael Tuszynski on DEV