Scout: MCP Goes Stateless — A Migration Map for the 2026-07-28 Revision

Summary

The Model Context Protocol’s next revision — a release candidate dated 2026-07-28 — is the biggest structural change to the protocol since it launched. It deletes protocol-level sessions and the Mcp-Session-Id header, removes the initialize/notifications/initialized handshake, and makes MCP stateless: every request now carries its protocol version, client identity, and capabilities in per-request _meta instead of negotiating them once at connection time (changelog). A mandatory server/discover RPC replaces the handshake for up-front capability negotiation; a single subscriptions/listen stream replaces both the HTTP GET endpoint and resources/subscribe; and a new Multi Round-Trip Requests (MRTR) pattern replaces server-initiated requests like sampling/createMessage and elicitation/create. Three first-class primitives — Roots, Sampling, and Logging — are deprecated under a new feature-lifecycle policy that guarantees at least twelve months before anything is removed. The official framing is operational: a remote server that “previously needed sticky sessions, a shared session store, and deep packet inspection at the gateway can now run behind a plain round-robin load balancer” (MCP blog). For teams operating servers and gateways, that one sentence is both the prize and the migration project. Note the disambiguation up front: throughout this briefing, MCP is the Model Context Protocol, the Anthropic-originated agent-tooling protocol now governed under the Linux Foundation’s Agentic AI Foundation — not any of the other things that acronym stands for.

Key Findings

1. The handshake is gone, and that is the load-bearing change

The pre-2026-07 lifecycle began every connection with an initialize request from client to server, the server’s response, and an initialized notification from the client, after which every subsequent request echoed an Mcp-Session-Id header while the server held that session in memory. The new revision removes the entire exchange. Identity and capabilities move into three _meta keys carried on every request (io.modelcontextprotocol/protocolVersion, io.modelcontextprotocol/clientInfo, and io.modelcontextprotocol/clientCapabilities), and a version mismatch now returns UnsupportedProtocolVersionError rather than failing at handshake time (changelog, SEP-2575). Because negotiation no longer happens once and sticks, a new mandatory server/discover RPC carries it: every server MUST implement server/discover to advertise its supported versions, capabilities, and identity, and clients MAY call it up front for version selection or use it as a backward-compatibility probe on STDIO (changelog, SEP-2575).
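As a rough illustration of what per-request metadata looks like in practice, the sketch below builds a JSON-RPC request carrying the three _meta keys and validates the version on the server side without any stored session. The three key names come from the changelog; everything else is an assumption made for illustration: the placement of _meta under params, the shapes of the clientInfo and clientCapabilities values, the helper names, and the Python exception standing in for UnsupportedProtocolVersionError (its wire encoding is not shown here).

```python
import json

META_PREFIX = "io.modelcontextprotocol/"
SUPPORTED_VERSIONS = {"2026-07-28"}  # assumption: versions this server accepts


class UnsupportedProtocolVersionError(Exception):
    """Local stand-in for the error named in the changelog."""


def build_request(request_id, method, params=None):
    """Client side: attach version, identity, and capabilities to every request."""
    params = dict(params or {})
    params["_meta"] = {
        META_PREFIX + "protocolVersion": "2026-07-28",
        # Hypothetical value shapes; the spec defines the real ones.
        META_PREFIX + "clientInfo": {"name": "example-client", "version": "0.1.0"},
        META_PREFIX + "clientCapabilities": {},
    }
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


def check_version(request):
    """Server side: validate each request on its own, with no session lookup."""
    meta = request.get("params", {}).get("_meta", {})
    version = meta.get(META_PREFIX + "protocolVersion")
    if version not in SUPPORTED_VERSIONS:
        raise UnsupportedProtocolVersionError(version)
    return version


request = build_request(1, "tools/list")
print(json.dumps(request, indent=2))
print(check_version(request))
```

Because the check reads only the request in hand, any replica can run it; nothing depends on an earlier message having reached the same process.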

The reason this is the load-bearing change, rather than just a cleaner wire format, is what it does to deployment. A stateless request can land on any server replica, so the operational scaffolding that existed only to keep a conversation pinned to one instance — sticky sessions at the load balancer, a Redis-backed session store, gateway body inspection to extract the session id — stops being necessary. The official post is blunt about the payoff: the server “can now run behind a plain round-robin load balancer, route traffic on an Mcp-Method header, and let clients cache tools/list responses for as long as the server’s ttlMs permits” (MCP blog). Header-based routing on the newly required Mcp-Method and Mcp-Name headers replaces deep-packet inspection of the JSON-RPC body (changelog, SEP-2243). For anyone who fought the session-affinity-versus-horizontal-scaling problem that a meaningful slice of last quarter’s gateway products was built to solve, the protocol just removed that particular problem at the source.
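A minimal sketch of what header-based routing could look like inside a gateway follows. The Mcp-Method header name is from the changelog; the pool names, addresses, and the pick_backend helper are hypothetical, and a real gateway would express this in its own routing configuration rather than in Python.

```python
from itertools import cycle

# Hypothetical backend pools; names and addresses are illustrative only.
POOLS = {
    "tools/call": cycle(["tools-a:8080", "tools-b:8080"]),
    "default": cycle(["mcp-a:8080", "mcp-b:8080", "mcp-c:8080"]),
}


def pick_backend(headers):
    """Choose a backend from headers alone; the JSON-RPC body is never parsed."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {key.lower(): value for key, value in headers.items()}
    method = normalized.get("mcp-method")
    if method is None:
        raise ValueError("missing Mcp-Method header")
    pool = POOLS.get(method, POOLS["default"])
    return next(pool)  # plain round-robin: no affinity, no session lookup


print(pick_backend({"Mcp-Method": "tools/call", "Mcp-Name": "search"}))
print(pick_backend({"Mcp-Method": "tools/list"}))
```

The point of the sketch is what is absent: no body parsing, no session table, and no rule that ties a client to the replica it reached last time.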

2. Sessions are replaced by server-minted handles — state moves into tool arguments

Statelessness doesn’t mean stateful workflows are impossible; it means state stops being the protocol’s job and becomes the application’s. The replacement pattern is the one HTTP APIs have used for decades: a tool mints an opaque identifier and returns it, and the model passes it back as an ordinary argument on later calls. The official framing — “mint an explicit handle (a basket_id, a browser_id) from a tool and have the model pass it back as an ordinary argument on later calls” (MCP blog) — captures the whole mechanic. The handle is server-minted and opaque; the model threads it through the conversation without the protocol needing a session at all.

The SEP that drove this (SEP-2567, “Sessionless MCP via Explicit State Handles”) is explicit that handles are not merely a workaround but strictly more expressive than sessions. Sessions force a cardinality of exactly one per connection, which “prevents mixed shared/isolated state patterns,” whereas explicit handles “are strictly more expressive and make list endpoints cacheable at (deployment, auth) granularity” (SEP-2567). The same proposal names the performance motivation directly: session-scoped list endpoints forced clients to re-fetch tools, resources, and prompts at scale — “O(subagents × servers) re-fetches on everyone, even when ~zero servers actually use it” (SEP-2567). The practitioner caveat the same discussion surfaces is worth internalizing: handles must be designed to be portable across replicas. If a basket_id is only meaningful to the instance that minted it, parallel initial calls can route to different backends and each mint separate state — “split-brain state across instances” (SEP-2567). The handle should encode or reference shared state (a database row, a signed token), not a pointer into one process’s memory.
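One way to make a handle replica-portable is to sign the state it carries, so that any instance holding the same key can verify and read it. The sketch below uses HMAC-SHA256 from the Python standard library; the function names and the shared-secret arrangement are assumptions for illustration, not something the spec or SEP-2567 prescribes. A handle that merely references a database row would work equally well.

```python
import base64
import hashlib
import hmac
import json

# Assumption: every replica loads the same secret from shared configuration.
SECRET = b"replace-with-a-shared-secret"


def mint_handle(state):
    """Encode state into an opaque, signed token that any replica can verify."""
    payload = base64.urlsafe_b64encode(json.dumps(state, sort_keys=True).encode())
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def read_handle(handle):
    """Verify the signature and recover the state; reject tampered handles."""
    payload, _, signature = handle.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        raise ValueError("invalid handle")
    return json.loads(base64.urlsafe_b64decode(payload))


handle = mint_handle({"basket_id": "b-123", "items": 2})
print(read_handle(handle))
```

Signing prevents tampering but does not hide the contents; state that must stay confidential belongs in a shared store, with the handle carrying only a reference to it.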

3. MRTR rewrites how servers ask the client anything — and shrinks what they can ask

Server-initiated requests — the mechanism behind roots/list, sampling/createMessage, and elicitation/create — assumed a persistent bidirectional connection: the server could, mid-request, turn around and call the client. Statelessness breaks that assumption. The replacement is the Multi Round-Trip Requests (MRTR) pattern: instead of initiating a request back to the client, a server returns a new result type carrying inputRequests — “the additional information needed to process the request” — and the client supplies inputResponses on its next request (changelog, SEP-2322). The state needed to resume travels in the payload, so any server instance can pick up the follow-up — the official description has the server return an InputRequiredResult and the client re-issue the original call with the gathered answers (MCP blog). The conceptual shift is from a bidirectional callback to a self-contained round-trip: every message carries the context needed to resume, so nothing depends on a connection staying pinned to one instance.
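The round-trip can be sketched as a stateless handler plus a client loop. The inputRequests and inputResponses field names are from the changelog; the resultType discriminator, the shape of each input request, and both function names are hypothetical stand-ins, and the real InputRequiredResult schema may differ.

```python
def handle_tool_call(params):
    """Server side: a stateless handler that either asks for input or finishes."""
    responses = params.get("inputResponses", {})
    if "confirm_delete" not in responses:
        # Nothing is stored server-side; the request for input is the whole reply.
        return {
            "resultType": "input_required",  # hypothetical discriminator
            "inputRequests": {
                "confirm_delete": {"prompt": "Delete 3 files? (yes/no)"},
            },
        }
    outcome = "deleted" if responses["confirm_delete"] == "yes" else "cancelled"
    return {"resultType": "complete", "content": outcome}


def call_with_input(params, ask_user):
    """Client side: re-issue the original call with the gathered answers."""
    result = handle_tool_call(params)
    while "inputRequests" in result:
        answers = {key: ask_user(req) for key, req in result["inputRequests"].items()}
        merged = {**params.get("inputResponses", {}), **answers}
        params = {**params, "inputResponses": merged}
        result = handle_tool_call(params)
    return result


print(call_with_input({"name": "delete_files"}, lambda req: "yes"))
```

Each call to handle_tool_call could be served by a different replica, since the second request carries both the original arguments and the answers.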

Two consequences deserve attention. First, this is a breaking change for any server that relied on server-initiated requests, which is most servers doing elicitation. Second — and easier to miss — MRTR narrows when a server may ask the client for anything: server-to-client requests can only be issued while the server is actively processing a client request, a reduced scope relative to the prior spec (SEP-2322). The free-floating, any-time server callback is gone. If your server’s design depended on reaching the client outside the scope of an in-flight request, that pattern no longer has a protocol-level home.

4. The Sampling deprecation is the one that should worry gateway and platform builders

Roots, Sampling, and Logging are deprecated together under SEP-2577 (changelog). Logging and Roots have clean migrations — log to stderr or OpenTelemetry; pass directories and files as tool parameters, resource URIs, or server configuration. Sampling is the consequential one. Sampling let a server borrow the client’s model to do inference — the server asks the client to run a completion, and the client (which holds the model, the key, and the budget) runs it and returns the result. The migration the spec suggests is “integrate directly with LLM provider APIs instead of Sampling” (changelog), which sounds simple and quietly relocates a whole capability.

The architectural pattern most exposed by Sampling’s deprecation is the agentic server — a server that hosts a model loop of its own rather than acting as a passive tool. Practitioner commentary frames this as a missed opportunity: Sampling let a tool “fan out across a dozen searches, read and summarize the results, follow the promising threads, and synthesize a final report,” with the server effectively becoming an agent orchestrator — “instead of the client orchestrating everything, the tool itself becomes an agent orchestrator,” and crucially it could do all of it “borrowing the client’s LLM, and on the client’s dime” (Nullpointer blog). Remove Sampling and that intelligence has to come from somewhere: the server now needs its own provider integration, its own API keys, and its own inference budget. For enterprise gateway designs, this is a real shift — a gateway that brokered Sampling calls as a way to centralize model access for downstream servers loses that lever, and every server that wanted model access becomes a direct LLM-API consumer with its own credential and cost surface to govern. The deprecation rationale on the spec side is low adoption — Sampling “adds a lot of complexity for a feature that has almost no adoption” (SEP-2577) — which is a defensible call. But teams that did build on Sampling, especially those using it to keep model credentials out of their server fleet, have twelve months to re-architect, not a free migration.

5. The deprecation clock is real, and the lifecycle policy is the bigger governance story

The single most important number for planning is the deprecation window. The new feature-lifecycle policy (SEP-2596) defines three feature states — Active, Deprecated, Removed — and a “minimum deprecation window: the number of months, at least twelve, that the feature must remain Deprecated before it is eligible for removal,” measured from the release of the revision that first marks it Deprecated (feature-lifecycle policy). The window can be shortened only for an active security risk with a published advisory, and even then must leave at least ninety days (feature-lifecycle policy). Practically: Sampling, Roots, and Logging keep working in this revision and for at least a year after, and removing them requires a separate SEP. New implementations should not adopt them; existing ones have a real, bounded runway.
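For roadmap planning, the window arithmetic is simple enough to write down. The sketch below assumes the revision that first marks the features Deprecated is released on 2026-07-28 and that the minimum twelve-month window applies; the helper names are hypothetical, and the expedited security path is deliberately not modeled.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Add calendar months, clamping the day to the end of shorter months."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def earliest_removal(deprecated_on, window_months=12):
    """Earliest date a feature becomes eligible for removal under the policy."""
    if window_months < 12:
        raise ValueError("the policy sets a floor of twelve months")
    return add_months(deprecated_on, window_months)


# Assumption: the deprecating revision is released on its stated date.
print(earliest_removal(date(2026, 7, 28)))  # 2027-07-28
```

Eligibility is not removal: under the policy as described above, actually removing a feature still requires a separate SEP, so this date is a lower bound rather than a deadline.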

This matters beyond the specific deprecations. Until now, MCP’s deprecation discipline was ad hoc, and the gateway-vendor ecosystem that grew up around the protocol was partly insurance against unpredictable change. A formal lifecycle with a twelve-month floor, a canonical deprecated-features registry, and Tier 1 SDK obligations to surface deprecation warnings (feature-lifecycle policy) turns the protocol into something an enterprise can plan a multi-quarter roadmap against. The Agentic AI Foundation’s framing presents the narrowing of MCP’s responsibilities as a sign of maturity rather than retreat (AAIF). That is the right read: a protocol that can deprecate its own primitives on a published clock is a more credible foundation than one that can’t.

6. CacheableResult plus deterministic ordering is a quiet win for prompt-cache economics

Two smaller changes combine into a genuine cost lever. First, a new CacheableResult interface (SEP-2549) requires ttlMs and cacheScope fields on results from tools/list, prompts/list, resources/list, resources/read, and resources/templates/list. ttlMs is a freshness hint in milliseconds telling clients how long a response may be cached; cacheScope is "public" or "private", controlling whether shared intermediaries may cache the response across users (changelog). Modeled on HTTP Cache-Control, this replaces the previous situation where each client invented its own TTL and a long-lived SSE stream was the only way to learn that a list had changed. Second, servers SHOULD now return tools/list in a deterministic order specifically “to enable client-side caching and improve LLM prompt cache hit rates” (changelog).
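Both halves of this can be sketched in a few lines: a server that sorts its tool list and attaches the cache fields, and a client cache that honors them. The ttlMs and cacheScope field names and the "public"/"private" values are from the changelog; the helper names, the sort key, and the default TTL are assumptions, since the spec asks for a deterministic order without this sketch claiming which one.

```python
import time


def list_tools_result(tools, ttl_ms=60_000, cache_scope="public"):
    """Server side: sort by name so the serialized list is stable across calls."""
    return {
        "tools": sorted(tools, key=lambda tool: tool["name"]),
        "ttlMs": ttl_ms,
        "cacheScope": cache_scope,
    }


class ListCache:
    """Client side: reuse a list result until its ttlMs has elapsed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}

    def put(self, key, result):
        expires_at = self._clock() + result["ttlMs"] / 1000.0
        self._entries[key] = (expires_at, result)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None or self._clock() >= entry[0]:
            self._entries.pop(key, None)  # drop stale entries
            return None
        return entry[1]


def may_share_across_users(result):
    """A shared intermediary may cache only results scoped "public"."""
    return result.get("cacheScope") == "public"
```

A client would call get before issuing tools/list and put after a fresh response; an intermediary would check may_share_across_users before storing anything in a cache that serves more than one user.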

The economics behind the second point are why it’s worth the words. Tool definitions are a fixed prefix on every model call, and that prefix is large: one practitioner guide estimates a typical five-server, fifty-eight-tool MCP deployment at “over 55,000 tokens before the first user message,” and a few more integrations (“Jira alone uses ~17,000 tokens”) push it past 100,000 tokens of context spent on tool definitions alone (ChatForest). Prompt caching is what makes that affordable; the same guide estimates that at Claude Opus 4.6 pricing, “an uncached 100K-token prompt costs $0.50 per request; with prompt caching, that drops to $0.05” (ChatForest). Prompt caches are prefix-sensitive: reorder the tools and the cached prefix is invalidated. A server that emits tools in nondeterministic order silently breaks the client’s cache on every reconnect, turning a $0.05 request back into a $0.50 one. Making ordering deterministic and giving clients an explicit TTL is the protocol finally treating the tool-definition prefix as the expensive, cacheable asset it has always been. The token counts and dollar figures above are one practitioner estimate, not measured benchmarks, but the shape, roughly an order-of-magnitude reduction on the cached prefix, is the durable point.
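To see how the per-request figures compound, the sketch below totals the cost of a batch of requests at a given cache hit rate. The two per-request prices are the quoted practitioner estimate; the function name and the simplifications (no cache-write surcharge, no cache expiry) are assumptions, so the output is illustrative rather than a billing forecast.

```python
UNCACHED_CENTS = 50  # $0.50 per request, from the quoted estimate
CACHED_CENTS = 5     # $0.05 per request, from the quoted estimate


def cost_cents(requests, hit_rate_percent):
    """Total cost in cents for a batch at a given cache hit rate (0 to 100)."""
    hits = requests * hit_rate_percent // 100
    misses = requests - hits
    return hits * CACHED_CENTS + misses * UNCACHED_CENTS


print(cost_cents(1000, 0))    # 50000 cents: every request misses
print(cost_cents(1000, 90))   # 9500 cents: 900 hits, 100 misses
print(cost_cents(1000, 100))  # 5000 cents: every request hits
```

On these numbers, a server whose nondeterministic ordering drags the hit rate from 90 percent to zero raises the bill for 1,000 requests from $95 to $500.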

Practical Implications

What breaks on SDK upgrade

Treat the upgrade as a transport-layer migration, not a version bump. The concrete breakages, each traceable to the changelog:

The handshake is gone. Any code that calls initialize / waits for initialized, or that reads or sets Mcp-Session-Id, must be rewritten to carry version, identity, and capabilities in _meta and to implement (server side) or optionally call (client side) server/discover.
Server-initiated requests are restructured. Servers using sampling/createMessage, elicitation/create, or roots/list callbacks must move to MRTR’s inputRequests/inputResponses round-trip — and accept that they can now only ask the client for input while actively processing a request.
New required headers. Mcp-Method and Mcp-Name are required on Streamable HTTP POSTs (SEP-2243). Gateways routing on body content should switch to header routing.
An error code changed. Resource-not-found moved from -32002 to -32602 (Invalid Params). Client code pattern-matching on -32002 will silently stop catching it.
subscriptions/listen replaces the GET endpoint and resources/subscribe. Clients that relied on the HTTP GET stream or explicit resource subscriptions opt in to typed notifications on the new single stream instead. Request-scoped notifications like notifications/progress and notifications/message still ride the response stream of the request they belong to, not subscriptions/listen.
The experimental Tasks API moved. Anyone who shipped against the 2025-11-25 experimental Tasks primitive migrates to the new io.modelcontextprotocol/tasks extension (SEP-2663), which swaps blocking tasks/result for tasks/get polling.
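The error-code change in the list above is the easiest of these to miss, because nothing fails loudly. A hedged sketch of a client-side check that tolerates both old and new servers during the transition follows. The two codes are from the changelog; the helper name and the heuristic of scoping -32602 to resources/read are assumptions, and since -32602 is the general Invalid Params code it can also signal other problems, such as a malformed URI.

```python
LEGACY_RESOURCE_NOT_FOUND = -32002  # pre-2026-07 servers
INVALID_PARAMS = -32602             # servers on the new revision


def is_resource_not_found(method, error):
    """Heuristic: decide whether a JSON-RPC error means the resource is missing."""
    code = error.get("code")
    if code == LEGACY_RESOURCE_NOT_FOUND:
        return True
    # Assumption: only treat Invalid Params as "not found" for resource reads,
    # so unrelated parameter errors on other methods are not misclassified.
    return code == INVALID_PARAMS and method == "resources/read"


print(is_resource_not_found("resources/read", {"code": -32602, "message": "unknown"}))
print(is_resource_not_found("tools/call", {"code": -32602, "message": "bad args"}))
```

A stricter client would also inspect the error message or data before concluding that the resource is missing.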

For teams running a gateway

The stateless rewrite is mostly good news for you, with one design pivot. You can decommission session-affinity machinery: sticky-session config, the session store if MCP was its only tenant, and body inspection for session ids. Move routing to the Mcp-Method / Mcp-Name headers; a practitioner migration walkthrough describes the same move from stateful sets with affinity to plain deployments behind a round-robin balancer, and flags the resource-not-found error-code change from -32002 to -32602 as the one client code is most likely to miss silently (DEV Community). The pivot is that your value proposition shifts. A meaningful slice of last quarter’s gateway products existed to solve session-affine routing and stateful-set scaling; that slot just shrank. The durable gateway concerns (auth termination, policy and tool allow-listing, observability, egress control, tool-poisoning defense) are untouched and arguably more important now that more servers run as plain stateless deployments. If your gateway brokered Sampling to centralize model access, plan its replacement now; that pattern is on the twelve-month clock.

For teams running MCP servers

Three things to do before the 2026-07-28 cutover. First, audit for hidden session dependencies — anywhere your server assumes in-memory state survives across calls is a latent bug under round-robin routing. Design server-minted handles that reference shared, replica-portable state rather than process memory. Second, make tools/list deterministic and set ttlMs/cacheScope honestly — this is a near-free cost reduction for every client that caches your tool prefix, and nondeterministic ordering actively costs your users money. Third, inventory Sampling, Roots, and Logging usage and start the migration: provider-direct inference for Sampling (with the credential and budget implications that brings), tool-parameter or resource-URI file scoping for Roots, and stderr/OpenTelemetry for Logging.
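The session-dependency audit can start with a very small harness: send consecutive calls to alternating replicas and see whether state survives. The toy below is a sketch under obvious assumptions; the BasketServer class, its add_item method, and the dict standing in for a shared database are all hypothetical, but the failure it reproduces is the one round-robin routing exposes.

```python
from itertools import cycle


class BasketServer:
    """Toy server replica; `store` stands in for wherever basket state lives."""

    def __init__(self, store=None):
        # With no store passed in, state lives in this replica's own memory.
        self.store = store if store is not None else {}

    def add_item(self, basket_id, item):
        self.store.setdefault(basket_id, []).append(item)
        return list(self.store[basket_id])


def run_round_robin(replicas, calls):
    """Send each call to the next replica in turn, as a plain balancer would."""
    targets = cycle(replicas)
    return [next(targets).add_item(basket_id, item) for basket_id, item in calls]


calls = [("b-1", "apples"), ("b-1", "pears")]

# Two replicas with private memory: the second call cannot see the first.
print(run_round_robin([BasketServer(), BasketServer()], calls))
# [['apples'], ['pears']]

# Two replicas backed by one shared store (a dict standing in for a database).
shared = {}
print(run_round_robin([BasketServer(shared), BasketServer(shared)], calls))
# [['apples'], ['apples', 'pears']]
```

The first result is the split-brain outcome in miniature; running an equivalent test against a real server with at least two instances is a cheap way to find the in-memory assumptions before production traffic does.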

Timeline posture

The release candidate is locked, with final publication dated 2026-07-28, and the official framing describes the intervening window as time for SDK maintainers and client implementers to validate against real workloads (MCP blog). Spec text can still shift if blocking issues surface during the window, so pin to the RC for prototyping and gate production cutover on the final publication — but start the transport-layer work now, because none of the breaking changes above get smaller by waiting.

Open Questions

Do gateway vendors reposition fast enough? A chunk of the 2026 gateway category was session-affinity and stateful-scaling machinery. Stateless MCP erodes that slot. Whether the incumbents pivot cleanly to the durable concerns (auth, policy, observability, egress) or scramble is the ecosystem story to watch over the next two quarters.
Where does server-side agentic intelligence go now that Sampling is deprecated? The agentic-server pattern doesn’t disappear — it relocates to direct provider integration inside each server. Whether a successor pattern emerges that re-centralizes model brokering (a gateway feature, a new SEP) or whether every server simply becomes its own LLM-API consumer is unresolved, and it has real governance and cost consequences either way.
Will server-minted handles converge on a portable convention? The spec points servers toward handles but does not define their shape. Without a shared convention for replica-portable, optionally-signed handles, every server reinvents it and the “split-brain state” failure mode shows up in production deployments that didn’t think carefully about it.
How much does the RC actually move before 2026-07-28? A locked RC with a fixed publication date still allows changes for blocking issues. Teams that start migrating against the RC are betting the transport-layer shape holds. It very likely does — the structural decisions are made — but the long tail of field names and error semantics could still shift.
Does the twelve-month lifecycle floor hold under pressure? The policy is new. The first time a widely-used feature hits its removal-eligible date, the ecosystem learns whether “at least twelve months” is a real contract or a default that gets extended indefinitely. Either outcome is informative; neither is established yet.

Sources

Key Changes — 2026-07-28 draft changelog — Model Context Protocol (authoritative list of every change, with SEP numbers, field names, and deprecations)
The 2026-07-28 MCP Specification Release Candidate — Model Context Protocol blog (official framing, server-minted handles, load-balancer story, RC lock and publication dates)
Feature Lifecycle and Deprecation Policy — Model Context Protocol (Active/Deprecated/Removed states, twelve-month minimum window, expedited-removal floor, SDK obligations)
SEP-2567: Sessionless MCP via Explicit State Handles — session removal rationale, explicit-handle expressiveness, O(subagents × servers) re-fetch motivation, split-brain warning
SEP-2575 (PR) — statelessness, handshake removal, server/discover, subscriptions/listen, logging-via-_meta
SEP-2322: Multi Round-Trip Requests (PR) — MRTR inputRequests/inputResponses, reduced server-initiated-request scope
SEP-2549 (PR): TTL/cache fields for list results — CacheableResult, ttlMs, cacheScope public/private, affected methods
SEP-2577 (PR): Deprecate Roots, Sampling, and Logging — deprecation rationale (low adoption, complexity)
The new MCP spec and the unfortunate deprecation of MCP Sampling — Nullpointer blog (practitioner argument on the agentic-server pattern lost with Sampling)
MCP Is Growing Up — Agentic AI Foundation (governance-maturity framing of statelessness and the lifecycle policy)
MCP Caching Strategies — ChatForest (tool-definition token overhead figures, Opus 4.6 cached-vs-uncached cost estimate)
MCP Spec Ships July 28 — Every Breaking Change and How to Migrate — DEV Community (practitioner migration walkthrough: load-balancer simplification, header routing, error-code change)
The 2026-07-28 MCP Release Candidate (release tag) — Model Context Protocol (RC release entry pointing to the draft spec and changelog)
MCP Enterprise Gateway Patterns — The Middle Tier Emerges — prior Grimoire scout (continuity; the slot-based gateway landscape that the stateless rewrite reshapes)