Artificer’s Grimoire — Edition 1 · March 9, 2026

Context engineering has solidified as the defining discipline of production agent work, spec-driven development (SDD) tooling is fragmenting into three distinct philosophies, and the Agentic AI Foundation is quietly becoming the governance layer for the protocols that matter.

Must Read

Context Engineering for AI Agents: Lessons from Building Manus

Source: Manus AI Blog · Jan 2026 · Score: 5 · Tags: context-engineering, production, kv-cache, cost-optimization

The Manus team shares hard-won lessons from four complete rebuilds of their agent framework. The standout insight: KV-cache hit rate is the single most important production metric for agentic workloads. With a 100:1 input-to-output token ratio and a 10x cost difference between cached and uncached tokens on Claude Sonnet, cache optimization isn’t just a performance concern — it’s an economic one.
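
To make the 10x figure concrete, here is a minimal Python sketch of the input-side cost of one task at different KV-cache hit rates. The prices are assumptions (USD 3.00 vs 0.30 per million input tokens), chosen only to reproduce the 10x ratio from the text; substitute your provider's current rates, and treat `input_cost_usd` as a hypothetical helper.

```python
# Assumed prices in USD per million input tokens. Only their 10x ratio comes
# from the text above; substitute your provider's current rates.
UNCACHED_USD_PER_MTOK = 3.00
CACHED_USD_PER_MTOK = 0.30


def input_cost_usd(input_tokens: int, cache_hit_rate: float) -> float:
    """Input-side cost of one agent task, given the fraction of input tokens served from cache."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    return (cached * CACHED_USD_PER_MTOK + uncached * UNCACHED_USD_PER_MTOK) / 1_000_000


# At a 100:1 input-to-output ratio, a task that writes ~20K output tokens
# reads ~2M input tokens, so input tokens make up the bulk of what it processes.
TASK_INPUT_TOKENS = 2_000_000
for rate in (0.0, 0.5, 0.9):
    print(f"cache hit rate {rate:.0%}: ${input_cost_usd(TASK_INPUT_TOKENS, rate):.2f}")
```

Under these assumed prices, raising the hit rate from 0% to 90% cuts the input cost of that task by roughly a factor of five.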

Why it matters: The 10x cost differential between cached and uncached tokens means your context engineering decisions directly impact per-task economics. For anyone running agents in production, the takeaway is concrete: keep your prompt prefix stable. Even a single-token change at the beginning of your context invalidates the cache for everything that follows. This should be a hard constraint in any agent system design, not an optimization you get to later.
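
Why one changed token is so costly: a prefix cache can only reuse work up to the first token that differs from an earlier request. The sketch below models that with whitespace-split "tokens" (a simplification; real tokenizers and provider cache granularity differ) and contrasts an append-only context with one that starts with a timestamp. `cached_prefix_len` is a hypothetical helper, not any provider's API.

```python
def cached_prefix_len(previous: list[str], current: list[str]) -> int:
    """Count the leading tokens two requests share; a prefix cache can reuse at most these."""
    shared = 0
    for prev_tok, cur_tok in zip(previous, current):
        if prev_tok != cur_tok:
            break  # everything from the first mismatch onward must be recomputed
        shared += 1
    return shared


SYSTEM_PROMPT = "You are a coding agent . Tools : read_file write_file run_tests".split()

# Append-only context: turn 2 extends turn 1 without editing anything before it.
turn_1 = SYSTEM_PROMPT + "user : fix the failing test".split()
turn_2 = turn_1 + "tool : 3 passed , 1 failed".split()

# Anti-pattern: a timestamp as the very first token changes on every request.
stamped_1 = ["2026-03-09T10:00:00"] + turn_1
stamped_2 = ["2026-03-09T10:00:05"] + turn_2

print("append-only:", cached_prefix_len(turn_1, turn_2), "of", len(turn_2), "tokens reusable")
print("timestamped:", cached_prefix_len(stamped_1, stamped_2), "of", len(stamped_2), "tokens reusable")
```

The same reasoning applies to anything near the front of the context that varies between requests, such as nondeterministic serialization of the same data.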

Google ADK: Architecting Context-Aware Multi-Agent Systems for Production

Source: Google Developers Blog · Feb 2026 · Score: 5 · Tags: context-engineering, multi-agent, architecture, google-adk

Google’s ADK team codifies three design principles for production context management: separate storage from presentation, use explicit named transformations (not ad-hoc string concatenation), and scope context by default so agents must explicitly reach for more information rather than being flooded.
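
A minimal sketch of the first two principles, assuming nothing about ADK's actual API: the session is an append-only event log (storage), and the prompt is derived from it by an ordered list of named functions (presentation). `Event`, `Session`, and the processor names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Event:
    role: str  # e.g. "user", "agent", "tool"
    text: str


@dataclass
class Session:
    """Storage: the full, append-only event log. Nothing here is rewritten for the prompt."""
    events: list[Event] = field(default_factory=list)


# A processor is a named transformation from events to events (no side effects).
Processor = Callable[[list[Event]], list[Event]]


def truncate_tool_output(events: list[Event]) -> list[Event]:
    """Keep full tool output in storage, but show only the first 200 characters."""
    return [Event(e.role, e.text[:200]) if e.role == "tool" else e for e in events]


def last_n_events(n: int) -> Processor:
    """Build a processor that keeps only the n most recent events."""
    def keep_recent(events: list[Event]) -> list[Event]:
        return events[-n:] if n > 0 else []
    keep_recent.__name__ = f"last_{n}_events"
    return keep_recent


def build_working_context(session: Session, pipeline: list[Processor]) -> str:
    """Presentation: derive the prompt from storage through explicit, ordered steps."""
    events = list(session.events)
    for step in pipeline:
        events = step(events)
    return "\n".join(f"{e.role}: {e.text}" for e in events)


session = Session([
    Event("user", "Summarize the quarterly report"),
    Event("tool", "x" * 1000),  # a large raw tool result
    Event("agent", "Here is the summary."),
])
pipeline = [truncate_tool_output, last_n_events(2)]
print(build_working_context(session, pipeline))
print([step.__name__ for step in pipeline])  # each transformation is inspectable by name
```

Because the pipeline is a plain list of named steps, it can be logged, tested, and reordered, which is hard to do with ad-hoc string concatenation.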

Why it matters: The “scope by default” principle is the most actionable insight here. Each agent should get the minimum context required and reach for more via tools — the opposite of the “dump everything into the context window” approach that most framework-based systems default to. The ADK’s tiered storage model (Sessions vs working context) is worth studying as a reference for anyone designing agent state management.
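
The "scope by default" idea can be sketched as a working context that starts with only the task and an index of what exists, plus a tool the agent must call to load anything else. This is a hypothetical illustration, not ADK code; `ScopedContext` and `load_artifact` are invented names.

```python
class ScopedContext:
    """Working context that starts minimal and grows only through explicit tool calls."""

    def __init__(self, task: str, store: dict[str, str]):
        self.task = task
        self._store = store               # full storage; never dumped into the prompt wholesale
        self.loaded: dict[str, str] = {}  # only what the agent explicitly asked for

    def render(self) -> str:
        """The prompt the agent actually sees: task, an index of names, and loaded artifacts."""
        lines = [
            f"Task: {self.task}",
            "Available artifacts (use load_artifact to read one): " + ", ".join(sorted(self._store)),
        ]
        for name, body in self.loaded.items():
            lines.append(f"--- {name} ---\n{body}")
        return "\n".join(lines)

    def load_artifact(self, name: str) -> str:
        """Tool exposed to the agent: pull one named artifact into the working context."""
        if name not in self._store:
            return f"error: no artifact named {name!r}"
        self.loaded[name] = self._store[name]
        return f"loaded {name}"


ctx = ScopedContext(
    "Fix the login bug",
    {
        "auth.py": "def login(user, password): ...",
        "README.md": "# Project overview",
        "schema.sql": "CREATE TABLE users (id INT);",
    },
)
before = ctx.render()                # names only, no file contents
print(ctx.load_artifact("auth.py"))  # the agent reaches for exactly what it needs
after = ctx.render()
print(after)
```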

Understanding Spec-Driven Development: Kiro, spec-kit, and Tessl (Böckeler)

Source: Martin Fowler’s Blog · Nov 2025 · Score: 5 · Tags: sdd, critical-analysis, tools, methodology

Birgitta Böckeler’s hands-on evaluation of three SDD tools is the most honest practitioner analysis in the space. Key findings: agents frequently ignored spec instructions despite having them in context; they also went overboard, following other instructions too eagerly; and she was often unsure when a spec should stay functional and when it should add technical detail. Her conclusion is telling: “I’d rather review code than all these markdown files.”

Why it matters: This is a reality check for the SDD movement. The failure modes she describes — agents ignoring context, over-following instructions, functional/technical confusion — are the exact problems that governance layers, human review gates, and phased decomposition are designed to catch. If you’re adopting SDD, her critique argues for breaking specs into right-sized, session-scoped chunks rather than verbose up-front designs. The tooling isn’t mature enough to trust with a 50-page spec.
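
One way to act on that advice, sketched under loose assumptions: split a markdown spec at its second-level headings and pack sections into chunks under a size budget, so each chunk can go to its own agent session and be reviewed on its own. The word-count budget stands in for a real token count, and `split_spec` is a hypothetical helper, not part of Kiro, spec-kit, or Tessl.

```python
import re


def split_spec(markdown: str, max_words: int = 400) -> list[str]:
    """Pack '## ' sections of a markdown spec into chunks of at most max_words words.

    A single section larger than the budget becomes its own (oversized) chunk,
    which is a signal that the section itself should be decomposed further.
    """
    # Zero-width split just before every line that starts with "## ".
    sections = [s.strip() for s in re.split(r"(?m)^(?=## )", markdown) if s.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_words = 0
    for section in sections:
        words = len(section.split())
        if current and current_words + words > max_words:
            chunks.append("\n\n".join(current))
            current, current_words = [], 0
        current.append(section)
        current_words += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks


spec = """# Login feature

## Requirements
Users sign in with email and password.

## API
POST /login returns a session token.

## Tests
Cover wrong passwords and locked accounts.
"""

for i, chunk in enumerate(split_spec(spec, max_words=20), 1):
    print(f"--- session {i} ---\n{chunk}")
```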

12 Factor Agents: Principles That Actually Ship

Source: HumanLayer + GitHub · Apr 2025 (ongoing) · Score: 5 · Tags: production-agents, best-practices, philosophy, architecture

Dex Horthy’s 12 Factor Agents has become the de facto philosophical foundation for production agent engineering. The core insight: most successful AI products aren’t purely agentic loops — they combine deterministic code with strategically placed LLM decision points. The “dumb zone” finding (context window utilization past 40% degrades performance) is backed by analysis of 100K developer sessions.
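
Treating the 40% figure as a working threshold (an assumption to tune per model and task, not a hard law), a harness can check utilization before each model call and compact when it crosses the line. This sketch keeps the system prompt and the most recent turns and folds older turns into a summary; `maybe_compact` and `placeholder_summary` are hypothetical, and word counts stand in for a real tokenizer.

```python
from typing import Callable

DUMB_ZONE_THRESHOLD = 0.40  # fraction of the context window; an assumption to tune per model


def maybe_compact(
    turns: list[str],
    count_tokens: Callable[[str], int],
    window_size: int,
    summarize: Callable[[list[str]], str],
    keep_recent: int = 4,
) -> list[str]:
    """Compact the history once utilization crosses the threshold.

    turns[0] is treated as the system prompt and is never summarized, so at
    least that part of the cacheable prefix survives compaction.
    """
    utilization = sum(count_tokens(t) for t in turns) / window_size
    if utilization <= DUMB_ZONE_THRESHOLD:
        return turns
    system, history = turns[0], turns[1:]
    if len(history) <= keep_recent:
        return turns  # nothing old enough to fold into a summary
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [system, summarize(old), *recent]


def count_words(text: str) -> int:
    """Crude stand-in for a real tokenizer."""
    return len(text.split())


def placeholder_summary(old_turns: list[str]) -> str:
    """Hypothetical summarizer; in practice an LLM call or a structured digest."""
    return f"Summary of {len(old_turns)} earlier turns."


turns = ["You are a coding agent."] + [f"turn {i}: " + "word " * 10 for i in range(10)]
compacted = maybe_compact(turns, count_words, window_size=200, summarize=placeholder_summary)
print(len(turns), "->", len(compacted))
```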

Why it matters: This is the clearest articulation of what production agent architecture actually looks like. Own your control flow in deterministic code (Step Functions, state machines, workflow engines). Own your context window (CLAUDE.md, structured prompts). Keep agents small and focused. Use tool calls to contact humans at decision boundaries. If your agent infrastructure doesn’t enforce these principles, you’re building a demo, not a product.
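
A minimal sketch of "own your control flow" and "use tool calls to contact humans": the loop, step limit, and approval gate live in ordinary code, and the model only chooses the next structured action. `decide_next_step` is a hypothetical stub for an LLM call that returns structured output, and the action names are invented for illustration, not taken from the 12 Factor Agents repository.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    name: str  # a tool name, "request_human_approval", or "done"
    argument: str = ""


def run_agent(
    task: str,
    decide_next_step: Callable[[list[str]], Action],
    tools: dict[str, Callable[[str], str]],
    ask_human: Callable[[str], bool],
    max_steps: int = 10,
) -> list[str]:
    """Deterministic loop: code owns iteration, limits, and the human gate; the model picks actions."""
    context = [f"task: {task}"]
    for _ in range(max_steps):
        action = decide_next_step(context)
        if action.name == "done":
            context.append("status: done")
            return context
        if action.name == "request_human_approval":
            approved = ask_human(action.argument)  # decision boundary handled outside the model
            context.append(f"human: {'approved' if approved else 'rejected'} {action.argument}")
            if not approved:
                context.append("status: stopped by human")
                return context
            continue
        tool = tools.get(action.name)
        result = tool(action.argument) if tool else f"error: unknown tool {action.name!r}"
        context.append(f"{action.name}: {result}")
    context.append("status: step limit reached")
    return context


script = iter([
    Action("run_tests"),
    Action("request_human_approval", "merge the fix"),
    Action("done"),
])
log = run_agent(
    "fix the flaky test",
    decide_next_step=lambda context: next(script),  # stub for a structured-output LLM call
    tools={"run_tests": lambda _arg: "all passed"},
    ask_human=lambda question: True,  # stub for a real approval channel
)
print("\n".join(log))
```

Because the loop is plain code, the step limit and the approval gate hold regardless of what the model outputs.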

Worth Scanning

State of AI Agents 2026 (LangChain) — 1,300+ respondents; 57% have agents in production. Quality is the #1 barrier at 32%, ahead of cost. 89% have implemented observability for their agents. Large enterprises cite context engineering at scale as an ongoing difficulty.
Agentic AI Foundation Formed (InfoQ) — MCP, AGENTS.md, and goose donated to the Linux Foundation. AWS, Microsoft, Google, and Bloomberg join as Platinum members. Simon Willison notes the foundation may be premature for a protocol barely a year old.
Microsoft Agent Framework RC (Various) — The successor to AutoGen reached Release Candidate in Feb 2026, merging AutoGen's multi-agent patterns with Semantic Kernel. Supports A2A, MCP, and AG-UI out of the box. GA expected Q1 2026.
arXiv: Context Engineering for AI Agents in OSS — Study of 466 open-source projects using AGENTS.md. Finds no established content structure — huge variation in prescriptive vs descriptive vs prohibitive approaches. Useful baseline for understanding community adoption.
SDD on Wikipedia — SDD now has a Wikipedia page. Traces roots to 2004 synergy of TDD and Design by Contract. Notes the “spec-anchored” variant with governance layers and supervision checkpoints — a pattern gaining traction among teams building agent infrastructure.
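OpenAgents: Native MCP + A2A — Bills itself as the only framework with native support for both protocols, a claim worth checking given that Microsoft Agent Framework (above) also supports both. Interesting for agent interoperability patterns but early-stage.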
AI Coding Landscape 2026 (ToolShelf) — Tracks 204 tools. Claude Code rated highest at 76. 95% open-source. The biggest shift: terminal-native agents composing with existing tools (pipes, scripts, CI) over IDE plugins.

New Tools & Repos

Agent-Skills-for-Context-Engineering — Comprehensive collection of context engineering skills for agent systems. Referenced by Peking University’s Meta Context Engineering research.
GitHub Spec Kit — GitHub’s SDD scaffolding CLI. Downloads templates and sets up a project for spec-driven work. Experimental (v0.0.30+).

Papers

Context Engineering for AI Agents in Open-Source Software — Mohsenimofidi et al. — First systematic study of AGENTS.md adoption patterns. Analyzes content structure, presentation modes, and evolution over time across 466 projects.

Ecosystem Watch

OpenHands — 68.6K stars, $18.8M Series A. 77.6% on SWE-bench Verified. Migrating from V0 to V1 architecture. Docker-based sandboxing with full browser automation. Single-user/local-first positioning.
GitHub Copilot Coding Agent — Issues can now be assigned directly to the coding agent, which autonomously writes code, creates PRs, and responds to review feedback. Enterprise-ready with audit logs and governance via Agent HQ.
Cursor Cloud Agents — Launched with dedicated VMs for each agent session. Agents test changes end-to-end before submitting PRs. Key UX insight: “only ask for review once it’s actually ready.”

The Long View

The convergence of SDD, context engineering, and 12 Factor Agents into a coherent discipline is the story of this period. A year ago, these were separate conversations. Now they’re clearly facets of the same problem: how do you make AI agents reliable enough to trust with production work?

The answer emerging from practitioners (not framework vendors) is surprisingly conservative: keep agents small and focused, own your control flow in deterministic code, treat context like a scarce resource, and put humans at the decision boundaries. These principles predate the industry names now being attached to them — and the teams that internalized them early are the ones shipping production agent systems today.

The protocol consolidation under the Agentic AI Foundation is worth watching closely. If MCP, AGENTS.md, and A2A stabilize under neutral governance, any well-architected agent infrastructure becomes a natural integration surface for all three. The teams investing in clean protocol boundaries now will have a significant advantage when these standards mature.

The Artificer’s Grimoire — weekly intelligence on harness engineering and autonomous agents — for practitioners, by Tim Schiller (Artificer Digital).
