Artificer Digital The Artificer's Grimoire

Scout: Spec-Driven Development at Team Scale — What Works When Individual SDD Doesn't

Summary

A consultancy-side method (ThoughtWorks’ Structured Prompt-Driven Development, or SPDD) and a community-side rallying cry (Specsmaxxing, the YAML-spec post that hit the Hacker News front page) reached the same architectural conclusion in the same week from opposite ends of the social gradient: individual-developer spec-driven habits don’t survive contact with multi-engineer teams running AI coding agents, and the artifact that holds team work together is a versioned, reviewed, branch-mapped specification — not a chat transcript, not an AGENTS.md file, and not a Jira ticket. The mechanics differ. SPDD treats prompts as first-class delivery artifacts under formal review; Specsmaxxing treats feature.yaml files as referenceable acceptance-criteria contracts; GitHub Spec Kit (now at 0.8.4 after three patch releases in eight days) treats markdown specs as branch-bound source-of-truth that 30-plus AI agents consume identically. What converges across all three is the team-scale principle: when reality diverges from the spec, you fix the spec first. What diverges is spec format (Markdown vs YAML vs structured-template), review topology (PR review of prompts vs dashboard-based requirement marking vs constitutional gates), and where the methodology breaks first at scale (review fatigue, spec drift, brownfield onboarding). For practitioner teams scaling past the individual-developer frontier, the choice between approaches is now a real architectural decision with visible tradeoffs — not a vendor-marketing exercise.

Key Findings

1. The team-scale SDD problem is not the spec format — it’s the review topology and the ownership model

Three distinct SDD methodologies surfaced in the same news cycle. Each defines the team-scale spec artifact in its own terms, but all three converge on a versioned artifact that the whole team reviews and references:

  • ThoughtWorks SPDD defines the artifact as a versioned prompt: per the Martin Fowler / ThoughtWorks article, SPDD is “an engineering method that treats prompts as first-class delivery artifacts” that go “through a defined workflow and stay the record of what was intended.” The seven-part REASONS Canvas (Requirements, Entities, Approach, Structure, Operations, Norms, Safeguards) is the structured template; the openspdd CLI ships commands for /spdd-analysis, /spdd-reasons-canvas, /spdd-generate, and — critically for the team-scale story — /spdd-sync for code-back-to-spec reconciliation.
  • Specsmaxxing (acai.sh) defines the artifact as a YAML file with referenceable IDs. The example schema groups requirements under named components (AUTH, ENG) and numbers each entry (1, 1-1, 2), so a requirement can be addressed unambiguously as my-feature.ENG.2. The author’s framing of why this matters for teams is that “the context window is the limit”: when a session ends, a developer hands off, or context gets compacted, only the spec survives.
  • GitHub Spec Kit (spec-driven.md in the repo) defines the artifact as a markdown directory tree under specs/[branch-name]/ containing spec.md, plan.md, data-model.md, contracts/, tasks.md, and quickstart.md. Spec Kit’s framing inverts the traditional hierarchy: “Specifications don’t serve code—code serves specifications.” Microsoft’s developer-blog walkthrough describes the team-level coordination mechanism as a project “constitution”: non-negotiable principles that align decisions across distributed implementations.
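To make the Specsmaxxing addressing scheme concrete, here is a minimal Python sketch that resolves a dotted reference such as my-feature.ENG.2. It assumes the feature.yaml has already been parsed into a nested dictionary; the FEATURE_SPEC shape, its requirement texts, and the resolve_requirement helper are hypothetical illustrations, not acai.sh’s published schema or tooling.

```python
# Hypothetical in-memory shape of a parsed feature.yaml. The real Specsmaxxing
# schema may differ; only the component/number addressing comes from the post,
# and the requirement texts are invented for illustration.
FEATURE_SPEC = {
    "feature": "my-feature",
    "requirements": {
        "AUTH": {
            "1": "Users sign in with email and password.",
            "1-1": "Failed sign-ins are rate-limited.",
        },
        "ENG": {
            "1": "Sessions expire after 24 hours of inactivity.",
            "2": "All auth events are written to the audit log.",
        },
    },
}


def resolve_requirement(spec: dict, ref: str) -> str:
    """Resolve a dotted reference like 'my-feature.ENG.2' to its requirement text.

    Unknown features, components, or numbers raise KeyError, so a dangling
    reference in a review comment fails loudly instead of silently.
    """
    parts = ref.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected <feature>.<component>.<number>, got {ref!r}")
    feature, component, number = parts
    if feature != spec["feature"]:
        raise KeyError(f"unknown feature: {feature!r}")
    return spec["requirements"][component][number]


print(resolve_requirement(FEATURE_SPEC, "my-feature.ENG.2"))
# prints: All auth events are written to the audit log.
```

The point of the design is that a reference either resolves to exactly one requirement or fails, which is what makes an ID safe to drop into a pull-request comment.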

The format difference is real but secondary. The harder question — and the one all three methodologies have started answering this week — is who reviews the spec, where the review happens, and what triggers it. SPDD’s answer is the same code-review pipeline, with the prompt artifact reviewed before the generated code. Spec Kit’s answer is constitutional gates and the /speckit.analyze cross-artifact consistency check that runs before /speckit.implement. Specsmaxxing’s answer is a dashboard where humans mark requirements as Completed, Accepted, or Rejected, with notes attached. None of them have solved review fatigue at 50-plus-engineer scale; all of them have stopped pretending the problem doesn’t exist.
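The Specsmaxxing review topology (humans marking requirements Completed, Accepted, or Rejected, with notes) can be modelled as an append-only ledger keyed by requirement ID. The sketch below is a hypothetical data model, not the acai.sh dashboard’s implementation; in particular, the rule that the latest mark wins and the definition of what “needs attention” are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # The three marks named in the Specsmaxxing post; the rest of the model is assumed.
    COMPLETED = "Completed"
    ACCEPTED = "Accepted"
    REJECTED = "Rejected"


@dataclass
class Mark:
    requirement_id: str  # e.g. "my-feature.ENG.2"
    status: Status
    reviewer: str
    note: str = ""


@dataclass
class ReviewLedger:
    """Hypothetical append-only log of review marks; the latest mark per ID wins."""
    marks: list[Mark] = field(default_factory=list)

    def record(self, mark: Mark) -> None:
        self.marks.append(mark)

    def latest(self) -> dict[str, Mark]:
        current: dict[str, Mark] = {}
        for mark in self.marks:  # later marks overwrite earlier ones
            current[mark.requirement_id] = mark
        return current

    def needs_attention(self, all_ids: list[str]) -> list[str]:
        """IDs that have never been marked, or whose latest mark is Rejected."""
        current = self.latest()
        return [
            rid for rid in all_ids
            if rid not in current or current[rid].status is Status.REJECTED
        ]


ledger = ReviewLedger()
ledger.record(Mark("my-feature.ENG.1", Status.COMPLETED, "alice"))
ledger.record(Mark("my-feature.ENG.1", Status.ACCEPTED, "bob", "matches acceptance criteria"))
ledger.record(Mark("my-feature.ENG.2", Status.REJECTED, "bob", "logout not audited"))
print(ledger.needs_attention(["my-feature.ENG.1", "my-feature.ENG.2", "my-feature.AUTH.1"]))
# prints: ['my-feature.ENG.2', 'my-feature.AUTH.1']
```

Keeping every mark rather than overwriting in place preserves the review history, which is the raw material for any later question about how long specs stay alive.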

2. The “fix the spec first” rule is the actual coordination primitive

The single most operationally consequential pattern shared across the methodologies is a rule about which artifact gets edited when reality diverges from intent. SPDD’s version: “When reality diverges, fix the prompt first — then update the code” (ThoughtWorks article). Spec Kit’s version is the bidirectional sync between specification and implementation: when requirements change, the implementation plan flags affected technical decisions; when the implementation discovers something the spec missed, the spec gets updated and re-reviewed.

This rule looks trivial when stated; it is not trivial in practice. Left to itself, the typical coding agent in 2026 edits the code and leaves the spec untouched. Without a team-enforced “spec-first” reflex, backed by tooling that surfaces spec/code drift, the spec degrades into stale documentation within weeks. The InfoQ enterprise-SDD piece (Spec-Driven Development – Adoption at Enterprise Scale) names this failure mode “SpecFall”, the markdown equivalent of Scrumerfall, in which specs become “outdated documentation generated on arrival rather than living collaboration surfaces.”
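Drift-surfacing tooling can start very small. The sketch below assumes a hypothetical team convention in which code cites requirement IDs in comments (req: my-feature.ENG.2), then reports spec requirements that no code cites and code citations that no longer exist in the spec. Neither SPDD nor Spec Kit is claimed to work this way; /speckit.analyze and /spdd-sync are those methodologies’ own mechanisms, and find_drift is an illustrative name.

```python
import re

# Hypothetical convention: code cites requirements in comments such as
# "req: my-feature.ENG.2" (feature.COMPONENT.number, where number may be "1-1").
REQ_REF = re.compile(r"req:\s*([\w-]+\.[A-Z]+\.\d+(?:-\d+)*)")


def find_drift(spec_ids: set[str], sources: dict[str, str]) -> dict[str, set[str]]:
    """Compare requirement IDs declared in the spec with IDs cited in source files."""
    cited: set[str] = set()
    for text in sources.values():
        cited.update(REQ_REF.findall(text))
    return {
        # Declared in the spec but cited nowhere: unimplemented, or implemented untraced.
        "untraced": spec_ids - cited,
        # Cited in code but absent from the spec: the spec moved and the code did not,
        # or the code was written against a requirement nobody recorded.
        "stale_refs": cited - spec_ids,
    }


spec = {"my-feature.ENG.1", "my-feature.ENG.2"}
code = {
    "session.py": "def expire_idle():  # req: my-feature.ENG.1\n    pass\n",
    "audit.py": "def log_event():  # req: my-feature.ENG.3\n    pass\n",
}
print(find_drift(spec, code))
# prints: {'untraced': {'my-feature.ENG.2'}, 'stale_refs': {'my-feature.ENG.3'}}
```

Run in CI, a non-empty stale_refs set is the cheapest possible signal that someone edited code without touching the spec.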

ThoughtWorks’s SPDD operationalises the rule with two separate responses to code-review findings: logic corrections (behaviour-changing) update the structured prompt first, then regenerate code; refactorings (behaviour-preserving) change the code first, then sync the changes back to the prompt via /spdd-sync. That distinction is what stops spec-first discipline from degrading into “every typo fix triggers a spec rewrite,” the version of this rule that gets abandoned within a sprint.
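The logic-versus-refactor split amounts to a routing rule for review findings. This is a minimal sketch: FindingKind and route_finding are hypothetical names, the step strings only mention the /spdd-generate and /spdd-sync commands listed for the openspdd CLI, and nothing here invokes that CLI or reflects its actual options.

```python
from enum import Enum


class FindingKind(Enum):
    LOGIC_CORRECTION = "logic"   # changes observable behaviour
    REFACTORING = "refactor"     # behaviour-preserving restructuring


def route_finding(kind: FindingKind) -> list[str]:
    """Plan the ordered steps for a code-review finding, following SPDD's split.

    This only returns a plan; it does not call the openspdd CLI, and the step
    wording is illustrative rather than taken from the tool.
    """
    if kind is FindingKind.LOGIC_CORRECTION:
        # Behaviour change: the prompt is the source of truth, so it moves first.
        return [
            "update the structured prompt (REASONS Canvas)",
            "regenerate code via /spdd-generate",
        ]
    if kind is FindingKind.REFACTORING:
        # No behaviour change: edit code directly, then reconcile the prompt.
        return [
            "refactor the code",
            "sync the change back to the prompt via /spdd-sync",
        ]
    raise ValueError(f"unhandled finding kind: {kind}")


for kind in FindingKind:
    print(kind.value, "->", " then ".join(route_finding(kind)))
```

The hard part in practice is the classification, not the routing: a reviewer still has to decide whether a finding changes behaviour before the rule can apply.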

3. The community side reaches the consultancy side from the failure-mode end

The Specsmaxxing post — 264 points / 276 comments on Hacker News — is the practitioner-side mirror of the SPDD argument and gets there from a different direction. Where ThoughtWorks frames the problem as scaling individual-developer practice to team-level workflow, the Specsmaxxing author frames it as surviving long-running agent drift. The author’s working term for the failure is “AI psychosis” — the gradual decoherence of agent output as context fills with accumulated noise from failed attempts, debug output, and tangents. The fix the author proposes is the same one ThoughtWorks proposes for a different reason: a versioned spec that the agent must reference and that survives session boundaries.

The convergence is the signal. When a large consultancy publishing on Martin Fowler’s site and an independent practitioner trending on the Hacker News front page reach the same architectural conclusion in the same week, the pattern has crossed a credibility threshold. The two communities are talking past each other in vocabulary — “prompt as delivery artifact” versus “feature.yaml as acceptance-criteria contract” — but converging on artifact discipline as the load-bearing variable.

The Hacker News thread itself is more useful as a constraint check on the convergence than as a celebration of it. The most-engaged dissenting voice (lelanthran) challenges whether reviewing AI-generated code produces the same architectural insights as writing it, which raises the question of what spec-driven team workflow does to senior-engineer judgment over time. A separate strand (bdangubic, drawing on government-contracting experience under DoD-STD-2167) reports that every spec format that commenter had worked with became “stale/inaccurate/obsolete within 6-12 months” because “there is never time allocated to keep them up-to-date.” A third strand (mike_hearn) advocates Gherkin/Cucumber as the executable-spec alternative, an approach the SPDD and Spec Kit families largely don’t address. None of these critiques invalidates the convergence; they sharpen the question of what discipline the team must hold for the methodology to survive past quarter two.

4. Where the methodologies diverge: spec format, brownfield strategy, and tooling assumptions

Format. Spec Kit standardises on Markdown across spec.md, plan.md, tasks.md, and a contracts/ directory. SPDD uses structured-template documents (the REASONS Canvas) that are still markdown-based but follow a strict seven-section schema. Specsmaxxing uses YAML for the requirement-IDs primitive specifically because it makes individual requirements unambiguously referenceable. None of these is “the right” choice: Markdown is human-readable but loose, YAML is structured but verbose, and the REASONS Canvas is opinionated but imposes a high cognitive load. The tool-comparison piece on Martin Fowler’s site (Understanding Spec-Driven Development) records the practitioner-side complaint: a markdown-heavy spec workflow produces “a LOT of markdown files” that are “repetitive…verbose and tedious to review.”
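A strict seven-section schema is one of the few format properties that can be checked mechanically. The sketch below assumes, for illustration only, that a REASONS Canvas is stored as Markdown with one “## <Section>” heading per section; the real openspdd template layout may differ, and check_canvas is a hypothetical helper.

```python
import re

# The seven REASONS Canvas sections, in order. Storing each as a "## <Section>"
# Markdown heading is an assumption of this sketch, not the openspdd template.
REASONS_SECTIONS = [
    "Requirements", "Entities", "Approach", "Structure",
    "Operations", "Norms", "Safeguards",
]
HEADING = re.compile(r"^##[ \t]+(\w+)[ \t]*$", re.MULTILINE)


def check_canvas(markdown: str) -> list[str]:
    """Return schema problems in a canvas document; an empty list means it passes."""
    found = HEADING.findall(markdown)
    present = [s for s in found if s in REASONS_SECTIONS]
    problems = [f"missing section: {s}" for s in REASONS_SECTIONS if s not in present]
    duplicates = sorted({s for s in present if present.count(s) > 1})
    problems += [f"duplicate section: {s}" for s in duplicates]
    # Only judge ordering when each section appears at most once.
    expected_order = [s for s in REASONS_SECTIONS if s in present]
    if not duplicates and present != expected_order:
        problems.append(f"sections out of order: {present}")
    return problems


draft = "## Requirements\n...\n## Approach\n...\n## Entities\n...\n"
for problem in check_canvas(draft):
    print(problem)
```

A check like this enforces only structure; whether each section says anything useful remains a review question.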

Brownfield strategy. The InfoQ enterprise piece flags this as a real gap: current SDD tooling assumes greenfield projects where the spec exists before the code. For brownfield codebases, which account for most production code in most companies, there is no clear path that doesn’t require retroactively speccing entire systems. SPDD’s /spdd-sync partially addresses this by allowing code-to-spec reconciliation, but it still assumes a spec exists somewhere to reconcile against. Spec Kit’s response is implicit: branch-bound specs make it possible to start spec-driving one feature at a time without re-speccing the codebase wholesale, but the boundary between spec-covered and uncovered code becomes a new coordination problem.
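That boundary is easier to manage once it is measured. The sketch below assumes a hypothetical ownership map from each branch-bound spec to the path prefixes it claims; neither Spec Kit nor SPDD is claimed to maintain such a map, and the spec directory names and file paths are invented for the example.

```python
from pathlib import PurePosixPath


def coverage_report(files: list[str], spec_scopes: dict[str, list[str]]) -> dict:
    """Split source files into spec-covered and uncovered groups.

    spec_scopes maps each spec directory to the path prefixes it claims. This
    ownership map is a hypothetical team convention; the first matching spec wins.
    """
    covered: dict[str, str] = {}
    for f in files:
        path = PurePosixPath(f)
        for spec, prefixes in spec_scopes.items():
            if any(path.is_relative_to(prefix) for prefix in prefixes):
                covered[f] = spec
                break
    uncovered = [f for f in files if f not in covered]
    ratio = len(covered) / len(files) if files else 0.0
    return {"covered": covered, "uncovered": uncovered, "ratio": ratio}


files = ["billing/invoice.py", "billing/tax.py", "auth/session.py", "legacy/report.py"]
scopes = {
    "specs/042-invoice-export": ["billing/invoice.py"],
    "specs/051-session-expiry": ["auth"],
}
report = coverage_report(files, scopes)
print(report["uncovered"], f"{report['ratio']:.0%} covered")
# prints: ['billing/tax.py', 'legacy/report.py'] 50% covered
```

Tracking the uncovered list over time turns the hybrid period into something a team can plan against rather than discover in review.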

Tooling assumptions. SPDD’s openspdd CLI supports Cursor, Claude Code, GitHub Copilot, and Antigravity, with a stated goal of running the same REASONS Canvas across different AI environments. Independent practitioner write-ups on SPDD, for example mgks.dev’s walkthrough and osmanperviz.com’s “Prompts as architecture artifact” piece, point to the cross-tool framing as the part of the methodology that lands fastest with practitioners. Spec Kit explicitly supports 30-plus AI coding agents per its repository documentation. Specsmaxxing is tool-agnostic by virtue of being a format convention rather than a tool. Cross-tool support matters at team scale because developers on the same team are likely, in 2026, to be using different AI coding agents, and the spec is the only artifact the team can guarantee everyone reads identically.

5. The enterprise SDD gaps the consultancy literature now names explicitly

The InfoQ piece (Spec-Driven Development – Adoption at Enterprise Scale) is the clearest contemporary catalogue of what current SDD tooling does not yet do. The named gaps:

  • Developer-centric tooling: SDD tools live in Git repositories, code editors, and CLIs, which makes meaningful product-manager participation in spec creation difficult.
  • Mono-repo focus: Specs co-located with code don’t survive microservices architectures spanning multiple repositories.
  • Backlog disintegration: Most SDD tools have no integration path with Jira or Azure DevOps where qualified backlogs already exist.
  • Undefined collaboration patterns: “Not all stakeholders participate in all phases” yet “tooling doesn’t map these distinct involvement patterns.”
  • Specification-to-implementation alignment: Validating actual implementation against specs remains “the elephant in the room.”
  • Brownfield onboarding: No clear path for existing code without retroactively speccing entire systems.

ThoughtWorks’s own broader SDD piece (Spec-driven development — unpacking one of 2025’s key new AI-assisted engineering practices) carries explicit antipattern warnings: “Experienced programmers may find that over-formalized specs can cause unnecessary trouble, and slow down change and feedback cycles,” and “Spec drift and hallucination are inherently difficult to avoid, so we still need highly deterministic CI/CD practices.” The SPDD article itself flags an adjacent tension, one the Hacker News critique surfaces from a different angle: SPDD “can look like a method reserved for senior architects because it places a high bar on abstraction and modelling,” which turns adoption of the methodology into a senior-engineer-supply problem in its own right.

These are the limitations the practitioner literature is now willing to name out loud. A team adopting SDD this quarter is choosing between methodologies that have started disclosing what they cannot yet do.

Practical Implications

For practitioner teams scaling SDD past the individual-developer frontier in 2026-Q2:

  • Pick a spec referent before a tool. The format choice (Markdown / YAML / REASONS Canvas) is downstream of the question of what makes a spec referenceable in code reviews. If your team is dropping links to “AUTH.1-1” in pull-request comments, YAML’s referenceable IDs (Specsmaxxing pattern) start paying off. If your team is reviewing whole-feature specs as documents, markdown is fine. The team-scale failure isn’t in the format; it’s in not having a referent at all.
  • Operationalise the “fix-the-spec-first” rule with separate logic-vs-refactor handling. SPDD’s distinction between logic corrections (spec → code regeneration) and refactorings (code → spec sync via /spdd-sync) is the version of this rule that survives review fatigue. A blanket “always edit the spec first” rule degrades within a sprint; the bifurcated rule has a reasonable shot at lasting.
  • Map specs to branches, not to tickets. Spec Kit’s specs/[branch-name]/ convention bakes this in; Spec Kit commands “automatically detect the active feature based on your current Git branch.” For teams not using Spec Kit, the equivalent discipline is: when a feature branch closes, the spec for that feature is either merged into the main spec corpus or deleted with the branch. Specs that float free of git topology become SpecFall artifacts within weeks.
  • Resist single-agent spec workflows on multi-agent teams. A team where engineer A uses Cursor, engineer B uses Claude Code, and engineer C uses GitHub Copilot needs a spec that all three agents read identically. SPDD’s openspdd CLI and Spec Kit’s 30-plus-agent support both target this; tool-coupled spec formats do not.
  • Budget review time explicitly for the specs themselves. The verbosity critique of Spec Kit in the tool-comparison piece on Martin Fowler’s site is the warning: “I’d rather review code than all these markdown files” is the failure mode where the team has not budgeted spec-review time separately from code-review time. If the spec is a delivery artifact, it deserves PR-style review on its own; if it doesn’t get that review, it isn’t actually the source of truth and the team should drop the pretence.
  • Treat the brownfield-onboarding question as unsolved. None of the three methodologies surveyed has a satisfying answer for retrofitting SDD onto an existing codebase without passing through a period of partial spec coverage. Teams adopting SDD on a non-greenfield codebase should plan for a hybrid period where some features are spec-driven and others are not, and the boundary itself is a coordination cost.
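The “merge it or delete it” rule for branch-bound specs can be checked with a small script. This is a minimal sketch that assumes specs live under specs/<branch-name>/ (the Spec Kit convention), that branch names contain no slashes, and that the team tracks which finished features have been folded into the main spec corpus; orphaned_specs and the merged_corpus set are hypothetical. In practice the live-branch set would be read from the repository itself.

```python
from pathlib import PurePosixPath


def orphaned_specs(spec_dirs: list[str], live_branches: set[str],
                   merged_corpus: set[str]) -> list[str]:
    """Return specs/<branch-name>/ directories whose branch no longer exists.

    merged_corpus holds branch names whose specs were already folded into the
    main spec corpus (hypothetical bookkeeping). Anything else left behind by a
    closed branch needs a merge-or-delete decision.
    """
    orphans = []
    for d in spec_dirs:
        # Assumes branch names contain no "/", so the last path part is the branch.
        branch = PurePosixPath(d).name
        if branch not in live_branches and branch not in merged_corpus:
            orphans.append(d)
    return orphans


spec_dirs = ["specs/001-login/", "specs/002-audit-log/", "specs/003-sso/"]
print(orphaned_specs(spec_dirs, live_branches={"main", "003-sso"},
                     merged_corpus={"001-login"}))
# prints: ['specs/002-audit-log/']
```

Any directory this reports is a spec floating free of git topology, the precondition for SpecFall.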

Open Questions

  • Spec ownership models at 20-plus-engineer scale. Who arbitrates when a product manager edits a spec the way they understand the requirement and an architect’s downstream technical decision now contradicts it? None of the surveyed methodologies have a satisfying answer beyond constitutional gates and review processes that scale linearly with team size.
  • Does the senior-engineer-cognitive-bandwidth bet hold? Both SPDD and the InfoQ enterprise piece converge on the framing that AI-assisted software development is now a contest of human cognitive bandwidth, not model IQ. That has uncomfortable implications for team composition (the methodology may not work without a strong senior-engineer bench) and for team-development pipelines (the methodology produces fewer of the writing-code learning experiences that grow junior engineers into senior ones).
  • What’s the actual half-life of a spec in production? The Hacker News critique flagging that DoD-STD-2167-style specs degrade within 6–12 months is not contradicted by any contemporary SDD success story. The Specsmaxxing dashboard pattern (Completed / Accepted / Rejected with notes) is one bet on what keeps specs alive; SPDD’s bidirectional sync is another. Public reporting on which approach survives a year of active development remains thin.
  • Whether Gherkin/BDD is genuinely an alternative or a complement. One dissenting strand on the Hacker News thread (mike_hearn) argued for executable specifications (Gherkin/Cucumber) as the format the AI-spec discussion is rediscovering badly. SPDD and Spec Kit largely route around this; whether the next iteration of either methodology absorbs Gherkin-style executable specs or stays in the structured-prose lane is an open architectural question.
  • The PM-in-the-spec problem. The InfoQ piece names “developer-centric tooling” as a primary gap. None of the three methodologies surveyed have shipped a credible answer for getting product managers, designers, and other non-developer stakeholders into the spec authoring loop without forcing them into Git. This is the gap the next generation of SDD tooling is most likely to compete on.

Sources

  1. Structured-Prompt-Driven Development (SPDD) — Martin Fowler / ThoughtWorks, 2026-04-28. The consultancy-side methodology piece. REASONS Canvas, openspdd CLI, the seven-part structured-prompt template.
  2. Specsmaxxing — On overcoming AI psychosis, and why I write specs in YAML — acai.sh, 2026-05-03. The community-side practitioner post that hit Hacker News. feature.yaml format and the referenceable-requirement-IDs pattern.
  3. Specsmaxxing discussion thread — Hacker News, 2026-05-03. 264 points / 276 comments. Source for the practitioner-side critiques (lelanthran on review-vs-write, bdangubic on DoD-STD-2167 spec rot, mike_hearn on Gherkin alternatives).
  4. GitHub Spec Kit 0.8.4 release — GitHub, 2026-05-01. Three-patch-release week; Spec2Cloud Azure extension, Squad Bridge community catalog addition, governance catalog additions.
  5. Spec Kit repository — GitHub. Source for the 30-plus-agent integration claim and the formal command list (/speckit.specify, /speckit.plan, /speckit.tasks, /speckit.analyze, /speckit.implement).
  6. Spec Kit — spec-driven.md — GitHub. Source for the formal definition (“Specifications don’t serve code—code serves specifications”), the specs/[branch-name]/ directory convention, and the constitutional-gates pattern.
  7. openspdd CLI repository — GitHub. Source for the openspdd command list, supported AI environments (Cursor, Claude Code, GitHub Copilot, Antigravity), and the per-tool config-directory layout.
  8. Diving Into Spec-Driven Development With GitHub Spec Kit — Microsoft for Developers. Source for the three-phase command structure (/specify, /plan, /tasks) and the constitution-as-team-coordination-mechanism framing.
  9. Spec-Driven Development – Adoption at Enterprise Scale — InfoQ. Source for the named enterprise-SDD gaps (developer-centric tooling, mono-repo focus, backlog disintegration, undefined collaboration patterns, brownfield onboarding) and for the “SpecFall” antipattern naming.
  10. Spec-driven development — unpacking one of 2025’s key new AI-assisted engineering practices — ThoughtWorks Insights. Source for ThoughtWorks’s general SDD framing distinct from SPDD, the spec-drift-and-hallucination caveat, and the “over-formalized specs can cause unnecessary trouble” antipattern warning.
  11. Understanding Spec-Driven Development — Kiro, spec-kit, and Tessl — Martin Fowler’s site. Tool-comparison piece. Source for the practitioner critique that Spec Kit produces “a LOT of markdown files” that are “repetitive…verbose and tedious to review.”
  12. Treating AI Prompts Like Code — What I Learned From Thoughtworks’ SPDD Method — mgks.dev, 2026-04-29. Independent practitioner write-up on SPDD adoption signal.
  13. Prompts as architecture artifact — osmanperviz.com. Adjacent practitioner framing piece on the prompt-as-versioned-artifact thesis SPDD formalises.