Summary
Two arXiv papers dropped within four days of each other in April 2026 and, between them, they move the conversation about CLAUDE.md / AGENTS.md / .cursorrules from folklore to evidence. Zhang et al.’s “Do Agent Rules Shape or Distort?” scraped 679 real-world rule files (25,532 rules) and ran over 5,000 agent trajectories on SWE-bench Verified. Their headline: rules improve agent performance by 7–14 percentage points, but random rules help as much as expert-curated ones, and positive directives (“follow code style”) actively hurt performance — only negative constraints (“do not refactor unrelated code”) are individually beneficial. Ma et al.’s ZORO paper approaches the same problem from the HCI side: make rules active rather than passive, require agents to prove each rule was satisfied, and let users evolve rules in-situ. With ZORO’s full enrichment+enforcement pipeline, rule-following jumps from a 51% baseline to 80%, with process rules improving from 35% to 87%. Together the two findings cut against the prevailing “write more and better rules” instinct and suggest teams should prune their CLAUDE.md down to guardrails and invest tooling in verification, not prose.
Key Findings
The Zhang et al. empirical result: rules work through priming, not instruction
The authors scraped 679 rule files (CLAUDE.md, .cursorrules, AGENTS.md, etc.) from public GitHub repos, extracted 25,532 individual rules, and ran >5,000 agent trajectories against SWE-bench Verified. Four findings matter for practitioners:
- Rules do help, but in aggregate. Attaching a rule file — any rule file — lifts pass rates by 7–14 percentage points versus no rule file at all.
- Random rules work as well as curated ones. Replacing the real rules with randomly sampled rules from unrelated repos produced statistically indistinguishable gains. This is the paper’s most provocative finding: the benefit comes from context priming (putting the agent in “careful engineer” mode), not from the specific propositional content.
- Positive directives hurt when isolated; negative constraints help. When the authors ablated individual rules, positive directives (“follow code style,” “write clean, idiomatic code,” “prefer functional patterns”) consistently degraded agent performance relative to no rule at all. Negative constraints (“do not modify unrelated files,” “do not add dependencies without approval”) were the only rule type that helped individually. The authors frame this through potential-based reward shaping (PBRS): negative constraints define a bounded feasible region, while positive directives introduce a biased gradient that pulls the agent toward work it wasn’t asked to do (made precise in the sketch after this list).
- No saturation up to 50 rules. Stacking more rules didn’t degrade performance through the tested range — but individual rules that helped in combination were mostly harmful when tested alone. Collective effect ≠ individual effect.
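An aside on the PBRS framing, for readers who want it precise. In Ng et al.’s 1999 formulation (the foundation Zhang et al. cite), adding a shaping term to the reward leaves the optimal policy unchanged only when the term derives from a potential function; the notation below is standard PBRS, not an equation lifted from Zhang et al.:

$$
R'(s, a, s') = R(s, a, s') + F(s, a, s'), \qquad F(s, a, s') = \gamma\,\Phi(s') - \Phi(s)
$$

Read through that lens, a positive directive behaves like an arbitrary per-step bonus for stylistic virtue, which is not potential-based in general and can change which trajectory looks optimal; a negative constraint only shrinks the feasible action set and leaves the ranking of the remaining actions intact.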
The headline for maintainers: most of the rules people carefully hand-write are either inert or actively harmful in isolation. They work in aggregate largely because they prime the model to think it’s in a high-effort engineering context.
The ZORO result: enforcement beats exhortation
Ma, Wang, Kung, and Chilton (Columbia DAPLab) built ZORO around a different intuition: rules files are passive text, and passive text is easy for the agent to ignore. ZORO anchors rules to every step of the coding plan, requires the agent to produce evidence of compliance before advancing, and lets users edit rules in-situ based on that evidence.
Their technical evaluation ran 36 vibe-coding sessions across four conditions:
| Condition | Rules followed |
|---|---|
| No ZORO (baseline) | 51% (±0.08) |
| Basic ZORO | 55% (±0.11) |
| Enrich only | 68% (±0.07) |
| Enrich + Enforce (full) | 80% (±0.08) |
The enforcement step alone moves process rules (commit conventions, test-first discipline, etc.) from 35% compliance to 87%. Coding rules moved more modestly (59% → 75%) — consistent with Zhang et al.’s finding that positive directives about code style are harder to operationalize than procedural constraints.
The 12-person user study (7F / 5M, 5+ years experience, 10 daily rule-file users) produced a behavioral finding worth quoting: 9 of 12 participants shifted from “prompt engineering” to “rule engineering” once rules became visible and enforceable. Rule compliance improved from 77.9% on Task 1 to 94.4% on Task 2 after users refined their rules based on enforcement evidence. All 12 said they would adopt ZORO in production.
Where the papers agree with community folklore
- Shorter beats longer. Anthropic’s own Claude Code guidance, HumanLayer’s widely-shared essay, and the Dometrain/Builder.io guides all push for <300 lines (HumanLayer runs at <60). Zhang et al.’s no-saturation-up-to-50 finding is consistent: you don’t need 300 rules. The prevailing practitioner wisdom that “Claude starts ignoring parts of CLAUDE.md past ~80 lines” is plausible given the priming-not-instruction mechanism.
- Don’t automate a linter’s job. Every practitioner guide (“Never send an LLM to do a linter’s job”) maps directly to Zhang et al.’s finding that positive style directives hurt. Linters are deterministic; LLM-enforced style rules are a negative-expected-value bet.
- Hooks for “must happen,” rules for “should happen.” Community advice that `CLAUDE.md` is ~80% reliable while hooks are 100% aligns with ZORO’s 51% baseline, and with ZORO’s argument that passive text is an unreliable control plane.
Where the papers disagree with community folklore
- Expert curation isn’t the value. Practitioners widely claim their hand-tuned `CLAUDE.md` is what makes Claude Code useful. Zhang et al.’s random-rule result challenges this: at the aggregate level, priming explains most of the lift, and carefully chosen positive directives can make things worse. The “onboarding document” framing (tell Claude WHAT / WHY / HOW) sounds reasonable, but the WHY and HOW-style-prescription sections of most rule files are probably cargo-culted.
- “Follow SOLID / DRY / clean code” is probably harmful. Scan any `awesome-cursorrules` collection and you’ll see dozens of rule files open with exactly this kind of positive directive. Zhang et al. measured those and they lost ground versus no rule at all.
- `AGENTS.md` adoption momentum may be misplaced. OpenAI, Google, Sourcegraph, Factory, and Cursor jointly backed the `AGENTS.md` standard in 2025 and it now ships in 60k+ repos. Adoption is good for tool interop, but the standard says nothing about rule framing, and the example files distributed with the spec lean heavily on positive style directives (“Use functional patterns,” “TypeScript strict mode”). The standard needs a companion style guide informed by Zhang et al.
Evidence-Based Style Guide
Every recommendation below maps to a finding in one of the two papers. The formula is simple: constrain what the agent must not do; let hooks, CI, and linters enforce what it must do.
1. Rewrite positive directives as negative constraints
Before (positive directive, measured to hurt):
- Follow SOLID principles and write clean, idiomatic code
- Prefer functional patterns over imperative where possible
- Write comprehensive tests with meaningful assertions
After (negative constraint, measured to help):
- Do not introduce new inheritance chains deeper than one level; use composition
- Do not leave `// TODO` or commented-out code in files you touch
- Do not submit a change without a test that fails on main and passes on your branch
The negative version is more specific and testable, and it gives the agent a bounded region to stay inside rather than a gradient to climb.
2. Move style rules out of the rules file entirely
Zhang et al. found positive style directives degrade performance. Community practice already says “never send an LLM to do a linter’s job.” Combine these: delete every “follow PEP 8 / use Prettier / prefer camelCase” line from your CLAUDE.md and install the linter with a pre-commit hook or an IDE format-on-save. The rule file only needs to mention the linter exists and tell the agent not to disable it.
Before:
- Use 4 spaces for indentation
- Use snake_case for function names
- Limit lines to 88 characters
- Use type annotations on every function signature
After:
- Do not modify `.ruff.toml`, `pyproject.toml [tool.ruff]`, or `.pre-commit-config.yaml` without calling this out
- Do not commit if `uv run ruff check` or `uv run pytest` fails; fix the failure instead
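The hook side of that split might look like the script below: a minimal pre-commit sketch assuming a Python repo that runs ruff and pytest through uv, matching the example rules above (the tool list is illustrative; substitute your own stack). Save it as `.git/hooks/pre-commit` and mark it executable, or wire the same commands through the pre-commit framework.

```python
#!/usr/bin/env python3
"""Deterministic pre-commit gate: enforce what the rules file no longer
needs to say in prose. Tool choices (ruff, pytest via uv) are illustrative."""
import subprocess
import sys

CHECKS = [
    ["uv", "run", "ruff", "check", "."],              # lint rules, not rule-file prose
    ["uv", "run", "ruff", "format", "--check", "."],  # indentation, line length, etc.
    ["uv", "run", "pytest", "-q"],                    # tests must pass before commit
]

def main() -> int:
    for cmd in CHECKS:
        if subprocess.run(cmd).returncode != 0:
            print(f"pre-commit: `{' '.join(cmd)}` failed; fix it, don't bypass it.")
            return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```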
3. Keep the WHAT; prune the HOW-style
CLAUDE.md’s most valuable content is structural metadata the agent can’t trivially derive: where the bounded contexts live, which services call which, non-obvious naming conventions tied to domain terminology, and where utilities already exist so the agent doesn’t re-invent them. Keep that. Prune the “write idiomatic code” preamble.
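A hypothetical example of the kind of metadata worth keeping, with the repo layout and domain terms invented for illustration:
- Billing logic lives in `services/billing/`; the gateway in `services/gateway/` is its only caller
- “Account” means a legal entity in this codebase; end users live in `identity/`
- Date and money helpers already exist in `lib/common/`; do not re-implement them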
4. Scope aggressively
The 25,532-rule corpus in Zhang et al. included a lot of rules that were irrelevant to the specific task an agent was asked to perform. With no-saturation-up-to-50 rules observed, length per se isn’t the risk — but irrelevance still wastes context budget. Prefer:
- Repo-root `AGENTS.md` for universally applicable constraints (≤50 bullets).
- Per-directory `AGENTS.md` files in subdirectories for domain-specific constraints (the AGENTS.md spec supports nearest-wins resolution; a resolution sketch follows this list).
- Separate docs in `agent_docs/` linked by description, not inlined, so the agent reads them only on demand (HumanLayer’s progressive-disclosure pattern).
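A minimal sketch of that nearest-wins resolution, under one reading of the spec (walk upward from the edited file’s directory and take the first `AGENTS.md` found); real agents may merge parent files rather than pick a single winner:

```python
from pathlib import Path

def nearest_agents_md(start_dir: Path, repo_root: Path) -> Path | None:
    """Return the closest AGENTS.md at or above start_dir, stopping at the
    repo root, e.g. nearest_agents_md(Path("services/billing"), Path("."))."""
    current, root = start_dir.resolve(), repo_root.resolve()
    while True:
        candidate = current / "AGENTS.md"
        if candidate.is_file():
            return candidate              # nearest file wins
        if current in (root, current.parent):
            return None                   # reached the repo (or filesystem) root
        current = current.parent
```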
5. Make negative constraints specific and checkable
Vague negatives (“don’t be sloppy,” “don’t over-engineer”) are almost as bad as vague positives — the agent can’t tell when it has violated them. Good negative constraints name the artifact, the action, and the exception.
Weak: Do not make unnecessary changes.
Strong: Do not modify files outside the directory the user named in the prompt unless the change is required by a failing test; if you do, list each file and justify it in your summary.
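“Checkable” can be taken literally. Below is a sketch of a mechanical check for the strong version, with the allowed directory hard-coded as a stand-in for whatever the user named in the prompt (the failing-test exception is left to human review):

```python
#!/usr/bin/env python3
"""Fail if the working diff touches files outside the directory the task
named. ALLOWED is a hypothetical stand-in; wire it to your task metadata."""
import subprocess
import sys

ALLOWED = "services/billing/"  # hypothetical: the directory named in the prompt

def changed_files() -> list[str]:
    out = subprocess.run(
        ["git", "diff", "--name-only", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return [f for f in out.stdout.splitlines() if f]

def main() -> int:
    outside = [f for f in changed_files() if not f.startswith(ALLOWED)]
    if outside:
        print("Changed outside the named directory (justify each in your summary):")
        print("\n".join(f"  {f}" for f in outside))
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```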
Active Rules: What ZORO Proposes
ZORO’s contribution is the mechanism, not the rule content. It has three phases worth understanding at a practitioner level because they suggest where rule-file tooling needs to go next.
Enrichment. After the agent produces a plan, ZORO pins each relevant rule to the step it governs and displays them inline in an interactive outline. Users can reorder, edit, and assign each rule an enforcement level: non-strict, strict, or testable. The key insight is that rules don’t apply globally — they apply at specific steps, and attaching them there makes both the agent and the user more likely to notice violations.
Enforcement. The agent cannot advance to the next step until all strict rules at the current step are proven satisfied. “Proof” is operationalized through two CLI commands, `zoro update-step` and `zoro prove-rule`. For testable rules, the agent runs a unit test and attaches the output; for strict rules, the agent submits a code artifact plus a plain-language summary of how the rule was satisfied. Evidence is displayed back to the user as enforcement artifacts, not buried in a scrollback log.
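The paper describes this gate rather than publishing it as reusable code, so the following is a hypothetical reimplementation of the pattern, not ZORO’s actual internals; only the three enforcement levels come from the paper, and every name is invented for illustration:

```python
from dataclasses import dataclass, field
from enum import Enum

class Level(Enum):
    NON_STRICT = "non-strict"  # advisory; never blocks
    STRICT = "strict"          # blocks until a compliance summary is attached
    TESTABLE = "testable"      # blocks until passing test output is attached

@dataclass
class Rule:
    text: str
    level: Level
    evidence: str | None = None  # summary or test output the agent submits

@dataclass
class Step:
    description: str
    rules: list[Rule] = field(default_factory=list)

def may_advance(step: Step) -> bool:
    """The gate: every strict or testable rule on this step needs evidence."""
    return all(
        rule.evidence is not None
        for rule in step.rules
        if rule.level is not Level.NON_STRICT
    )
```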
Evolution. When the user sees enforcement evidence they don’t like, they add in-situ notes directly on the rule in the UI. ZORO aggregates notes across the session, feeds them to an LLM to propose refined rule text, and lets the user accept, edit, or reject the proposal. This is where the 77.9% → 94.4% rule-compliance jump between Task 1 and Task 2 came from in the user study.
In concept, the active-rules pattern is what you’d get if you crossed a checklist UI with a deterministic hook and a continuous-improvement loop. Claude Code’s existing hook mechanism already handles piece two. What’s missing, and what ZORO demonstrates is worth building, is piece one (rule-to-step attachment with enforcement metadata) and piece three (in-situ rule refinement with proposed diffs).
The near-term retrofit any team can try tomorrow: add a /prove-rules slash command that reads the current AGENTS.md, asks the agent to produce a one-line compliance statement per rule the user marks strict, and fails the turn if any strict rule has no statement. That’s 80% of ZORO’s enforcement value without new infrastructure.
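A sketch of that retrofit as a standalone checker, assuming two conventions invented here: strict rules carry a `[strict]` suffix in AGENTS.md, and the agent writes its per-rule compliance statements to a report file. The matching is deliberately crude (substring on the rule’s first 40 characters); a real version would use rule IDs.

```python
#!/usr/bin/env python3
"""Fail the turn if any [strict] rule in AGENTS.md lacks a compliance
statement in the agent's report. Both conventions here are hypothetical."""
import sys
from pathlib import Path

def strict_rules(agents_md: Path) -> list[str]:
    rules = []
    for line in agents_md.read_text().splitlines():
        if line.lstrip().startswith("- ") and "[strict]" in line:
            rules.append(line.strip()[2:].replace("[strict]", "").strip())
    return rules

def main(report_path: str) -> int:
    report = Path(report_path).read_text()
    # Crude check: the start of each strict rule must appear in the report.
    missing = [r for r in strict_rules(Path("AGENTS.md")) if r[:40] not in report]
    if missing:
        print("No compliance statement found for:")
        print("\n".join(f"  - {r}" for r in missing))
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1]))
```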
Practical Implications
If you maintain a CLAUDE.md or AGENTS.md file and you read one thing from the two papers, read this: delete your positive style directives this week and replace them with specific negative constraints plus a hook or pre-commit for anything deterministic.
Concrete playbook for the next sprint:
- Audit the file. For each line, classify: negative constraint, positive directive, structural metadata, or style rule. Positive directives and style rules that duplicate a linter go on the chopping block.
- Rewrite the survivors. Turn every `should`, `always`, `prefer` into a `do not` or an `avoid` with a named artifact. If you can’t formulate it as a negative, it probably belongs in a doc, not the rules file.
- Install the hooks. Anything “must always happen” (format, lint, type-check, test) moves from rule text to pre-commit or harness hook. Hooks are deterministic and don’t burn context budget.
- Link, don’t inline. Move long explainers (architecture doc, deployment runbook, migration guide) into `agent_docs/` and leave a one-line `Do not implement X without reading agent_docs/X.md first.` pointer.
- Instrument rule-following. Even without ZORO, add a lightweight self-report step: at the end of each PR, have the agent list which rules it applied and where. This builds the evidence base for your next audit and creates a cheap imitation of ZORO’s enforcement-evidence loop (a logging sketch follows this list).
- Accept the priming mechanism. Keep a short “this is a production codebase; changes must be minimal, tested, and reviewable” preamble. Zhang et al.’s random-rules result suggests that kind of priming is where the aggregate 7–14 pp lift comes from.
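For the instrumentation step above, a minimal logging sketch, assuming a convention invented here: the agent ends each PR description with a `Rules applied:` section listing one rule per bullet. Appending to a CSV is enough to build the audit evidence base.

```python
#!/usr/bin/env python3
"""Append the agent's self-reported rule usage to a CSV for later audits.
The 'Rules applied:' PR-body convention is hypothetical."""
import csv
import sys
from datetime import date
from pathlib import Path

def rules_applied(pr_body: str) -> list[str]:
    rules, in_section = [], False
    for line in pr_body.splitlines():
        stripped = line.strip()
        if stripped.lower() == "rules applied:":
            in_section = True
        elif in_section and stripped.startswith("- "):
            rules.append(stripped[2:])
        elif in_section and stripped:
            break  # section ends at the first non-bullet, non-blank line
    return rules

def main(pr_body_path: str, log_path: str = "rule_usage.csv") -> None:
    body = Path(pr_body_path).read_text()
    with open(log_path, "a", newline="") as f:
        writer = csv.writer(f)
        for rule in rules_applied(body):
            writer.writerow([date.today().isoformat(), pr_body_path, rule])

if __name__ == "__main__":
    main(sys.argv[1])
```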
For teams maintaining a public AGENTS.md, the standard’s example files probably lead you astray. Treat the sample in the spec as a placeholder, not a template.
Open Questions
- Do the findings hold across models? Zhang et al. used one state-of-the-art coding agent. We don’t yet know whether the positive-directives-hurt finding is model-specific or universal. Replication on Claude, GPT-5-codex, and open-weights agents would be valuable.
- Does the random-rule effect persist at scale? 679 files is a lot, but the rule-file ecosystem is both larger and more specialized by domain. Expert rules might beat random rules in narrow, in-distribution tasks — SWE-bench’s distribution is not your codebase’s distribution.
- What’s the right enforcement granularity? ZORO’s `strict`/`testable`/`non-strict` trichotomy is a design choice, not a derivation. Is there a middle enforcement level (cheap static check vs. full unit test) that gets most of the benefit at a fraction of the cost?
- How does this interact with skills and sub-agents? Claude Code skills and Anthropic’s Skills SDK layer new instruction surfaces on top of `CLAUDE.md`. The Zhang et al. study predates that stack. Do positive directives inside a skill file produce the same degradation as in `CLAUDE.md`?
- Policy as code vs. policy as prose. The Columbia DAPLab’s framing (“treat policies as strict rules, not preferences”) points toward a future where rule files compile to deterministic checks rather than being shipped as prose to the model. The boundary between rule file and hook may dissolve.
Sources
- Zhang, X., Wang, G., Cui, Y., Qiu, W., Li, Z., Zhu, B., He, P. (2026). Do Agent Rules Shape or Distort? Guardrails Beat Guidance in Coding Agents. arXiv:2604.11088. https://arxiv.org/abs/2604.11088
- Ma, J., Wang, S., Kung, J. H., Chilton, L. B. (2026). ZORO: Active Rules for Reliable Vibe Coding. arXiv:2604.15625. https://arxiv.org/abs/2604.15625
- Anthropic. Best Practices for Claude Code. https://code.claude.com/docs/en/best-practices
- HumanLayer. Writing a good CLAUDE.md. https://www.humanlayer.dev/blog/writing-a-good-claude-md
- Babich, N. (2026). CLAUDE.md Best Practices: 10 Sections to Include. UX Planet. https://uxplanet.org/claude-md-best-practices-1ef4f861ce7c
- agentsmd/agents.md. AGENTS.md — a simple, open format for guiding coding agents. https://github.com/agentsmd/agents.md
- AGENTS.md homepage. https://agents.md/
- AGENTS.md Guide for OpenAI Codex. https://agentsmd.net/
- Cursor Docs. Rules. https://docs.cursor.com/context/rules
- PatrickJS/awesome-cursorrules. https://github.com/PatrickJS/awesome-cursorrules
- Columbia DAPLab. Why Vibe Coding Fails and How to Fix It. https://daplab.cs.columbia.edu/general/2026/01/07/why-vibe-coding-fails-and-how-to-fix-it.html
- zazencodes. Stop using AGENTS.md and CLAUDE.md (do this instead). https://zazencodes.substack.com/p/stop-using-agentsmd-and-claudemd
- Dometrain. Creating the Perfect CLAUDE.md for Claude Code. https://dometrain.com/blog/creating-the-perfect-claudemd-for-claude-code/
- Ng, A. Y., Harada, D., Russell, S. (1999). Policy invariance under reward transformations: Theory and application to reward shaping — the PBRS foundation Zhang et al. build on.