About | The Artificer's Grimoire

The Artificer's Grimoire is a weekly intelligence feed on AI-assisted software development. It exists for a small, specific audience: practitioners and engineering leaders building, evaluating, or operating production agent systems.

It's also AI-generated, with a human in the loop. You should know that up front, and the rest of this page should explain how. If you read the Grimoire, you're probably someone who wants the architecture, not a disclaimer.

Who's behind it

I'm Tim Schiller, founder of Artificer Digital. The Grimoire is one of three Artificer properties. The others are the Artificer Digital company site and Artificer Crucible — a managed harness built to turn AI-generated code into production-ready software you can trust, with enterprise governance and human oversight built in. The Grimoire is partly a way to share what I'm learning in the same domain Crucible operates in. It's also a working demonstration of governance-conscious content infrastructure with a human in the loop. Crucible is the same problem at a different scale.

What we cover

Harness engineering — the architecture around the model: sandboxes, permission systems, observability, tool wiring, error recovery, and the question of whether the harness sits inside or outside the sandbox
Context engineering — how agents acquire, manage, and reason over project knowledge across long runs
Agent orchestration — multi-agent systems, task decomposition, coordination patterns
Agentic protocols — MCP, A2A, ACP, and emerging interoperability standards
Autonomous coding agents — Cursor, Copilot, Claude Code, Devin, and the evolving landscape
Governance and human-in-the-loop — approval patterns, safety boundaries, production guardrails
Spec-driven development — specifications as the contract between humans and agents

Items scoring 4+ are tagged Must Read with extended commentary. Items scoring 3 are Worth Scanning. The rubric prioritizes developments that directly affect people building agent infrastructure, not press-cycle noise.

How each edition is built

The pipeline runs in five stages:

Crawl pass. Automated scripts fetch from a curated source list — vendor blogs, arXiv, GitHub releases, practitioner publications, HN, and key newsletters — supplemented by LLM-driven web search for known structural gaps (some major sources don't publish RSS). Items are scored 1–5 against the domain priorities above. Runs weekly with mid-week supplements when a major story breaks.
Digest assembly. An LLM synthesizes the week's high-scoring items into the digest format: a Lead, Must Read items with extended commentary, Worth Scanning, New Tools & Repos, Papers, Ecosystem Watch, and a Long View essay.
Editorial review. A second LLM pass (/review) runs structural verification over the assembled digest: link liveness on every inline URL, quote fidelity (refetching every cited source to confirm direct quotes appear verbatim), and citation-target audit (refetching linked URLs to confirm the target page actually substantiates the specific claim it's attached to). Layered on top: causal-claim, numerical-claim, financial-reframing, structural-mechanics, and confidence-flattening audits, plus an advisory source-attribution pre-flag that surfaces claims attributed to a filing, announcement, or report ("the S-1 says…") for the reviewer to confirm against the source. Findings are fixed inline where unambiguous and escalated to a per-edition review report where they need human judgment.
Adversarial judge pass. A third LLM from a different model family runs a cross-model audit over the assembled digest — the field-standard LLM-as-a-judge pattern applied with an adversarial framing — working from the same six-failure-mode taxonomy as the in-pipeline review. The point is to surface failure modes the source model may have missed: a different model's blind spots and biases differ from the writer's and reviewer's, so claims that pass the in-pipeline review can still fail a cross-model audit. Findings flow into the human pass below as additional triage.
Human pre-publish pass. I read every edition before it ships, focus on the escalations flagged by the review and adversarial-judge passes, and edit where needed.

What this pipeline is good at: cross-source pattern recognition, connecting current developments to prior weeks' threads, surfacing themes no single source would name on its own, and catching the most common factual-drift failure modes — fabricated quotes, dead links, citation-target mismatches, unsourced precise figures — deterministically before they reach the page.

What it's weaker at, today:

Paraphrased structural claims. The verification pass confirms a quote is verbatim and a link is alive, but not whether a paraphrased mechanic — deal structure, regulatory dependency, corporate motive — matches the source's framing. Those get a closer human read.
Search-supplemented stories. When a story is filled in by search without fetching the article, the only ground truth is the URL and the snippet; anything past that is treated as inference, not reporting.
Paywalled or bot-blocked sources. Quotes that can't be refetched surface as "unverifiable," and the human pass decides whether they run.

These are edge cases, not the norm, and each routes to the human pass rather than slipping through. The automated checks keep expanding; naming today's limits is simply how you calibrate what to trust.

The failure modes the pipeline watches for

The review pass is organized around a named taxonomy of seven failure modes, each one drawn from a real incident in the Grimoire's own pipeline. Naming them lets the writer and reviewer passes share the same vocabulary, and lets each one have a documented detection-and-correction protocol:

Attribution Drift — paraphrased sentiment surfaces as a verbatim quote attributed to a publication or person the writer didn't actually quote from.
Causal Invention — "X caused Y" stated as factual reportage without any cited source attesting the causal link.
Financial Reframing — deal terms reworked into investor-narrative language not used in the cited sources (partnership payments rebranded as capital injections, options as commitments, etc.).
Structural Assumption — logical-but-unstated corporate, legal, or financial mechanics (IPO timing, deal financing structure, regulatory dependency, internal motive) filled in by inference and presented as fact.
Confidence Flattening — verified facts and editorial speculation rendered with equal certainty, so the reader can't tell which is which.
Citation-Target Drift — a hyperlinked claim where the URL is live and the source is in the source list, but the target page doesn't actually substantiate the specific claim in the link text.
Fourth-Wall Break — the writer pass surfaces pipeline shop talk into reader-facing prose: tool names like WebFetched, internal jargon like URL-slug-grounded, or other phrasings that disclose a sourcing limitation in pipeline-internal voice instead of in reader-facing journalist voice. A reliably amusing failure of AI-assisted editorial — the pipeline forgets the reader doesn't have access to the pipeline. Worth catching because the right fix is the same one the standard already requires elsewhere: translate the uncertainty into journalist voice ("reporting on X remains thin") rather than into shop talk about which tool didn't run.

The taxonomy is a living document — if an eighth failure mode shows up in practice, it gets named, added, and propagated into the writer and reviewer commands. That's how the discipline stays calibrated to real failures rather than to imagined ones.

Errata

If you spot something that looks wrong, tell me: grimoire@artificerdigital.com. Confirmed corrections are posted inline on the affected edition with a visible correction note and tracked in a public errata log. Reporters get attribution if they want it.

A note on the framing

AI-generated content with human review is increasingly common — what's worth knowing is how the human review is structured. The point of this page is to make that visible: a named failure-mode taxonomy, a verification pass that catches the common drift modes, and a public errata commitment for the ones it doesn't.

The discipline lives in the pipeline architecture, not in slogans. That's what the rest of Artificer Digital is building toward — the Grimoire is one demonstration, Crucible is another.

◆

— Tim