Artificer Digital The Artificer's Grimoire

About the Grimoire

The Artificer's Grimoire is a weekly intelligence feed on autonomous AI-assisted software development. It exists for a small, specific audience: practitioners and engineering leaders building, evaluating, or operating production agent systems.

It's also AI-generated, with a human in the loop. You should know that up front, and the rest of this page should explain how. If you read the Grimoire, you're probably someone who wants the architecture, not a disclaimer.

Who's behind it

I'm Tim Schiller, founder of Artificer Digital. The Grimoire is one of three Artificer properties. The others are Artificer Forge, an autonomous software development platform with enterprise governance built in, and the Artificer Digital company site. The Grimoire is partly a way to share what I'm learning in the same domain Forge operates in. It's also a working demonstration of governance-conscious autonomous content infrastructure. Forge is the same problem at a different scale.

What we cover

Items scoring 4+ are tagged Must Read with extended commentary. Items scoring 3 are Worth Scanning. The rubric prioritizes developments that directly affect people building agent infrastructure, not press-cycle noise.

How each edition is built

The pipeline runs in five stages:

  1. Crawl pass. Automated scripts fetch from a curated source list — vendor blogs, arXiv, GitHub releases, practitioner publications, HN, and key newsletters — supplemented by LLM-driven web search for known structural gaps (some major sources don't publish RSS). Items are scored 1–5 against the domain priorities above. Runs weekly with mid-week supplements when a major story breaks.
  2. Digest assembly. An LLM synthesizes the week's high-scoring items into the digest format: a Lead, Must Read items with extended commentary, Worth Scanning, New Tools & Repos, Papers, Ecosystem Watch, and a Long View essay.
  3. Editorial review. A second LLM pass (/review) runs structural verification over the assembled digest: link liveness on every inline URL, quote fidelity (refetching every cited source to confirm direct quotes appear verbatim), and citation-target audit (refetching linked URLs to confirm the target page actually substantiates the specific claim it's attached to). Layered on top: causal-claim, numerical-claim, financial-reframing, structural-mechanics, and confidence-flattening audits, plus an advisory source-attribution pre-flag that surfaces claims attributed to a filing, announcement, or report ("the S-1 says…") for the reviewer to confirm against the source. Findings are fixed inline where unambiguous and escalated to a per-edition review report where they need human judgment.
  4. Adversarial judge pass. A third LLM from a different model family runs a cross-model audit over the assembled digest — the field-standard LLM-as-a-judge pattern applied with an adversarial framing — working from the same six-failure-mode taxonomy as the in-pipeline review. The point is to surface failure modes the source model may have missed: a different model's blind spots and biases differ from the writer's and reviewer's, so claims that pass the in-pipeline review can still fail a cross-model audit. Findings flow into the human pass below as additional triage.
  5. Human pre-publish pass. I read every edition before it ships, focus on the escalations flagged by the review and adversarial-judge passes, and edit where needed.

What this pipeline is good at: cross-source pattern recognition, connecting current developments to prior weeks' threads, surfacing themes no single source would name on its own, and catching the most common factual-drift failure modes — fabricated quotes, dead links, citation-target mismatches, unsourced precise figures — deterministically before they reach the page.

What it isn't good at, today:

Closing these gaps is an ongoing set of pipeline additions. A first piece has shipped: claims attributed to a filing, announcement, or report ("the S-1 says…") are now machine pre-flagged as an advisory work-queue for the reviewer to confirm against the source — a deterministic catch for one frequent instance, though the broader judgment of whether a paraphrase matches a source's framing remains a human/LLM read. Still ahead: stricter source-fetching at supplement time, a paraphrase-aware citation-target check, and a wider verification surface for structural mechanics. Until those ship, the gaps above are named so readers can calibrate accordingly.

The failure modes the pipeline watches for

The review pass is organized around a named taxonomy of seven failure modes, each one drawn from a real incident in the Grimoire's own pipeline. Naming them lets the writer and reviewer passes share the same vocabulary, and lets each one have a documented detection-and-correction protocol:

  1. Attribution Drift — paraphrased sentiment surfaces as a verbatim quote attributed to a publication or person the writer didn't actually quote from.
  2. Causal Invention — "X caused Y" stated as factual reportage without any cited source attesting the causal link.
  3. Financial Reframing — deal terms reworked into investor-narrative language not used in the cited sources (partnership payments rebranded as capital injections, options as commitments, etc.).
  4. Structural Assumption — logical-but-unstated corporate, legal, or financial mechanics (IPO timing, deal financing structure, regulatory dependency, internal motive) filled in by inference and presented as fact.
  5. Confidence Flattening — verified facts and editorial speculation rendered with equal certainty, so the reader can't tell which is which.
  6. Citation-Target Drift — a hyperlinked claim where the URL is live and the source is in the source list, but the target page doesn't actually substantiate the specific claim in the link text.
  7. Fourth-Wall Break — the writer pass surfaces pipeline shop talk into reader-facing prose: tool names like WebFetched, internal jargon like URL-slug-grounded, or other phrasings that disclose a sourcing limitation in pipeline-internal voice instead of in reader-facing journalist voice. A reliably amusing failure of AI-assisted editorial — the pipeline forgets the reader doesn't have access to the pipeline. Worth catching because the right fix is the same one the standard already requires elsewhere: translate the uncertainty into journalist voice ("reporting on X remains thin") rather than into shop talk about which tool didn't run.

The taxonomy is a living document — if an eighth failure mode shows up in practice, it gets named, added, and propagated into the writer and reviewer commands. That's how the discipline stays calibrated to real failures rather than to imagined ones.

Errata

If you spot something that looks wrong, tell me: grimoire@artificerdigital.com. Confirmed corrections are posted inline on the affected edition with a visible correction note and tracked in a public errata log. Reporters get attribution if they want it.

A note on the framing

AI-generated content with human review is increasingly common — what's worth knowing is how the human review is structured. The point of this page is to make that visible: a named failure-mode taxonomy, a verification pass that catches the common drift modes, and a public errata commitment for the ones it doesn't.

The discipline lives in the pipeline architecture, not in slogans. That's what the rest of Artificer Digital is building toward — the Grimoire is one demonstration, Forge is another.

— Tim