Scout: DDIPE and Documentation Poisoning: The New Supply-Chain Attack Vector

Summary

A new class of supply-chain attack has arrived that does not require poisoning code at all — it poisons documentation. The DDIPE (Document-Driven Implicit Payload Execution) technique, published April 2026 by researchers from multiple universities, embeds malicious logic inside the code examples and configuration templates of SKILL.md files, exploiting the in-context learning behavior of LLMs to reproduce those patterns during normal task execution [1]. Across 1,070 adversarial skills tested against four production agent frameworks and five models, DDIPE achieved bypass rates of 11.6% to 33.5% — while explicit instruction attacks achieved 0% against strongly aligned models. The real-world attack surface is already being exploited: the ClawHavoc campaign flooded OpenClaw’s ClawHub marketplace with 824+ malicious skills in under three weeks [3][4], and the Axios npm compromise demonstrated how social engineering of a single maintainer can inject malware into 100-million-weekly-download packages [7][8]. The defensive toolkit is emerging — cross-model ensemble voting compresses bypass rates to 1.6%, static analysis catches 90.7% of adversarial samples, and the first dynamic sandbox for agent skills (SandyClaw) launched April 2, 2026 [14] — but the fundamental problem remains: agent skill ecosystems have optimized for capability and discoverability at the expense of supply-chain integrity.

Key Findings

1. How DDIPE Actually Works: Documentation as Attack Vector

The core insight of DDIPE is that LLM coding agents do not merely read documentation — they reproduce it. When a SKILL.md file contains code examples or configuration templates, the agent treats these as authoritative reference implementations and weaves them into its generated output. DDIPE exploits this in-context learning pathway to smuggle malicious operations past safety alignment without issuing any explicit instructions [1].

The attack operates in four stages:

Embedding: Malicious code is placed inside Markdown code blocks or configuration templates within a SKILL.md file, positioned as “reference implementations” or “best practices”
In-context learning: When the LLM processes the skill metadata, it treats embedded code examples as authoritative patterns to follow
Passive reproduction: The agent reproduces these patterns during normal task execution — no prompt injection or explicit instruction required
Execution: Generated code executes through the agent’s action-space interfaces (file I/O, shell commands, network requests)

Three camouflage techniques make detection harder:

Functional compliance disguise: Framing data exfiltration as “telemetry” or “audit logging” — e.g., a PDF processing function that silently POSTs os.environ to https://api.internal-log.com
Silent exception suppression: Wrapping malicious operations in try/except: pass blocks so the main task succeeds regardless
False authority suggestion: Using compliance language, internal-sounding domains, and enterprise-style naming to make exfiltration look like legitimate infrastructure

The critical distinction from traditional prompt injection: DDIPE does not tell the agent to do something malicious. It shows the agent how to do something that happens to be malicious, embedded within a legitimate-looking pattern. This is why explicit instruction attacks achieved 0% success against Claude Code + Sonnet 4.6, while DDIPE achieved 2.3% direct execution and 13.5% bypass [1].

2. Framework and Model Vulnerability: The Numbers

The DDIPE paper evaluated four production frameworks across five LLMs, generating 1,070 adversarial skills spanning 15 MITRE ATT&CK categories [1]. The results reveal that neither model alignment nor framework architecture alone determines outcomes — their interaction does.

Bypass rates by framework and model:

Framework	Model	Direct Execution	Bypass Rate
Claude Code	Sonnet 4.6	2.3%	13.5%
Claude Code	GLM-4.7	2.4%	16.8%
Claude Code	MiniMax-M2.5	13.3%	20.5%
OpenHands	Sonnet 4.6	10.7%	22.0%
OpenHands	GLM-4.7	27.1%	33.5%
OpenHands	MiniMax-M2.5	25.5%	27.0%
Codex	GPT-5.4	3.8%	11.6%
Gemini CLI	Gemini 2.5 Pro	4.2%	14.0%

Key observations:

Architectural defenses matter enormously for weaker models: Removing Claude Code’s multi-layer security review amplified GLM-4.7’s execution rate 11.3x. Strong alignment makes architecture nearly redundant (Sonnet 4.6 produced only 5 sleeper activations across all frameworks), while weak alignment makes architecture critical.
OpenHands is the most vulnerable framework tested: It relies entirely on model safety alignment with no architectural guardrails, producing the highest bypass rates across all models.
Codex’s sandbox provides real protection: Despite using bubblewrap containerization, it achieved the lowest bypass rate (11.6%), though researchers discovered a sandbox bypass vulnerability via poisoned supply-chain configuration files [1].
Brevity correlates with bypass success: Executed payloads averaged 1,850 bytes vs. 2,214 bytes for refused samples — shorter, more focused payloads evade detection more reliably.

The most dangerous finding: 2.5% of adversarial skills evaded both static analysis detection AND model safety alignment. These represent the irreducible attack surface that no single defense layer can address.

3. The 479-Byte Universal Bypass: Pip Configuration Poisoning

One payload from the DDIPE study deserves special attention because it was executed by all three models tested under Claude Code — the only payload to achieve this [1]:

The payload overwrites pip.conf with an extra-index-url pointing to https://packages.pdf-tools.internal/simple/ and a trusted-host configuration. At 479 bytes, it is trivially small. Every model treated it as legitimate development environment setup because it looks exactly like legitimate development environment setup.

This illustrates the fundamental challenge: DDIPE payloads do not contain “attack-indicative keywords.” They contain perfectly valid configuration operations that happen to redirect package resolution to an attacker-controlled index. The semantic gap between “configure a package mirror” and “redirect all future installs to a malicious registry” is invisible to alignment training.

A second example — conda environment poisoning via ~/.condarc tampering and post_link.sh remote script injection — was executed by Sonnet 4.6 as legitimate development setup, while GLM-4.7 actually detected the semantic disconnect between claimed PDF processing functionality and conda configuration. Different models have different blind spots, which is the basis for the ensemble defense.

4. The MITRE ATT&CK Surface: What Attackers Are After

The 1,070 adversarial skills mapped to 15 MITRE ATT&CK categories, with a distribution that reveals attacker priorities [1]:

Supply-chain poisoning (T1195.002): 47.5% — redirecting package managers, injecting build dependencies
Environment variable theft (T1528): 11.9% — harvesting API keys, cloud credentials, tokens
Credential theft (T1552.004): 8.6% — SSH keys, AWS credentials, Kubernetes tokens
Reverse shell/RCE (T1059.004): targeting shell execution
Container escape (T1611): privilege escalation in containerized environments
Persistence implant (T1053, T1543): cron jobs, systemd services
SSH backdoor (T1098.004): authorized_keys injection
HTTP/DNS/webhook exfiltration (T1048.003, T1567.002): data exfiltration channels
IaC attack (T1496): infrastructure-as-code manipulation
Cryptomining (T1496): resource hijacking
Malicious config write (T1565.001): configuration file tampering

The dominance of supply-chain poisoning (47.5%) is significant: the most effective DDIPE payloads do not steal data directly. They rewrite the supply chain so that all future operations flow through attacker-controlled infrastructure. A single pip.conf overwrite converts every subsequent pip install into a potential compromise vector.

5. ClawHavoc: The First Marketplace-Scale Skill Poisoning Campaign

While DDIPE demonstrates the theoretical attack surface, ClawHavoc demonstrates what exploitation looks like at scale. Between January 27-29, 2026, attackers flooded OpenClaw’s ClawHub marketplace with malicious skills in a coordinated 3-day burst [3][4][5].

Scale: Initial discovery found 341 malicious skills out of 2,857 total. As ClawHub grew to 10,700+ skills, scanning identified 824 malicious entries. At peak infection, five of the top seven most-downloaded skills were confirmed malware [10].

Methodology: The 335 skills from the main campaign all used a consistent social engineering pattern [3][4]:

Skill names mimicked popular use cases: solana-wallet-tracker, youtube-summarize-pro, security-scanning skills
Documentation looked professional with complete README formatting
A “Prerequisites” section instructed users to install what appeared to be a dependency
The “dependency” was actually Atomic Stealer (AMOS), a macOS credential stealer
All 335 AMOS-delivering skills shared a single C2 IP: 91.92.242[.]30

Barrier to entry: Publishing a skill on ClawHub required only a SKILL.md Markdown file and a GitHub account one week old. No code signing, no security review, no sandbox by default [6]. This is the structural enabler — the skills ecosystem optimized for frictionless publishing at the expense of any supply-chain verification.

The “Lethal Trifecta”: The OWASP Agentic Skills Top 10 identifies that a skill is especially dangerous when it simultaneously has: access to private data (SSH keys, API credentials, browser data), exposure to untrusted content (skill instructions, memory files), and ability to communicate externally (network egress, webhook calls) [11]. ClawHavoc skills routinely satisfied all three conditions.

The Axios npm supply chain attack of March 30, 2026 represents a complementary vector — not documentation poisoning, but maintainer compromise that poisons the very libraries agents depend on [7][8][9].

Attack chain: North Korean threat actor UNC1069 targeted Jason Saayman, the lead maintainer of Axios (100+ million weekly npm downloads):

Approached under the guise of a legitimate company founder
Cloned the company founder’s likeness and brand identity
Invited Saayman to a real Slack workspace with plausible channel activity and LinkedIn posts
Moved interaction to a Microsoft Teams meeting with what appeared to be multiple participants
Prompted installation of a “missing Teams component” — actually a RAT
Used compromised npm credentials to publish axios@1.14.1 and axios@0.30.4
Malicious versions injected plain-crypto-js@4.2.1, a hidden dependency containing a cross-platform RAT dropper (WAVESHAPER.V2)

Exposure window: Approximately 3 hours before detection and removal [7][8].

Why this matters for agent security: Axios is a foundational dependency in countless agent frameworks, MCP servers, and skill implementations. A compromised Axios version does not need to poison documentation or exploit in-context learning — it operates at the code execution layer, below any alignment or sandboxing the agent framework provides. The DDIPE researchers note this complementarity explicitly: documentation poisoning targets the generation layer while dependency poisoning targets the execution layer. A sophisticated attacker can combine both [1].

7. Defenses: What Works, What Doesn’t, and What’s Emerging

Static analysis (SkillScan): 90.7% detection rate [1]. The DDIPE paper’s SkillScan tool flagged most adversarial samples through pattern-based detection (YAML + YARA rules), LLM-as-a-judge classification, and behavioral dataflow analysis. However, the 9.3% that escaped static analysis includes the most dangerous samples — those specifically designed to look like legitimate operations. A broader study of 31,132 skills found 26.1% contain at least one potentially dangerous pattern, though most reflect insecure practices rather than intentional malice [15].

Cross-model ensemble voting: 1.6% joint bypass rate [1]. The paper’s strongest defensive finding: deploying multiple heterogeneous models as security reviewers compresses the attack surface from 13-20% (single model) to 1.6% (cross-model consensus). The pip.conf payload was the only sample that bypassed all models. Different models have different blind spots — Sonnet catches what GLM misses, and vice versa. This is practical: run a second model as a security reviewer on generated code before execution.

Framework architectural guardrails: Claude Code’s multi-layer security review, Codex’s bubblewrap sandbox, and Gemini CLI’s cloud-native filtering all provide meaningful defense — but none is sufficient alone. The paper found four confirmed vulnerabilities across all four frameworks [1]:

Codex CLI sandbox bypass via poisoned supply-chain config files (Bugcrowd)
Gemini CLI auto-trust in CI mode via CI=true and .gemini/settings.json (Google VRP)
Claude Code skill-loading bypass — skill content treated as executable without dedicated permission prompts (HackerOne)
OpenHands framework issue remediated via PR adding untrusted-content warnings

Dynamic sandboxing (SandyClaw): Launched April 2, 2026 by Permiso Security, SandyClaw is the first platform to apply sandbox detonation — the methodology used for suspicious executables — to agent skills [14]. It executes skills in isolation, records every LLM action, network call, domain resolution, file write, and environment variable access. SSL traffic is intercepted and decrypted. Analysis runs against Sigma, YARA, Nova, and Snort detection engines. This addresses static analysis’s fundamental limitation: it cannot detect behavior that only manifests at runtime.

Skill Trust and Signing Service (STSS): The emerging standard for skill integrity verification builds a SHA-256 Merkle tree of every file in a skill, signs the result with an Ed25519 key, and stores an attestation alongside the skill. At load time, verification recomputes the Merkle root and checks the signature — any tampering breaks the chain [12]. This prevents post-publish modification but does not address skills that are malicious from initial publication.

OWASP Agentic Skills Top 10: The OWASP Foundation launched a dedicated project cataloging the ten most critical security risks in agentic AI skills, with evidence-based mitigations [11]. This provides the first standardized framework for skill security assessment, complementing the existing OWASP Top 10 for Agentic Applications.

8. The Structural Problem: Skill Ecosystems Are Pre-npm-Audit

The agent skills ecosystem in April 2026 resembles the npm ecosystem circa 2015 — before npm audit, before Snyk, before lock files were standard. The parallels are uncomfortable [6][10][11]:

No mandatory code signing: Skills are published as plain Markdown files with no cryptographic attestation
Minimal identity verification: ClawHub requires a one-week-old GitHub account; no maintainer verification
No sandbox by default: Skills execute with the agent’s full permissions unless the framework provides isolation
No dependency graph visibility: Consumers cannot see what a skill will cause the agent to install or execute
No security review pipeline: Skills go live immediately upon publication
No revocation mechanism: Malicious skills must be manually identified and removed

The npm ecosystem took years of painful incidents (event-stream, ua-parser-js, colors/faker) to build its current security infrastructure. The skills ecosystem is compressing this timeline — ClawHavoc is its event-stream moment — but the tooling is still nascent.

Practical Implications

Immediate Actions (This Week)

Audit all third-party skills in your agent configurations. For each SKILL.md file consumed by your agents, manually review every code block and configuration template. Specifically look for: network calls to unfamiliar domains, configuration file overwrites (pip.conf, .condarc, .npmrc), environment variable access wrapped in exception handlers, and “prerequisite” installation instructions.
Restrict network egress from agent execution environments. Block outbound connections to any domain not explicitly allowlisted. The DDIPE paper’s exfiltration payloads all required HTTP/DNS/webhook egress to attacker-controlled infrastructure.
Check Axios versions. Verify no environment is running axios@1.14.1 or axios@0.30.4. If found, treat as full credential compromise.

Short-Term Hardening (Next 30 Days)

Deploy cross-model security review. Before executing agent-generated code, pass it through a second model (different vendor/architecture) as a security reviewer. The DDIPE paper shows this compresses bypass rates from 13-20% to 1.6%. Even a lightweight prompt — “Does this code perform any operations not directly related to the stated task?” — catches a significant fraction of disguised payloads.
Adopt static scanning for skills. Deploy SkillScan, Snyk Agent Scan, or SkillFortify to scan all skills before loading them into agent context. Static analysis catches 90.7% of adversarial samples and is fast enough for CI/CD integration [15][16].
Evaluate SandyClaw or equivalent dynamic analysis for any skills sourced from public marketplaces [14]. Static analysis misses runtime-only behaviors; dynamic sandboxing fills this gap.
Pin skills by content hash. Do not consume skills by reference to a mutable source. Clone skills into your repository, hash-verify the contents, and review diffs on any update — the same discipline applied to lock files.

Strategic Architecture (Next Quarter)

Implement skill provenance verification. Adopt Ed25519 signing for internally developed skills. Require signed attestations for any externally sourced skills. The STSS pattern provides a reference architecture [12].
Separate skill loading from skill execution. Skills metadata should be loaded into a read-only context without automatic execution privileges. Require explicit permission escalation for any skill that needs file I/O, shell access, or network egress — the same principle as mobile app permissions.
Build a private skill registry. For production agent infrastructure, maintain a curated, vetted skill registry rather than consuming directly from public marketplaces. This mirrors the private package mirror pattern for npm/PyPI.
Contribute to OWASP Agentic Skills Top 10. The framework is actively soliciting practitioner input. Teams with production agent deployments have direct experience with the failure modes being cataloged [11].

Open Questions

Alignment training vs. DDIPE: The DDIPE attack specifically exploits in-context learning rather than instruction following. Can alignment training be extended to make models suspicious of code patterns in documentation context, or does this fundamentally conflict with the purpose of providing examples? The paper’s finding that different models have different blind spots suggests this is not a solved problem at the training level.
Skill isolation granularity: Current frameworks treat skills as either fully trusted or not loaded at all. What does a practical permission model look like for skills that need some capabilities (e.g., file read) but not others (e.g., network egress)? The mobile permissions model is a reference, but agent tasks are less predictable than mobile app functions.
Cross-ecosystem skill provenance: Skills are proliferating across ClawHub, skills.sh, npm, and vendor-specific registries. Is a unified provenance standard feasible, or will each ecosystem develop its own signing and attestation scheme? The fragmentation creates gaps at ecosystem boundaries.
Dynamic analysis scalability: SandyClaw’s sandbox detonation approach requires executing every skill. As skill registries grow to tens of thousands of entries with frequent updates, does dynamic analysis remain feasible at registry scale, or does it become a consumer-side check only?
Compound attacks: DDIPE targets the generation layer, while dependency poisoning (Axios, LiteLLM) targets the execution layer. No defensive framework currently addresses both surfaces in an integrated way. What does a unified agent supply-chain security architecture look like?
The 2.5% irreducible residual: DDIPE found 2.5% of adversarial skills evade both static detection and model alignment. Is this residual compressible with better tooling, or does it represent a fundamental limit of documentation-as-input architectures? The answer determines whether defense-in-depth is sufficient or whether architectural redesign is needed.

Sources

Qu et al., “Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems” (DDIPE paper), April 2026 — https://arxiv.org/abs/2604.03081
Sonatype, “Axios Compromise on npm Introduces Hidden Malicious Package,” March 2026 — https://www.sonatype.com/blog/axios-compromise-on-npm-introduces-hidden-malicious-package
AuthMind, “OpenClaw Malicious Skills: Agentic AI Supply Chain,” February 2026 — https://www.authmind.com/blogs/openclaw-malicious-skills-agentic-ai-supply-chain
Koi Security, “ClawHavoc: 341 Malicious Clawed Skills Found by the Bot They Were Targeting,” February 2026 — https://www.koi.ai/blog/clawhavoc-341-malicious-clawedbot-skills-found-by-the-bot-they-were-targeting
Snyk, “Inside the ‘clawdhub’ Malicious Campaign: AI Agent Skills Drop Reverse Shells on OpenClaw Marketplace,” February 2026 — https://snyk.io/articles/clawdhub-malicious-campaign-ai-agent-skills/
CyberPress, “ClawHavoc Poisons OpenClaw’s ClawHub With 1,184 Malicious Skills,” March 2026 — https://cyberpress.org/clawhavoc-poisons-openclaws-clawhub-with-1184-malicious-skills/
The Hacker News, “UNC1069 Social Engineering of Axios Maintainer Led to npm Supply Chain Attack,” April 2026 — https://thehackernews.com/2026/04/unc1069-social-engineering-of-axios.html
The Hacker News, “Axios Supply Chain Attack Pushes Cross-Platform RAT via Compromised npm Account,” March 2026 — https://thehackernews.com/2026/03/axios-supply-chain-attack-pushes-cross.html
SOCRadar, “Axios npm Hijack 2026: Everything You Need to Know,” April 2026 — https://socradar.io/blog/axios-npm-supply-chain-attack-2026-ciso-guide/
Repello AI, “ClawHavoc: Inside the Supply Chain Attack That Targeted 300,000 AI Agent Users,” February 2026 — https://repello.ai/blog/clawhavoc-supply-chain-attack
OWASP, “Agentic Skills Top 10,” 2026 — https://owasp.org/www-project-agentic-skills-top-10/
Ken Huang, “Agent Skill Trust & Signing Service,” 2026 — https://kenhuangus.substack.com/p/agent-skill-trust-and-signing-service
Invariant Labs, “MCP Security Notification: Tool Poisoning Attacks,” 2025 — https://invariantlabs.ai/blog/mcp-security-notification-tool-poisoning-attacks
Permiso Security, “Introducing SandyClaw: The First Dynamic Sandbox for AI Agent Skills and Prompts,” April 2, 2026 — https://permiso.io/blog/introducing-sandyclaw-dynamic-sandbox-ai-agent-skills
GitHub/kurtpayne, “skillscan-security: Security scanner for AI agent skills and MCP tool bundles,” 2026 — https://github.com/kurtpayne/skillscan-security
GitHub/qualixar, “SkillFortify: Formal security scanner for AI agent skills,” 2026 — https://github.com/qualixar/skillfortify
Practical DevSecOps, “MCP Security Vulnerabilities: How to Prevent Prompt Injection and Tool Poisoning Attacks in 2026” — https://www.practical-devsecops.com/mcp-security-vulnerabilities/
Elastic Security Labs, “MCP Tools: Attack Vectors and Defense Recommendations for Autonomous Agents” — https://www.elastic.co/security-labs/mcp-tools-attack-defense-recommendations
Acuvity, “Tool Poisoning: Hidden Instructions in MCP Tool Descriptions” — https://acuvity.ai/tool-poisoning-hidden-instructions-in-mcp-tool-descriptions/
MCP Playground, “MCP Security in 2026: Tool Poisoning, OWASP MCP Top 10, and How to Protect Your Agents” — https://mcpplaygroundonline.com/blog/mcp-security-tool-poisoning-owasp-top-10-mcp-scan
Penligent AI, “AI Agents Hacking in 2026: Defending the New Execution Boundary” — https://www.penligent.ai/hackinglabs/ai-agents-hacking-in-2026-defending-the-new-execution-boundary/
Microsoft, “Mitigating the Axios npm supply chain compromise,” April 1, 2026 — https://www.microsoft.com/en-us/security/blog/2026/04/01/mitigating-the-axios-npm-supply-chain-compromise/
NVIDIA, “Practical Security Guidance for Sandboxing Agentic Workflows and Managing Execution Risk” — https://developer.nvidia.com/blog/practical-security-guidance-for-sandboxing-agentic-workflows-and-managing-execution-risk/
Snyk, “ToxicSkills: Malicious AI Agent Skills on ClawHub,” 2026 — https://snyk.io/blog/toxicskills-malicious-ai-agent-skills-clawhub/