Summary
A new class of supply-chain attack has arrived that does not require poisoning code at all — it poisons documentation. The DDIPE (Document-Driven Implicit Payload Execution) technique, published April 2026 by researchers from multiple universities, embeds malicious logic inside the code examples and configuration templates of SKILL.md files, exploiting the in-context learning behavior of LLMs to reproduce those patterns during normal task execution [1]. Across 1,070 adversarial skills tested against four production agent frameworks and five models, DDIPE achieved bypass rates of 11.6% to 33.5% — while explicit instruction attacks achieved 0% against strongly aligned models. The real-world attack surface is already being exploited: the ClawHavoc campaign flooded OpenClaw’s ClawHub marketplace with 824+ malicious skills in under three weeks [3][4], and the Axios npm compromise demonstrated how social engineering of a single maintainer can inject malware into 100-million-weekly-download packages [7][8]. The defensive toolkit is emerging — cross-model ensemble voting compresses bypass rates to 1.6%, static analysis catches 90.7% of adversarial samples, and the first dynamic sandbox for agent skills (SandyClaw) launched April 2, 2026 [14] — but the fundamental problem remains: agent skill ecosystems have optimized for capability and discoverability at the expense of supply-chain integrity.
Key Findings
1. How DDIPE Actually Works: Documentation as Attack Vector
The core insight of DDIPE is that LLM coding agents do not merely read documentation — they reproduce it. When a SKILL.md file contains code examples or configuration templates, the agent treats these as authoritative reference implementations and weaves them into its generated output. DDIPE exploits this in-context learning pathway to smuggle malicious operations past safety alignment without issuing any explicit instructions [1].
The attack operates in four stages:
- Embedding: Malicious code is placed inside Markdown code blocks or configuration templates within a SKILL.md file, positioned as “reference implementations” or “best practices”
- In-context learning: When the LLM processes the skill metadata, it treats embedded code examples as authoritative patterns to follow
- Passive reproduction: The agent reproduces these patterns during normal task execution — no prompt injection or explicit instruction required
- Execution: Generated code executes through the agent’s action-space interfaces (file I/O, shell commands, network requests)
Three camouflage techniques make detection harder:
- Functional compliance disguise: Framing data exfiltration as “telemetry” or “audit logging” — e.g., a PDF processing function that silently POSTs
os.environtohttps://api.internal-log.com - Silent exception suppression: Wrapping malicious operations in
try/except: passblocks so the main task succeeds regardless - False authority suggestion: Using compliance language, internal-sounding domains, and enterprise-style naming to make exfiltration look like legitimate infrastructure
The critical distinction from traditional prompt injection: DDIPE does not tell the agent to do something malicious. It shows the agent how to do something that happens to be malicious, embedded within a legitimate-looking pattern. This is why explicit instruction attacks achieved 0% success against Claude Code + Sonnet 4.6, while DDIPE achieved 2.3% direct execution and 13.5% bypass [1].
2. Framework and Model Vulnerability: The Numbers
The DDIPE paper evaluated four production frameworks across five LLMs, generating 1,070 adversarial skills spanning 15 MITRE ATT&CK categories [1]. The results reveal that neither model alignment nor framework architecture alone determines outcomes — their interaction does.
Bypass rates by framework and model:
| Framework | Model | Direct Execution | Bypass Rate |
|---|---|---|---|
| Claude Code | Sonnet 4.6 | 2.3% | 13.5% |
| Claude Code | GLM-4.7 | 2.4% | 16.8% |
| Claude Code | MiniMax-M2.5 | 13.3% | 20.5% |
| OpenHands | Sonnet 4.6 | 10.7% | 22.0% |
| OpenHands | GLM-4.7 | 27.1% | 33.5% |
| OpenHands | MiniMax-M2.5 | 25.5% | 27.0% |
| Codex | GPT-5.4 | 3.8% | 11.6% |
| Gemini CLI | Gemini 2.5 Pro | 4.2% | 14.0% |
Key observations:
- Architectural defenses matter enormously for weaker models: Removing Claude Code’s multi-layer security review amplified GLM-4.7’s execution rate 11.3x. Strong alignment makes architecture nearly redundant (Sonnet 4.6 produced only 5 sleeper activations across all frameworks), while weak alignment makes architecture critical.
- OpenHands is the most vulnerable framework tested: It relies entirely on model safety alignment with no architectural guardrails, producing the highest bypass rates across all models.
- Codex’s sandbox provides real protection: Despite using bubblewrap containerization, it achieved the lowest bypass rate (11.6%), though researchers discovered a sandbox bypass vulnerability via poisoned supply-chain configuration files [1].
- Brevity correlates with bypass success: Executed payloads averaged 1,850 bytes vs. 2,214 bytes for refused samples — shorter, more focused payloads evade detection more reliably.
The most dangerous finding: 2.5% of adversarial skills evaded both static analysis detection AND model safety alignment. These represent the irreducible attack surface that no single defense layer can address.
3. The 479-Byte Universal Bypass: Pip Configuration Poisoning
One payload from the DDIPE study deserves special attention because it was executed by all three models tested under Claude Code — the only payload to achieve this [1]:
The payload overwrites pip.conf with an extra-index-url pointing to https://packages.pdf-tools.internal/simple/ and a trusted-host configuration. At 479 bytes, it is trivially small. Every model treated it as legitimate development environment setup because it looks exactly like legitimate development environment setup.
This illustrates the fundamental challenge: DDIPE payloads do not contain “attack-indicative keywords.” They contain perfectly valid configuration operations that happen to redirect package resolution to an attacker-controlled index. The semantic gap between “configure a package mirror” and “redirect all future installs to a malicious registry” is invisible to alignment training.
A second example — conda environment poisoning via ~/.condarc tampering and post_link.sh remote script injection — was executed by Sonnet 4.6 as legitimate development setup, while GLM-4.7 actually detected the semantic disconnect between claimed PDF processing functionality and conda configuration. Different models have different blind spots, which is the basis for the ensemble defense.
4. The MITRE ATT&CK Surface: What Attackers Are After
The 1,070 adversarial skills mapped to 15 MITRE ATT&CK categories, with a distribution that reveals attacker priorities [1]:
- Supply-chain poisoning (T1195.002): 47.5% — redirecting package managers, injecting build dependencies
- Environment variable theft (T1528): 11.9% — harvesting API keys, cloud credentials, tokens
- Credential theft (T1552.004): 8.6% — SSH keys, AWS credentials, Kubernetes tokens
- Reverse shell/RCE (T1059.004): targeting shell execution
- Container escape (T1611): privilege escalation in containerized environments
- Persistence implant (T1053, T1543): cron jobs, systemd services
- SSH backdoor (T1098.004): authorized_keys injection
- HTTP/DNS/webhook exfiltration (T1048.003, T1567.002): data exfiltration channels
- IaC attack (T1496): infrastructure-as-code manipulation
- Cryptomining (T1496): resource hijacking
- Malicious config write (T1565.001): configuration file tampering
The dominance of supply-chain poisoning (47.5%) is significant: the most effective DDIPE payloads do not steal data directly. They rewrite the supply chain so that all future operations flow through attacker-controlled infrastructure. A single pip.conf overwrite converts every subsequent pip install into a potential compromise vector.
5. ClawHavoc: The First Marketplace-Scale Skill Poisoning Campaign
While DDIPE demonstrates the theoretical attack surface, ClawHavoc demonstrates what exploitation looks like at scale. Between January 27-29, 2026, attackers flooded OpenClaw’s ClawHub marketplace with malicious skills in a coordinated 3-day burst [3][4][5].
Scale: Initial discovery found 341 malicious skills out of 2,857 total. As ClawHub grew to 10,700+ skills, scanning identified 824 malicious entries. At peak infection, five of the top seven most-downloaded skills were confirmed malware [10].
Methodology: The 335 skills from the main campaign all used a consistent social engineering pattern [3][4]:
- Skill names mimicked popular use cases:
solana-wallet-tracker,youtube-summarize-pro, security-scanning skills - Documentation looked professional with complete README formatting
- A “Prerequisites” section instructed users to install what appeared to be a dependency
- The “dependency” was actually Atomic Stealer (AMOS), a macOS credential stealer
- All 335 AMOS-delivering skills shared a single C2 IP:
91.92.242[.]30
Barrier to entry: Publishing a skill on ClawHub required only a SKILL.md Markdown file and a GitHub account one week old. No code signing, no security review, no sandbox by default [6]. This is the structural enabler — the skills ecosystem optimized for frictionless publishing at the expense of any supply-chain verification.
The “Lethal Trifecta”: The OWASP Agentic Skills Top 10 identifies that a skill is especially dangerous when it simultaneously has: access to private data (SSH keys, API credentials, browser data), exposure to untrusted content (skill instructions, memory files), and ability to communicate externally (network egress, webhook calls) [11]. ClawHavoc skills routinely satisfied all three conditions.
6. The Axios Compromise: Social Engineering as the Missing Link
The Axios npm supply chain attack of March 30, 2026 represents a complementary vector — not documentation poisoning, but maintainer compromise that poisons the very libraries agents depend on [7][8][9].
Attack chain: North Korean threat actor UNC1069 targeted Jason Saayman, the lead maintainer of Axios (100+ million weekly npm downloads):
- Approached under the guise of a legitimate company founder
- Cloned the company founder’s likeness and brand identity
- Invited Saayman to a real Slack workspace with plausible channel activity and LinkedIn posts
- Moved interaction to a Microsoft Teams meeting with what appeared to be multiple participants
- Prompted installation of a “missing Teams component” — actually a RAT
- Used compromised npm credentials to publish
axios@1.14.1andaxios@0.30.4 - Malicious versions injected
plain-crypto-js@4.2.1, a hidden dependency containing a cross-platform RAT dropper (WAVESHAPER.V2)
Exposure window: Approximately 3 hours before detection and removal [7][8].
Why this matters for agent security: Axios is a foundational dependency in countless agent frameworks, MCP servers, and skill implementations. A compromised Axios version does not need to poison documentation or exploit in-context learning — it operates at the code execution layer, below any alignment or sandboxing the agent framework provides. The DDIPE researchers note this complementarity explicitly: documentation poisoning targets the generation layer while dependency poisoning targets the execution layer. A sophisticated attacker can combine both [1].
7. Defenses: What Works, What Doesn’t, and What’s Emerging
Static analysis (SkillScan): 90.7% detection rate [1]. The DDIPE paper’s SkillScan tool flagged most adversarial samples through pattern-based detection (YAML + YARA rules), LLM-as-a-judge classification, and behavioral dataflow analysis. However, the 9.3% that escaped static analysis includes the most dangerous samples — those specifically designed to look like legitimate operations. A broader study of 31,132 skills found 26.1% contain at least one potentially dangerous pattern, though most reflect insecure practices rather than intentional malice [15].
Cross-model ensemble voting: 1.6% joint bypass rate [1]. The paper’s strongest defensive finding: deploying multiple heterogeneous models as security reviewers compresses the attack surface from 13-20% (single model) to 1.6% (cross-model consensus). The pip.conf payload was the only sample that bypassed all models. Different models have different blind spots — Sonnet catches what GLM misses, and vice versa. This is practical: run a second model as a security reviewer on generated code before execution.
Framework architectural guardrails: Claude Code’s multi-layer security review, Codex’s bubblewrap sandbox, and Gemini CLI’s cloud-native filtering all provide meaningful defense — but none is sufficient alone. The paper found four confirmed vulnerabilities across all four frameworks [1]:
- Codex CLI sandbox bypass via poisoned supply-chain config files (Bugcrowd)
- Gemini CLI auto-trust in CI mode via
CI=trueand.gemini/settings.json(Google VRP) - Claude Code skill-loading bypass — skill content treated as executable without dedicated permission prompts (HackerOne)
- OpenHands framework issue remediated via PR adding untrusted-content warnings
Dynamic sandboxing (SandyClaw): Launched April 2, 2026 by Permiso Security, SandyClaw is the first platform to apply sandbox detonation — the methodology used for suspicious executables — to agent skills [14]. It executes skills in isolation, records every LLM action, network call, domain resolution, file write, and environment variable access. SSL traffic is intercepted and decrypted. Analysis runs against Sigma, YARA, Nova, and Snort detection engines. This addresses static analysis’s fundamental limitation: it cannot detect behavior that only manifests at runtime.
Skill Trust and Signing Service (STSS): The emerging standard for skill integrity verification builds a SHA-256 Merkle tree of every file in a skill, signs the result with an Ed25519 key, and stores an attestation alongside the skill. At load time, verification recomputes the Merkle root and checks the signature — any tampering breaks the chain [12]. This prevents post-publish modification but does not address skills that are malicious from initial publication.
OWASP Agentic Skills Top 10: The OWASP Foundation launched a dedicated project cataloging the ten most critical security risks in agentic AI skills, with evidence-based mitigations [11]. This provides the first standardized framework for skill security assessment, complementing the existing OWASP Top 10 for Agentic Applications.
8. The Structural Problem: Skill Ecosystems Are Pre-npm-Audit
The agent skills ecosystem in April 2026 resembles the npm ecosystem circa 2015 — before npm audit, before Snyk, before lock files were standard. The parallels are uncomfortable [6][10][11]:
- No mandatory code signing: Skills are published as plain Markdown files with no cryptographic attestation
- Minimal identity verification: ClawHub requires a one-week-old GitHub account; no maintainer verification
- No sandbox by default: Skills execute with the agent’s full permissions unless the framework provides isolation
- No dependency graph visibility: Consumers cannot see what a skill will cause the agent to install or execute
- No security review pipeline: Skills go live immediately upon publication
- No revocation mechanism: Malicious skills must be manually identified and removed
The npm ecosystem took years of painful incidents (event-stream, ua-parser-js, colors/faker) to build its current security infrastructure. The skills ecosystem is compressing this timeline — ClawHavoc is its event-stream moment — but the tooling is still nascent.
Practical Implications
Immediate Actions (This Week)
-
Audit all third-party skills in your agent configurations. For each SKILL.md file consumed by your agents, manually review every code block and configuration template. Specifically look for: network calls to unfamiliar domains, configuration file overwrites (pip.conf, .condarc, .npmrc), environment variable access wrapped in exception handlers, and “prerequisite” installation instructions.
-
Restrict network egress from agent execution environments. Block outbound connections to any domain not explicitly allowlisted. The DDIPE paper’s exfiltration payloads all required HTTP/DNS/webhook egress to attacker-controlled infrastructure.
-
Check Axios versions. Verify no environment is running
axios@1.14.1oraxios@0.30.4. If found, treat as full credential compromise.
Short-Term Hardening (Next 30 Days)
-
Deploy cross-model security review. Before executing agent-generated code, pass it through a second model (different vendor/architecture) as a security reviewer. The DDIPE paper shows this compresses bypass rates from 13-20% to 1.6%. Even a lightweight prompt — “Does this code perform any operations not directly related to the stated task?” — catches a significant fraction of disguised payloads.
-
Adopt static scanning for skills. Deploy SkillScan, Snyk Agent Scan, or SkillFortify to scan all skills before loading them into agent context. Static analysis catches 90.7% of adversarial samples and is fast enough for CI/CD integration [15][16].
-
Evaluate SandyClaw or equivalent dynamic analysis for any skills sourced from public marketplaces [14]. Static analysis misses runtime-only behaviors; dynamic sandboxing fills this gap.
-
Pin skills by content hash. Do not consume skills by reference to a mutable source. Clone skills into your repository, hash-verify the contents, and review diffs on any update — the same discipline applied to lock files.
Strategic Architecture (Next Quarter)
-
Implement skill provenance verification. Adopt Ed25519 signing for internally developed skills. Require signed attestations for any externally sourced skills. The STSS pattern provides a reference architecture [12].
-
Separate skill loading from skill execution. Skills metadata should be loaded into a read-only context without automatic execution privileges. Require explicit permission escalation for any skill that needs file I/O, shell access, or network egress — the same principle as mobile app permissions.
-
Build a private skill registry. For production agent infrastructure, maintain a curated, vetted skill registry rather than consuming directly from public marketplaces. This mirrors the private package mirror pattern for npm/PyPI.
-
Contribute to OWASP Agentic Skills Top 10. The framework is actively soliciting practitioner input. Teams with production agent deployments have direct experience with the failure modes being cataloged [11].
Open Questions
-
Alignment training vs. DDIPE: The DDIPE attack specifically exploits in-context learning rather than instruction following. Can alignment training be extended to make models suspicious of code patterns in documentation context, or does this fundamentally conflict with the purpose of providing examples? The paper’s finding that different models have different blind spots suggests this is not a solved problem at the training level.
-
Skill isolation granularity: Current frameworks treat skills as either fully trusted or not loaded at all. What does a practical permission model look like for skills that need some capabilities (e.g., file read) but not others (e.g., network egress)? The mobile permissions model is a reference, but agent tasks are less predictable than mobile app functions.
-
Cross-ecosystem skill provenance: Skills are proliferating across ClawHub, skills.sh, npm, and vendor-specific registries. Is a unified provenance standard feasible, or will each ecosystem develop its own signing and attestation scheme? The fragmentation creates gaps at ecosystem boundaries.
-
Dynamic analysis scalability: SandyClaw’s sandbox detonation approach requires executing every skill. As skill registries grow to tens of thousands of entries with frequent updates, does dynamic analysis remain feasible at registry scale, or does it become a consumer-side check only?
-
Compound attacks: DDIPE targets the generation layer, while dependency poisoning (Axios, LiteLLM) targets the execution layer. No defensive framework currently addresses both surfaces in an integrated way. What does a unified agent supply-chain security architecture look like?
-
The 2.5% irreducible residual: DDIPE found 2.5% of adversarial skills evade both static detection and model alignment. Is this residual compressible with better tooling, or does it represent a fundamental limit of documentation-as-input architectures? The answer determines whether defense-in-depth is sufficient or whether architectural redesign is needed.
Sources
- Qu et al., “Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems” (DDIPE paper), April 2026 — https://arxiv.org/abs/2604.03081
- Sonatype, “Axios Compromise on npm Introduces Hidden Malicious Package,” March 2026 — https://www.sonatype.com/blog/axios-compromise-on-npm-introduces-hidden-malicious-package
- AuthMind, “OpenClaw Malicious Skills: Agentic AI Supply Chain,” February 2026 — https://www.authmind.com/blogs/openclaw-malicious-skills-agentic-ai-supply-chain
- Koi Security, “ClawHavoc: 341 Malicious Clawed Skills Found by the Bot They Were Targeting,” February 2026 — https://www.koi.ai/blog/clawhavoc-341-malicious-clawedbot-skills-found-by-the-bot-they-were-targeting
- Snyk, “Inside the ‘clawdhub’ Malicious Campaign: AI Agent Skills Drop Reverse Shells on OpenClaw Marketplace,” February 2026 — https://snyk.io/articles/clawdhub-malicious-campaign-ai-agent-skills/
- CyberPress, “ClawHavoc Poisons OpenClaw’s ClawHub With 1,184 Malicious Skills,” March 2026 — https://cyberpress.org/clawhavoc-poisons-openclaws-clawhub-with-1184-malicious-skills/
- The Hacker News, “UNC1069 Social Engineering of Axios Maintainer Led to npm Supply Chain Attack,” April 2026 — https://thehackernews.com/2026/04/unc1069-social-engineering-of-axios.html
- The Hacker News, “Axios Supply Chain Attack Pushes Cross-Platform RAT via Compromised npm Account,” March 2026 — https://thehackernews.com/2026/03/axios-supply-chain-attack-pushes-cross.html
- SOCRadar, “Axios npm Hijack 2026: Everything You Need to Know,” April 2026 — https://socradar.io/blog/axios-npm-supply-chain-attack-2026-ciso-guide/
- Repello AI, “ClawHavoc: Inside the Supply Chain Attack That Targeted 300,000 AI Agent Users,” February 2026 — https://repello.ai/blog/clawhavoc-supply-chain-attack
- OWASP, “Agentic Skills Top 10,” 2026 — https://owasp.org/www-project-agentic-skills-top-10/
- Ken Huang, “Agent Skill Trust & Signing Service,” 2026 — https://kenhuangus.substack.com/p/agent-skill-trust-and-signing-service
- Invariant Labs, “MCP Security Notification: Tool Poisoning Attacks,” 2025 — https://invariantlabs.ai/blog/mcp-security-notification-tool-poisoning-attacks
- Permiso Security, “Introducing SandyClaw: The First Dynamic Sandbox for AI Agent Skills and Prompts,” April 2, 2026 — https://permiso.io/blog/introducing-sandyclaw-dynamic-sandbox-ai-agent-skills
- GitHub/kurtpayne, “skillscan-security: Security scanner for AI agent skills and MCP tool bundles,” 2026 — https://github.com/kurtpayne/skillscan-security
- GitHub/qualixar, “SkillFortify: Formal security scanner for AI agent skills,” 2026 — https://github.com/qualixar/skillfortify
- Practical DevSecOps, “MCP Security Vulnerabilities: How to Prevent Prompt Injection and Tool Poisoning Attacks in 2026” — https://www.practical-devsecops.com/mcp-security-vulnerabilities/
- Elastic Security Labs, “MCP Tools: Attack Vectors and Defense Recommendations for Autonomous Agents” — https://www.elastic.co/security-labs/mcp-tools-attack-defense-recommendations
- Acuvity, “Tool Poisoning: Hidden Instructions in MCP Tool Descriptions” — https://acuvity.ai/tool-poisoning-hidden-instructions-in-mcp-tool-descriptions/
- MCP Playground, “MCP Security in 2026: Tool Poisoning, OWASP MCP Top 10, and How to Protect Your Agents” — https://mcpplaygroundonline.com/blog/mcp-security-tool-poisoning-owasp-top-10-mcp-scan
- Penligent AI, “AI Agents Hacking in 2026: Defending the New Execution Boundary” — https://www.penligent.ai/hackinglabs/ai-agents-hacking-in-2026-defending-the-new-execution-boundary/
- Microsoft, “Mitigating the Axios npm supply chain compromise,” April 1, 2026 — https://www.microsoft.com/en-us/security/blog/2026/04/01/mitigating-the-axios-npm-supply-chain-compromise/
- NVIDIA, “Practical Security Guidance for Sandboxing Agentic Workflows and Managing Execution Risk” — https://developer.nvidia.com/blog/practical-security-guidance-for-sandboxing-agentic-workflows-and-managing-execution-risk/
- Snyk, “ToxicSkills: Malicious AI Agent Skills on ClawHub,” 2026 — https://snyk.io/blog/toxicskills-malicious-ai-agent-skills-clawhub/