Scout: Agent Skills Marketplaces as Attack Surface — Intent-Misalignment Beyond Code Review

Summary

In the same news cycle in which “skills” settled as the cross-vendor unit of packaged agent know-how, three distinct supply-chain incidents showed that the skills layer is now an active attack surface. One of them, the ClawSwarm campaign, demonstrated a threat model the security industry has not yet built tooling for: skills whose code does exactly what it advertises, but for the publisher’s benefit instead of the user’s. The runtime-marketplace problem covered in earlier OpenClaw analysis was about malware. The skills-marketplace problem is partly malware (the Snyk ToxicSkills audit found 13.4% of ~4,000 audited skills carried a critical issue) and partly something newer: agent capability that the user opted into but didn’t actually authorise the way they thought they did. The defensive stack is starting to form (Vercel/Snyk install-time scanning on skills.sh, OWASP’s draft Agentic Skills Top 10, prototype signing services, Anthropic’s “Verified” tier on the official plugin directory), but every published mitigation is still trust-based, none of them solves the intent-disclosure problem, and the consent UX is roughly where browser extension permissions were in 2010.

Key Findings

1. The threat model has bifurcated. Malicious code is one half. Intent-misalignment is the other.

Practitioner discussion of skill security has so far been dominated by the malware vector — typosquatted packages, infostealer payloads bundled into SKILL.md scripts, the curl | bash patterns that anyone who’s looked at npm supply-chain attacks will recognise. The Snyk ToxicSkills work scoped that vector empirically. The audit covered 3,984 skills across ClawHub and Vercel’s skills.sh; 13.4% (534 skills) carried at least one critical-severity issue, 76 were confirmed-malicious payloads, and 91% of the confirmed-malicious skills combined prompt injection with traditional malware, a dual-mode pattern that bypasses both LLM safety filters and conventional code scanners. The follow-up reporting from Acronis on the Hugging Face / ClawHub social-engineering campaigns tracked roughly 600 malicious skills concentrated in 13 publisher accounts, with two accounts (hightower6eu at 334 skills, sakaen736jih at 199 skills) responsible for the majority. This vector is well-described in the literature, has known mitigations (signing, scanning, sandboxing, registry takedown), and the reviewer’s job is to verify those mitigations are deployed.

The ClawSwarm campaign reported by Manifold’s Ax Sharma and surfaced in The Register is the cleaner example of the second vector. Thirty skills published by a single ClawHub user (imaflytok) attracted ~9,800 cumulative downloads under benign-looking labels — cron helpers, a security tool, a whale watcher, a cross-platform poster, a predictions-market integration. Once installed, each skill silently registered the agent with onlyflies.buzz, reported its name, capabilities, and installed-skill inventory upstream, generated a Hedera crypto wallet, and checked in every four hours for remote tasks. None of that is malware in the conventional sense — no exploit, no payload-fetching, no credential exfiltration. Sharma’s framing per The Register’s coverage: “Whether ClawSwarm instances are a legitimate experiment in agent economics or a recruitment funnel for speculative crypto, the result for the user is the same: their agent is doing things they didn’t ask it to do, for someone they don’t know, with keys they didn’t authorize.”

That framing matters because it isolates the new failure mode. A code scanner running over the ClawSwarm skills would find what it always finds in well-formed skills: a SKILL.md describing what the skill does, scripts that do what the skill says, network calls that match the skill’s documented behaviour. The skill is honest about its intent. The intent just isn’t aligned with the user’s. The Anthropic skills documentation warns that “a malicious Skill can direct Claude to invoke tools or execute code in ways that don’t match the Skill’s stated purpose” — but ClawSwarm is the inverse case, and the harder one: the skill matches its stated purpose, the stated purpose is just not the same purpose the user assumed when they installed it. No current scanner catches this; there is no obvious technical primitive that would.

2. The current defensive stack handles malware. It doesn’t handle intent-misalignment.

The marketplace-side defences that shipped or matured in the last two months are converging on a recognisable pattern, and it’s the same pattern npm and PyPI evolved over a decade — install-time scanning, publisher reputation signals, registry-side takedown, and (prototype-stage) cryptographic attestation.

The most concrete production deployment is the Vercel/Snyk integration on Vercel’s skills.sh marketplace. Every skill installed via npx skills triggers a scan via Snyk’s API before the skill reaches the developer’s machine, and Vercel partners with Gen, Socket, and Snyk for “independent security reports” surfaced on each skill’s detail page. The action on a malicious finding is automatic delisting from the leaderboard and search results, with a warning at install time if a user navigates directly to a flagged skill’s URL. Snyk reports the critical-level detectors hit “90-100% recall on confirmed malicious skills while maintaining a 0% false positive rate on the top 100 legitimate skills.” That’s the strongest first-party scanning claim in the ecosystem at the moment.

OWASP’s Agentic Skills Top 10, a draft framework released as a 2026 1.0, formalises the categories the defences need to cover: AST01 (Malicious Skills), AST02 (Supply Chain Compromise), AST03 (Over-Privileged Skills), AST04 (Insecure Metadata), AST05 (Unsafe Deserialization), AST06 (Weak Isolation), AST07 (Update Drift), AST08 (Poor Scanning), AST09 (No Governance), and AST10 (Cross-Platform Reuse). The AST01 mitigations list reads like an npm-circa-2020 wishlist: Ed25519 signatures on published skills, Merkle-root signing of registry contents, behavioural scanning at publish and install time, container-level execution isolation, hash-pinning with modification alerts, and a no-auto-execution rule under which Prerequisites sections run only after explicit review.

Prototype-stage trust infrastructure exists. Ken Huang’s Skill Trust & Signing Service (STSS) describes a four-stage verification pipeline (static scanning, hook detection, import-chain tracing, LLM-based behavioural audit) that issues an Ed25519-signed attestation over a SHA-256 Merkle tree of the skill’s files. Verification at load time recomputes the Merkle root and validates the signature. Independent projects like vett.sh, verified-skill.com, skill-signer, and SkillFortify cover the same ground from different angles. None of these is yet integrated into a vendor marketplace as the default install path.
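The load-time integrity check described above can be sketched roughly as follows. This is a minimal illustration, not STSS’s actual construction: the source does not specify how STSS orders files, encodes leaves, or handles odd-sized levels, so the `leaf:` and `node:` prefixes, the sorted-path ordering, and the function names `file_leaf`, `merkle_root`, and `skill_root` are all assumptions. Ed25519 signature validation is deliberately left out because it needs a third-party library; the sketch covers only recomputing the SHA-256 Merkle root that such a signature would cover.

```python
import hashlib
from pathlib import Path


def file_leaf(rel_path: str, data: bytes) -> bytes:
    """Leaf hash binding a file's relative path to its contents (assumed encoding)."""
    return hashlib.sha256(b"leaf:" + rel_path.encode("utf-8") + b"\0" + data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold leaf hashes pairwise until a single root remains."""
    if not leaves:
        return hashlib.sha256(b"empty").digest()
    level = list(leaves)
    while len(level) > 1:
        # Hash adjacent pairs; an unpaired last node is promoted unchanged.
        nxt = [
            hashlib.sha256(b"node:" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def skill_root(skill_dir: Path) -> str:
    """Hex Merkle root over every file in a skill directory, in sorted path order."""
    entries = sorted(
        (p.relative_to(skill_dir).as_posix(), p)
        for p in skill_dir.rglob("*")
        if p.is_file()
    )
    leaves = [file_leaf(rel, p.read_bytes()) for rel, p in entries]
    return merkle_root(leaves).hex()
```

A loader would compare `skill_root(installed_dir)` against the root named in the attestation and refuse to load on mismatch. As the surrounding analysis notes, a matching root says the files are the ones that were attested; it says nothing about whose interests those files serve.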

The vendor-curated tier is the other real defence. Anthropic’s official plugins directory splits submissions into Anthropic-internal and third-party-external buckets, with a stated review for “quality and security standards” before listing. Anthropic’s documentation is unambiguous about what that buys you, though, with a prominent disclaimer that Anthropic “does not control what MCP servers, files, or other software are included in plugins and cannot verify that they will work as intended or that they won’t change.” A plugin’s appearance in the official directory is a curation signal, not a guarantee. The same caveat applies to the Anthropic skills documentation, which explicitly recommends using Skills “only from trusted sources: those you created yourself or obtained from Anthropic” and otherwise auditing manually. There is no skill-signing system in place at Anthropic’s tier; trust is ultimately deferred to the human.

None of these defences address the ClawSwarm case. A skill that does what it advertises will pass behavioural scanning, will produce a clean Snyk audit, will satisfy AST01-AST10 against the failure modes those rules cover, and will Ed25519-sign cleanly because the signature attests file integrity and provenance, not intent. The intent gap is structurally upstream of every defence currently in production.

3. The consent UX is roughly where browser extension permissions were in 2010.

The closest thing to an intent-disclosure mechanism in any of the current skills marketplaces is the Anthropic-recommended manual audit — read the SKILL.md, examine bundled scripts, check for unexpected network calls. That approach is the consent-UX equivalent of asking users to read npm package source before npm install, which is to say it doesn’t happen. The Snyk threat-model writeup makes the analogy explicit: “When you install a Skill, you’re not granting it a discrete set of capabilities through an OS permission dialog. Instead, you’re adding instructions for the agent to follow, using whatever permissions it already has.” Skills inherit the agent’s full ambient authority — shell, filesystem, credentials, network, persistent memory — and there is no per-skill permission boundary in any major implementation.

The OWASP draft proposes “UI Transparency” as a mitigation — display publisher trust level, install count, scan status — and Vercel’s skills.sh implements roughly that, with public audit results visible on each skill’s detail page, install-count badges, and a leaderboard. But these are publisher-reputation signals, not intent-disclosure signals. They tell you the skill is popular and hasn’t been flagged; they don’t tell you what the skill is going to do with the agent’s authority once it’s installed. There is no equivalent in the skills ecosystem of Android’s runtime permission prompts, or even of Chrome’s static-permission-list-at-install-time prompt. The closest analogue is OWASP’s recommendation that repository-controlled configuration files should not execute before explicit user trust confirmation — but that’s a no-auto-execution rule, not a permission-disclosure mechanism.

The missing primitive has an obvious architectural shape. A skill declares the capabilities it needs in its frontmatter (network: yes, filesystem: scoped to /home/user/projects, credentials: none, persistent state: yes). The runtime enforces that declaration as the actual permission ceiling for the skill’s execution, regardless of what the skill’s instructions tell the agent to attempt. The user sees the declared capabilities at install time and approves them explicitly, with the option to narrow scope. None of the major implementations is there. Anthropic’s docs distinguish network-access defaults across product surfaces (Claude API skills get no network access, Claude Code skills get full network access, Claude.ai varies by user/admin settings), but those are product-level defaults, not per-skill declarations.
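To make the permission-ceiling idea concrete, here is a minimal sketch of how a runtime might check an attempted action against a skill’s declaration. No current SKILL.md schema or agent runtime defines these fields: the `Capabilities` and `Action` types, the action kinds, and `is_allowed` are hypothetical names invented for illustration, mirroring the four example capabilities in the paragraph above.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical per-skill declaration; every capability defaults to denied."""
    network: bool = False
    filesystem_scope: Optional[str] = None  # None means no filesystem access
    credentials: bool = False
    persistent_state: bool = False


@dataclass(frozen=True)
class Action:
    """An action the agent attempts while following a skill's instructions."""
    kind: str        # "network", "read", "write", "credential", or "persist"
    target: str = ""


def is_allowed(declared: Capabilities, action: Action) -> bool:
    """Return True only if the action falls under the declared ceiling."""
    if action.kind == "network":
        return declared.network
    if action.kind in ("read", "write"):
        if declared.filesystem_scope is None:
            return False
        scope = PurePosixPath(declared.filesystem_scope)
        target = PurePosixPath(action.target)
        # PurePosixPath does not resolve "..", so reject it outright.
        if ".." in target.parts:
            return False
        return target == scope or scope in target.parents
    if action.kind == "credential":
        return declared.credentials
    if action.kind == "persist":
        return declared.persistent_state
    return False  # unknown action kinds are denied by default


declared = Capabilities(
    network=True,
    filesystem_scope="/home/user/projects",
    credentials=False,
    persistent_state=True,
)
print(is_allowed(declared, Action("read", "/home/user/projects/app/main.py")))
print(is_allowed(declared, Action("read", "/home/user/.ssh/id_ed25519")))
```

The design choice worth noting is that the check depends only on the declaration and the attempted action, never on the skill’s instructions, which is what would make the declaration a ceiling and not merely a label. A real runtime would also need to enforce this below the agent (for example at the sandbox layer), which this sketch does not attempt.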

4. The skills-as-converging-primitive story has compressed the timeline.

The convergence side of this week’s news is what makes the marketplace problem urgent rather than just embarrassing. Anthropic published the SKILL.md format as an open standard in late 2025; within roughly four months, OpenAI Codex, Google Gemini CLI, GitHub Copilot, Cursor, JetBrains Junie, CrewAI, and Vercel’s AI SDK had adopted structurally compatible formats, and skills had become the dominant lever for closing the agent-domain-knowledge gap. Google DeepMind’s Closing the Knowledge Gap with Agent Skills reported that a single Gemini-API-developer skill moved their internal benchmark from a 6.8% pass rate to 96.6% (Gemini 3.1 Pro Preview, 117 prompts). That’s the order-of-magnitude case that makes skills load-bearing. Addy Osmani’s agent-skills repository and accompanying essay reframe skills as encoded senior-engineer workflows — specs before code, anti-rationalisation prompts that pre-emptively refuse the LLM’s preferred shortcuts, evidence-driven completion gates. The Osmani framing is influential because it explains why skills outperform monolithic system prompts: they’re workflows, not documentation.

The combined effect is that skills are now the unit of agent capability. They’re not optional add-ons; they’re how vendors and practitioners both ship the difference between a general-purpose agent and a useful one. That makes the marketplace a load-bearing piece of infrastructure rather than a discovery convenience. The Vercel skills.sh leaderboard has crossed 80,000 skills with millions of installs; the community-aggregator SkillsMP indexes hundreds of thousands more. Total addressable surface is large, scanner deployment is uneven across marketplaces, and most installs happen on community marketplaces with no inline scanning at all.

5. The Apple CLAUDE.md leak is the leakage-adjacent case: different threat, same governance gap.

The Apple Support app v5.13 incident, in which a shipping iOS app build contained internal CLAUDE.md sidecar files that were removed in a same-day v5.13.1 hotfix, is not a skills-marketplace incident, but it sits in the same governance gap. Skills, plugins, agent configurations, and CLAUDE.md sidecars are all instances of the same pattern: filesystem artifacts that configure agent behaviour, are usually treated as developer-side ephemera, and have no built-in lifecycle discipline for the boundary between development and shipped product. Apple’s hotfix turnaround (hours) suggests the leak was a build-pipeline oversight rather than deliberate inclusion, but the practitioner takeaway generalises: any team using agent sidecar files needs a build-time check that strips them from artifacts that ship to end users. This is the discipline that .gitignore-style exclusions provide for .env, that secret-scanning provides for hardcoded credentials, and that nothing yet provides for agent-configuration sidecars. The category needs a name and a tool.

Practical Implications

For teams pulling skills from any marketplace

Treat the marketplace’s scanning as a floor, not a ceiling. If you’re installing from skills.sh, the Snyk/Gen/Socket scanning catches the high-confidence malware cases at install time. That’s necessary. It is not sufficient. The ClawSwarm pattern — well-formed skills doing exactly what their code says — passes every scanner currently in production. Build a habit of reading the SKILL.md before installation, especially for any skill that reports usage upstream, generates persistent state on first run, or registers with an external service for “task assignment.” Those are the intent-misalignment shapes.
Pin to vendor-curated tiers when the use case allows. Anthropic’s official plugin directory and the Anthropic-published skills repository represent the highest-curation tier in the Claude ecosystem. Vercel’s skills.sh “Verified” pages with public audit results are the best community-marketplace tier. Community aggregators like SkillsMP that index any GitHub repo with a SKILL.md and minimal star threshold are the lowest tier and should be treated as untrusted by default.
Apply OWASP AST01 mitigations as a checklist when reviewing third-party skills. The list (Ed25519 signature verification if available, behavioural-scan output from a reputable scanner, hash-pinning post-install with mutation alerts, no automatic execution of “Prerequisites” or “Setup” instructions) is the closest thing to a practitioner-actionable checklist for skill installation. The ClickFix 2.0 pattern documented in the earlier OpenClaw analysis (skills that present fake “environment fix” terminal commands during first invocation) lives inside the “Prerequisites” pattern that AST01 calls out.
Audit your agent’s ambient authority before installing any new skill. Because skills inherit the agent’s permissions wholesale, the relevant security boundary is not the skill — it’s the agent. If your Claude Code instance has full network access, full filesystem access, and credentials cached locally, every installed skill has all of that too. Skill-by-skill auditing is necessary; agent-level least-privilege is more impactful.
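The hash-pinning item in the AST01 checklist above is the easiest one to automate locally. The following is a minimal sketch, assuming a skill is installed as a plain directory of files; `snapshot` and `drift` are hypothetical helper names, not part of any marketplace CLI, and where the pinned snapshot is stored is left to the reader.

```python
import hashlib
from pathlib import Path


def snapshot(skill_dir: Path) -> dict[str, str]:
    """Map each file's relative path to the SHA-256 of its contents."""
    return {
        p.relative_to(skill_dir).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in skill_dir.rglob("*")
        if p.is_file()
    }


def drift(pinned: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Report files added, removed, or modified since the pinned snapshot."""
    return {
        "added": sorted(current.keys() - pinned.keys()),
        "removed": sorted(pinned.keys() - current.keys()),
        "modified": sorted(
            k for k in pinned.keys() & current.keys() if pinned[k] != current[k]
        ),
    }
```

One way to use it: record `snapshot(skill_dir)` immediately after a reviewed install, then re-run it before each agent session and alert if any list returned by `drift` is non-empty. That catches post-install mutation; it does nothing for a skill that was misaligned at install time.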

For teams operating skills marketplaces or shipping vendor-curated tiers

Make intent disclosure a first-class metadata field. The SKILL.md frontmatter today carries name and description. That schema needs a capabilities-required field — at minimum a coarse-grained declaration of network access, filesystem scope, credential access, persistent-state requirements, and external-service registrations. The runtime should enforce the declaration as a permission ceiling, and the install UX should surface the declaration as the consent prompt. This is the missing primitive between scanning and signing.
Ship the curator-side review process, not just the verification tier. Anthropic’s “Anthropic Verified” badge on the official plugin directory and Vercel’s audit-results pages on skills.sh are both useful, but neither has a published review SLA, escalation path, or revocation policy that practitioners can plan against. The npm registry’s evolution included naming a security team, publishing CVE response timelines, and committing to advisory disclosures; the skill marketplaces are not at that maturity yet.
Build cross-marketplace revocation and reputation infrastructure. OWASP AST10 (Cross-Platform Reuse) covers the threat that compromised skills move between marketplaces faster than any single marketplace’s takedown process can follow. The npm/PyPI lessons here are concrete: OSV-style vulnerability databases, signed advisory feeds, and standardised package-identifier schemes are needed at the skill layer. The agentskills.io standard work is one starting point; cross-marketplace coordination on revocation is the missing piece.

For teams shipping AI-using products

Treat agent-configuration sidecars as build-time artifacts. The Apple CLAUDE.md leak generalises to a class of failure modes: any file that configures agent behaviour during development needs to be explicitly excluded from production artifacts and treated like .env and credentials in build pipelines. Add CLAUDE.md, .claude/, AGENTS.md, SKILL.md, and equivalent vendor sidecars to your standard build-exclusion lists, and add a CI check that fails the build if any of them appear in a shippable artifact.
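A CI check of the kind described above could look something like this sketch. It assumes the shippable artifact has already been unpacked into a directory, and it matches only the sidecar names listed in the text; the case-insensitive comparison, the `find_sidecars` and `main` names, and the exit codes are illustrative choices, not an established tool.

```python
import sys
from pathlib import Path

# Sidecar names from the text, lower-cased for case-insensitive matching.
SIDECAR_FILES = {"claude.md", "agents.md", "skill.md"}
SIDECAR_DIRS = {".claude"}


def find_sidecars(artifact_root: Path) -> list[str]:
    """Return relative paths of agent sidecars found under an unpacked artifact."""
    hits = []
    for p in artifact_root.rglob("*"):
        rel = p.relative_to(artifact_root).as_posix()
        name = p.name.lower()
        if p.is_file() and name in SIDECAR_FILES:
            hits.append(rel)
        elif p.is_dir() and name in SIDECAR_DIRS:
            hits.append(rel + "/")
    return sorted(hits)


def main(argv: list[str]) -> int:
    """Exit status for CI: 0 if clean, 1 if sidecars were found, 2 on misuse."""
    if len(argv) != 2:
        print("usage: check_sidecars.py <unpacked-artifact-dir>", file=sys.stderr)
        return 2
    hits = find_sidecars(Path(argv[1]))
    for hit in hits:
        print(f"agent sidecar in shippable artifact: {hit}", file=sys.stderr)
    return 1 if hits else 0
```

Wired into a pipeline, a step would call `sys.exit(main(sys.argv))` after the packaging stage so that a non-zero status fails the build. Teams using other vendors’ sidecar conventions would need to extend the two name sets.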

Open Questions

Will an “intent declaration” primitive get standardised, and where? AGENTS.md is governed by the Linux Foundation’s Agentic AI Foundation; the SKILL.md format is governed by Anthropic’s open standard with no formal cross-vendor governance body. The natural place for a capabilities-required field would be a minor revision to the SKILL.md spec. Whether Anthropic will lead that, whether OpenAI/Google adopt it if Anthropic does, and whether OWASP’s Universal Skill Format proposal becomes the venue for cross-vendor declaration are all open.
Can intent-misalignment be detected by behavioural scanning at all? The ClawSwarm case suggests scanning would need to evaluate not just what a skill does but whether what it does aligns with what its description says it does — a meta-check that effectively requires LLM-based semantic analysis of the gap between declared intent and actual behaviour. STSS proposes exactly this as one of its verification gates. Whether LLM-based intent-mismatch detection is reliable enough for production gating, or whether it just shifts the attack surface to fooling the auditor LLM, is unresolved.
What happens when scanning costs become marketplace economics? Snyk’s per-install scan model is currently free at the skills.sh level. Marketplaces with tens of thousands of skills and millions of installs will eventually need sustainable scanning economics. Whether scanning gets gated behind paid tiers, whether marketplaces consolidate around a few scanning providers, or whether scanning becomes a per-install cost is a 2026 question.
Is per-skill sandboxing tractable inside the existing agent runtimes? Claude Code’s filesystem-isolation model uses Linux bubblewrap and macOS seatbelt to scope agent actions to the working directory. Extending that to per-skill scoping — where each installed skill runs with a tighter capability ceiling than the agent itself — is a non-trivial harness change but is the only architectural primitive that mitigates the “skills inherit the agent’s ambient authority” failure mode at the runtime level. Whether vendors will ship per-skill sandboxing or punt it back to per-skill scanning is open.
Will the regulatory framing around agentic AI explicitly cover skill marketplaces? The Five Eyes joint guidance on agentic AI published this same week emphasises resilience over productivity but does not specifically name the skills-marketplace surface. EU AI Act high-risk obligations take effect August 2026; whether marketplace operators are reached by those obligations, and how, is a question worth tracking.