Summary
The agent skills ecosystem has undergone a rapid, largely uncoordinated convergence toward a single architectural pattern: a directory containing a SKILL.md file with YAML frontmatter metadata and a markdown body of instructions, optionally accompanied by scripts, references, and assets. Anthropic published this as an open standard in December 2025; within four months, OpenAI Codex, Google Gemini CLI, GitHub Copilot, Cursor, JetBrains Junie, CrewAI, LangSmith Fleet, and over 30 other agent products adopted structurally identical or compatible formats. The AGENTS.md specification, a complementary standard stewarded by the Agentic AI Foundation under the Linux Foundation, addresses project-level agent configuration rather than reusable capability bundles, and coexists rather than competes. The performance data is compelling: Google measured a nearly 70-point accuracy improvement (28.2% → 96.6%) from a single skills file; LangChain reported a similar jump (29% → 95%) for their open-source ecosystem skills. This is not an incremental gain: skills are now the primary lever for agent capability, outperforming fine-tuning for domain adaptation.
Key Findings
1. The Format Has Converged: SKILL.md Is the De Facto Standard
The core format across all major implementations is remarkably consistent:
```markdown
---
name: skill-name
description: What this skill does and when to use it
---

# Skill Name

## Instructions

[Step-by-step guidance for the agent]

## Examples

[Concrete usage examples]
```
Required fields are name and description in YAML frontmatter. The markdown body contains procedural instructions. Optional subdirectories hold scripts (scripts/), reference materials (references/), and assets (assets/).
Anthropic’s specification defines the canonical field constraints: name is max 64 characters, lowercase alphanumeric with hyphens; description is max 1024 characters [1][2]. OpenAI adopted structurally identical naming conventions and metadata format in both Codex and the Responses API [3][4]. GitHub Copilot reads skills from .github/skills/ directories using the same SKILL.md frontmatter format [5]. CrewAI, Cursor, JetBrains Junie, and Vercel’s AI SDK all support the standard [6][7][8].
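A minimal validator for these field constraints might look like the following sketch (the function name and error strings are illustrative; the limits are the ones from the specification):

```python
import re

# Lowercase alphanumeric segments separated by hyphens, per the spec.
NAME_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")

def validate_skill_frontmatter(meta: dict) -> list[str]:
    """Check the two required SKILL.md fields against the published limits."""
    errors = []
    name = meta.get("name")
    if not name:
        errors.append("missing required field: name")
    elif len(name) > 64 or not NAME_RE.match(name):
        errors.append("name must be <=64 chars, lowercase alphanumeric with hyphens")
    description = meta.get("description")
    if not description:
        errors.append("missing required field: description")
    elif len(description) > 1024:
        errors.append("description must be <=1024 chars")
    return errors
```

A conforming skill such as `{"name": "pdf-tools", "description": "Extract text from PDFs"}` passes with no errors; a name with uppercase letters or spaces is rejected.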
The convergence was not coordinated by committee. Anthropic published the open standard first; competitors adopted it because the format solved a real problem and was simple enough not to need proprietary extensions. OpenAI’s adoption was particularly telling: their Codex CLI uses identical file naming conventions, metadata format, and directory organization [9].
2. Progressive Disclosure Is the Key Architectural Innovation
What distinguishes the SKILL.md pattern from earlier approaches (system prompts, AGENTS.md, custom instructions) is progressive disclosure — a three-level loading architecture that decouples skill discovery from skill execution:
| Level | When Loaded | Token Cost | Content |
|---|---|---|---|
| Level 1: Metadata | Always (at startup) | ~100 tokens per skill | name and description from YAML frontmatter |
| Level 2: Instructions | When skill is triggered | Under 5,000 tokens | SKILL.md body with instructions and guidance |
| Level 3+: Resources | As needed during execution | Effectively unlimited | Bundled files, scripts, references |
This means an agent can have dozens of installed skills with negligible context cost. The agent loads skill names and descriptions at startup, reads the full SKILL.md only when it determines relevance, and accesses bundled resources only when specific instructions reference them [1][2].
The practical implication is that the amount of context bundled into a skill is effectively unbounded. Scripts execute via bash and return only their output — the code itself never enters the context window. Reference documents are read on demand. This is why skills outperform monolithic system prompts: they match the right context to the right task at the right time.
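The three levels can be sketched as a loader that indexes only frontmatter at startup and defers everything else. This is an illustrative sketch, not any vendor’s implementation: the class name is invented and the frontmatter parsing is deliberately naive (simple `key: value` lines only).

```python
from pathlib import Path

class SkillIndex:
    """Level 1: hold only name/description for every installed skill."""

    def __init__(self, skills_root: str):
        self.metadata = {}  # name -> description (~100 tokens per skill)
        self.paths = {}     # name -> skill directory
        for skill_md in Path(skills_root).glob("*/SKILL.md"):
            meta = self._read_frontmatter(skill_md)
            self.metadata[meta["name"]] = meta["description"]
            self.paths[meta["name"]] = skill_md.parent

    @staticmethod
    def _read_frontmatter(path: Path) -> dict:
        # Naive parse: "key: value" lines between the two --- fences.
        block = path.read_text().split("---")[1]
        pairs = (line.split(":", 1) for line in block.strip().splitlines() if ":" in line)
        return {k.strip(): v.strip() for k, v in pairs}

    def load_instructions(self, name: str) -> str:
        """Level 2: read the full SKILL.md body only when the skill triggers."""
        text = (self.paths[name] / "SKILL.md").read_text()
        return text.split("---", 2)[2]  # body after the closing fence

    def resource(self, name: str, relative: str) -> Path:
        """Level 3: resolve a bundled file only when instructions reference it."""
        return self.paths[name] / relative
```

The key property is that constructing the index touches only frontmatter; the instruction body and bundled resources stay on disk until a specific skill is triggered.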
Google’s Gemini implementation adds a functional twist: skills provide an activate_skill tool and a fetch_url tool for retrieving live documentation, enabling skills to pull current information rather than relying solely on bundled content [10].
3. AGENTS.md and SKILL.md Are Complementary, Not Competing
A common source of confusion is the relationship between AGENTS.md and SKILL.md. They serve different purposes:
| Dimension | AGENTS.md | SKILL.md |
|---|---|---|
| Scope | Project-level configuration | Reusable capability bundle |
| Purpose | "How to work in this codebase" | "How to perform this task" |
| Discovery | Nearest file in directory tree | Skill directories in known paths |
| Portability | Project-specific | Cross-project, cross-platform |
| Governance | Linux Foundation (Agentic AI Foundation) | Anthropic (open standard) |
| Adoption | 60,000+ GitHub repositories [11] | 30+ agent products [9] |
AGENTS.md tells an agent about a specific project’s conventions, coding standards, and constraints — analogous to a README for agents. SKILL.md teaches an agent how to perform a generalizable task — analogous to a training manual. A well-configured agent stack uses both: AGENTS.md for project context, skills for domain capabilities.
The AGENTS.md specification uses standard markdown with no required frontmatter. The filename must be uppercase. Files are discovered by walking the directory tree upward, with the nearest file taking precedence — enabling per-directory overrides [11][12].
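The nearest-file rule amounts to a simple upward directory walk. A minimal sketch (the function name is illustrative, not from the specification):

```python
from pathlib import Path
from typing import Optional

def find_agents_md(start: str) -> Optional[Path]:
    """Return the nearest AGENTS.md at or above `start`, or None.

    Nearest-file-wins is what enables per-directory overrides: a
    subproject's AGENTS.md shadows the repository root's.
    """
    current = Path(start).resolve()
    for directory in [current, *current.parents]:
        candidate = directory / "AGENTS.md"  # filename must be uppercase
        if candidate.is_file():
            return candidate
    return None
```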
4. The Performance Data Is Unambiguous
Measurements from two independent vendors confirm that skills dramatically improve agent capability without model changes:
| Vendor | Baseline | With Skill | Improvement | Model |
|---|---|---|---|---|
| Google DeepMind [10] | 28.2% | 96.6% | +68.4 pts | Gemini 3.1 Pro |
| LangChain [13] | 29% | 95% | +66 pts | Claude (via Claude Code) |
| Google DeepMind [10] | 6.8% | 87% | +80.2 pts | Gemini 3.0 Flash |
Google’s evaluation used 117 prompts across agentic coding, chatbots, document processing, and streaming tasks. LangChain tested coding agent performance on LangChain/LangGraph tasks specifically. Both demonstrate that the gap between “general-purpose model” and “domain-specialist agent” is primarily a context gap, not a capability gap.
Google additionally reported 63% fewer tokens per correct answer when combining skills with MCP documentation access — skills not only improve accuracy, they improve efficiency by giving the agent a direct path rather than exploratory search [10].
5. The Ecosystem Is Stratifying Into Three Tiers
The skills ecosystem is developing a clear hierarchy:
Tier 1: Platform-native skills. Anthropic ships pre-built skills for document processing (PowerPoint, Excel, Word, PDF) [2]. OpenAI bundles skills into the Responses API’s hosted container workspace [3]. These are vendor-maintained and optimized for their specific runtime environments.
Tier 2: Verified partner skills. Anthropic’s skills repository (62,000+ GitHub stars within four months) includes partner-built skills from Atlassian, Figma, Canva, Stripe, and Notion [9]. These represent official vendor integrations maintained by the tool providers themselves.
Tier 3: Community skills. Thousands of community-contributed skills are compatible with the universal format. VoltAgent’s awesome-agent-skills repository catalogs skills across multiple platforms [14]. The Serenities AI guide documents compatibility across 16+ tools [6]. Quality and maintenance vary widely.
LangSmith Fleet occupies a unique position — it provides a managed skills platform where skills are created, versioned, and shared within a workspace, with automatic sync and upcoming version pinning [15]. Skills can be downloaded and used in Claude Code, Cursor, or Codex via CLI, providing a bridge between managed and filesystem-based approaches.
6. Security Is the Unresolved Risk
Anthropic’s documentation includes explicit warnings: “Skills provide Claude with new capabilities through instructions and code, and while this makes them powerful, it also means a malicious Skill can direct Claude to invoke tools or execute code in ways that don’t match the Skill’s stated purpose” [2].
The threat model is real. Skills can:
- Execute arbitrary code via bundled scripts
- Make network requests (in Claude Code, with full local network access)
- Reference external URLs whose content can change post-audit
- Invoke tools in harmful ways while appearing benign in their SKILL.md
There is currently no code signing, no integrity verification, and no sandboxing at the skill level. The mitigation is entirely trust-based: only use skills from trusted sources. This is reminiscent of early package manager ecosystems before supply chain security became a priority — and given the LiteLLM attack from this same week (Edition 5), the parallel is uncomfortable.
Runtime environment constraints vary by surface: Claude API skills have no network access and cannot install packages; Claude Code skills have full network access and local package installation [2]. This inconsistency means a skill that’s safe in one environment may be dangerous in another.
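Until signing and sandboxing exist, the best available defense is manual review, but a crude triage pass can at least surface the risk surfaces listed above. This is a sketch, not a real security tool: the patterns are illustrative and a malicious skill can trivially evade them.

```python
import re
from pathlib import Path

# Illustrative red-flag patterns; neither exhaustive nor evasion-resistant.
SUSPECT_PATTERNS = {
    "network call": re.compile(r"curl|wget|requests\.|urllib|fetch\(|https?://"),
    "shell execution": re.compile(r"subprocess|os\.system|eval\(|exec\("),
}

def triage_skill(skill_dir: str) -> dict[str, list[str]]:
    """Flag files in a skill bundle that touch the network or spawn processes."""
    findings: dict[str, list[str]] = {}
    for path in Path(skill_dir).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        hits = [label for label, pat in SUSPECT_PATTERNS.items() if pat.search(text)]
        if hits:
            findings[str(path.relative_to(skill_dir))] = hits
    return findings
```

A non-empty report is a prompt for human review, not proof of malice; legitimate skills also make network calls and run scripts.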
Practical Implications
1. Invest in skills now — the format is stable. The SKILL.md format with YAML frontmatter is the clear winner. Build skills in this format and they’ll work across Claude Code, OpenAI Codex, GitHub Copilot, Gemini CLI, Cursor, and more. The portability story is real, not aspirational.
2. Use AGENTS.md and SKILL.md together. AGENTS.md defines your project’s rules of engagement. Skills define reusable capabilities. Don’t choose between them — layer them. Put project conventions in AGENTS.md; put generalizable workflows in skills.
3. Measure your context gap. Google’s 28.2% → 96.6% result is a wake-up call. If your agents are underperforming, the most likely fix isn’t a better model — it’s better context. Write a skill for your most common agent tasks and measure the before/after accuracy. The ROI will likely exceed any model upgrade.
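A before/after measurement needs nothing elaborate: run the same graded task set with and without the skill installed and compare pass rates. A sketch, where `run_agent` is a placeholder for however you actually invoke your agent:

```python
from typing import Callable, Iterable

def measure_skill_lift(
    tasks: Iterable[tuple[str, Callable[[str], bool]]],
    run_agent: Callable[[str, bool], str],
) -> tuple[float, float]:
    """Return (baseline_accuracy, with_skill_accuracy) over the same tasks.

    Each task is (prompt, grader); run_agent(prompt, use_skill) stands in
    for your agent invocation with skills disabled or enabled.
    """
    tasks = list(tasks)

    def accuracy(use_skill: bool) -> float:
        passed = sum(grader(run_agent(prompt, use_skill)) for prompt, grader in tasks)
        return passed / len(tasks)

    return accuracy(False), accuracy(True)
```

Holding the task set and graders fixed across both runs is what makes the delta attributable to the skill rather than to prompt drift.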
4. Treat skill security like dependency security. Audit third-party skills before installation. Review bundled scripts, check for network calls, and be wary of skills that fetch external content. The ecosystem has no integrity verification infrastructure yet — treat every community skill as untrusted code.
5. Design skills for progressive disclosure. Put the minimum viable instruction in SKILL.md. Put detailed references in separate files. Put deterministic operations in scripts. This isn’t just good architecture — it directly reduces token costs and improves accuracy by keeping context focused.
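In practice that means a layout along these lines (the file names are illustrative):

```
my-skill/
├── SKILL.md          # minimal instructions, under ~5,000 tokens
├── scripts/
│   └── convert.py    # deterministic steps; only output enters context
├── references/
│   └── api-notes.md  # detailed documentation, read on demand
└── assets/
    └── template.docx # files the task needs
```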
6. Watch the versioning problem. LangSmith Fleet is adding version pinning; the filesystem-based approach has no versioning at all. If you’re sharing skills across a team, you need a distribution strategy — git submodules, a skills registry, or a managed platform like Fleet.
Open Questions
- Will skill signing emerge? The current trust model won’t scale to a public ecosystem. Will Anthropic or another vendor introduce cryptographic verification for skills, similar to package signing?
- How will skill conflicts be resolved? When multiple skills match a request, which takes precedence? The specification is silent on conflict resolution and priority ordering across skills from different sources.
- Will AGENTS.md and SKILL.md formally merge? Both use markdown. Both configure agent behavior. The Agentic AI Foundation and Anthropic’s open standard exist in parallel — will governance consolidate, or will the two standards continue to coexist at different layers?
- What about non-coding agent skills? The current ecosystem is overwhelmingly focused on coding agents. As agents expand into customer support, data analysis, and enterprise workflows, will the SKILL.md format prove sufficient, or will domain-specific extensions emerge?
- How will skill quality be measured? Beyond star counts, there’s no standard way to evaluate whether a skill actually improves agent performance. Will evaluation benchmarks for skills emerge as a category?
Sources
1. Anthropic Engineering: Equipping Agents for the Real World with Agent Skills
2. Anthropic: Agent Skills Overview — Claude API Docs
3. OpenAI: Using Skills to Accelerate OSS Maintenance — Developers Blog
4. OpenAI: Agent Skills — Codex Docs
5. Microsoft: Use Agent Skills in VS Code — Copilot Docs
6. Serenities AI: Agent Skills Guide 2026 — Build Skills for 16+ AI Tools
7. CrewAI: Skills Concept Documentation
8. Vercel AI SDK: Guides — Add Skills to Your Agent
9. VentureBeat: Anthropic Launches Enterprise Agent Skills and Opens the Standard
10. Google Developers Blog: Closing the Knowledge Gap with Agent Skills
11. AGENTS.md Official Site
12. GitHub: agentsmd/agents.md Repository
13. LangChain Blog: Skills in LangSmith Fleet
14. GitHub: VoltAgent/awesome-agent-skills
15. LangChain Blog: Introducing LangSmith Fleet
16. OpenAI: Skills in the API — Cookbook
17. InfoQ: OpenAI Extends Responses API for Agents
18. GitHub: anthropics/skills Repository
19. GitHub: google-gemini/gemini-skills Repository
20. Medium: The SKILL.md Pattern — How to Write AI Agent Skills That Actually Work
21. Medium: Beyond Prompt Engineering — Using Agent Skills in Gemini CLI
22. arXiv: Agent Skills for Large Language Models — Architecture, Acquisition, Security