Agent Skills, MCP, and Scaffolds: A 2026 Guide to the New Vocabulary of AI Agents

Between May 25 and May 27, 2026, three serious organizations — Microsoft Research, AWS, and Hugging Face — shipped material that defines what an agent skill is, and they did not all use the word the same way. If you build, deploy, or evaluate AI agents, this is the week the vocabulary started consolidating, and the week it became dangerous to assume "skill" means the same thing in every doc you read. This guide explains what each definition actually says, where MCP fits, what a "scaffold" is doing in the picture, and which abstraction to reach for when.

Clawvard's audience is mostly platform engineers and tech leads picking between agent frameworks. The choice between investing in skills, an MCP server, a scaffold, or "just a better prompt" is now the single biggest determinant of how portable your agent stack will be in 12 months. The good news: the three definitions that landed this week are more compatible than they look.

Why "agent skills" suddenly matter — the May 2026 timeline

In one five-day window:

2026-05-25 — Hugging Face published Harness, Scaffold, and the AI Agent Terms Worth Getting Right, an explicit attempt to nail down the agent vocabulary (model, scaffold, harness, agent, skills, sub-agents).
2026-05-25 / 05-26 — Microsoft Research posted the SkillOpt project page and the paper SkillOpt: Executive Strategy for Self-Evolving Agent Skills, treating a natural-language skill document as the trainable state of a frozen agent.
2026-05-26 — AWS published Well-Architected Skills and Steering for AI Coding Agents, a set of reusable playbooks that work across 12 coding agents (Claude Code, Codex, Cursor, Kiro, Windsurf, Copilot, Gemini CLI, Antigravity, Junie, Amp, Cline, AWS DevOps Agent).
2026-05-26 — The Agent Voyager Project published its first captain's log comparing skills vs MCP vs prompts on the same task with the same model.
2026-05-27 — Simon Green published the contrarian piece No, your Agent Skill is not automation.

Three independent labs reaching for the same word in the same week is the signal: this is the moment the term gets standardized — or fragmented — for the rest of 2026.

What is an agent skill?

The clearest working definition, drawn from the three sources above, is:

An agent skill is a packaged, named, reusable unit of procedural knowledge — usually a folder of instructions, examples, scripts, and resources — that an agent loads when it is relevant to the current task.

Hugging Face's glossary positions skills inside the scaffold: they are "modular capabilities" that the model pulls in on demand rather than always carrying in context (Hugging Face, 2026-05-25). Simon Green's piece, which is hostile to over-claims about skills, still describes them in the same shape: "folders of instructions, scripts, and resources Claude can load when needed" (sjg.io, 2026-05-27).

The Microsoft SkillOpt paper goes further, making the skill document a first-class artifact:

"[T]he skill should ... be trained as the external state of a frozen agent, with the same discipline that makes weight-space optimization reproducible." (Yang et al., arXiv:2605.23904)

In SkillOpt's framing, the model stays frozen and the skill is the thing that learns. That is a stronger claim than "skill = reusable prompt template" — it treats the skill as the trainable surface of the system.

AWS's Well-Architected Skills repo is the most product-shaped of the three. It ships skills that "teach AI coding agents how to apply the AWS Well-Architected Framework" across 12 different agents from a single set of playbooks (aws-samples, 2026-05-26). That portability across harnesses is the operational promise that makes skills interesting to platform teams.

The three definitions agree on the shape (named, reusable, lazily loaded, mostly natural language plus optional scripts and templates) and disagree mostly on what's around them — which is what the rest of this guide is about.

How do skills differ from MCP servers?

This is the question Clawvard readers send most often, so it gets the longest answer.

Skills are scoped capabilities; MCP is a protocol

MCP (Model Context Protocol) is a wire protocol: it standardizes how an agent's harness talks to an external tool server. An MCP server typically exposes a list of tools (functions with JSON Schemas), a list of resources, and optionally prompts. It is closer in spirit to "REST for agent tools" than to "skills."

A skill is a chunk of procedural know-how the agent can pull into its working context. A skill can use MCP tools, expose CLI commands, ship its own scripts, or just be a markdown file describing how to handle a class of task.

In other words, MCP is how an agent reaches a capability; a skill is how an agent decides to use one. They are complementary, not alternative.

When you'd reach for each

Use an MCP server when:

You need to expose a capability to many different agents or runtimes without rewriting wrappers (Claude Code, Codex, Cursor, your own harness).
The capability is genuinely a service — it runs in its own process, holds credentials, manages connections to an external system.
You want versioning and auth that live outside the prompt.

Use a skill when:

You want to encode a workflow (how to apply Well-Architected when refactoring; how to triage a bug; how to author a campaign brief).
The knowledge is mostly natural language plus small scripts, not a stateful service.
You want it to load on demand so it doesn't bloat every prompt.

Use both when you have something serious: AWS's repo is exactly this pattern — the skills describe how to think about Well-Architected, and they invoke external tools (some of which are MCP-served) to do the work (aws-samples, 2026-05-26).

For a side-by-side test of all three setups on the same task, the Agent Voyager Project captain's log #1 is worth a read: the winning configuration on their PDF-to-HTML benchmark was not "skill" or "MCP" alone — it was a stepwise prompt with a self-check line, which beat both at 95% accuracy on claude-haiku-4-5. The lesson is uncomfortable but real: the abstraction is not magic; the content still does the work.

Where does a "scaffold" fit?

Hugging Face's glossary cleanly separates scaffold from harness:

Scaffold = "the behavior-defining layer around the model: system prompt, tool descriptions, how the model's responses get parsed, what it remembers across steps (context management). It shapes how the model sees the world and acts in it." (HF, 2026-05-25)
Harness = "the execution layer inside the agent: it calls the model, handles its tool calls, decides when to stop."

In that mapping, skills live inside the scaffold — they are pieces of the scaffold the agent activates when relevant. The harness is the runtime that loads them, calls the model with them, and parses what comes back.

Products like Claude Code and Codex usually collapse the distinction and call the whole thing a "harness." That is fine in casual speech; it stops being fine when you're trying to reason about which layer to invest engineering effort in.

Clawvard's earlier coverage of the same fault line is in Harness, Scaffold, Loop, Skill: The AI Agent Vocabulary That Actually Matters and Agent Harness vs Scaffold vs Skill: A Practical 2026 Glossary. This guide is the 2026 follow-up: same fault line, now with SkillOpt and AWS in the picture.

AWS Well-Architected for AI Agents — the operations view

AWS's repo is the most underrated artifact of the week. It ships:

A set of skills that explain how to apply the AWS Well-Architected pillars (operational excellence, security, reliability, performance efficiency, cost optimization, sustainability).
A set of steering documents (rules the agent should keep in mind across tasks, regardless of the active skill).
12 adapters for popular coding agents — same playbook, twelve harnesses.

What makes this an "operations view" rather than another research artifact is the explicit portability requirement: the same skill bundle works in Claude Code, Codex, Cursor, Kiro, Windsurf, GitHub Copilot, Gemini CLI, Antigravity, Junie, Amp, Cline, and AWS DevOps Agent (aws-samples, 2026-05-26).

This is the first credible signal that skills are emerging as the platform-portable unit of agent behavior. If you are picking an investment target this quarter, that portability is a strong reason to write your team's playbooks as skills rather than as system-prompt edits inside one vendor.

Self-evolving skills — what SkillOpt actually does

SkillOpt is the most ambitious of the three. The pitch:

"A separate optimizer model turns scored rollouts into bounded add/delete/replace edits on a single skill document, and an edit is accepted only when it strictly improves a held-out validation score." (Yang et al., 2026-05-25)

In other words, SkillOpt treats the skill document the way deep learning treats weights: you have a "forward pass" (the agent runs with the current skill on a batch), a "backward pass" (an optimizer model reflects on what failed and proposes edits), a "learning rate" (a budget on how many edits per step), and a held-out validation gate. Edits are only accepted if validation score goes up.

The results table in the paper is striking: SkillOpt is best or tied-best in all 52 (target model, benchmark, harness) cells they tested, with average lifts of +23.5 points on GPT-5.5 in direct chat, +24.8 inside the Codex agentic loop, and +19.1 inside Claude Code, against the no-skill baseline (Yang et al.).

The honest read for a platform team: this is a research paper, not a shipped product, and the optimizer model and rollout cost are non-trivial. But it changes the conceptual picture — if your skill is a learnable artifact, the question stops being "is my skill well-written" and starts being "is my skill well-trained on the right rollouts." That shifts the engineering effort from prompt craft to evaluation infrastructure, which is exactly the shift our broader AI agent evaluation guide argues for.

The counter-take — "your agent skill is not automation"

Simon Green's piece is the necessary corrective. His thesis:

"An Agent Skill is a shortcut. A pattern. ... A reusable way of helping an AI perform a task in a familiar style or against a familiar standard. ... But most of the time, this is still personal leverage. It is not yet operational automation." (sjg.io, 2026-05-27)

Why this matters even if you disagree: skills, as defined by all three sources above, only fire when the agent decides they are relevant. They are not deterministic workflows. They do not have SLAs. They route through a human's judgment, prompt, and tolerance for risk. If you are pricing a product line, sizing a team, or building a process around skills, you need to know the difference between "individual contributor with leverage" and "system you can scale by adding compute."

Treat skills as a powerful scaffold layer, not as a substitute for a real workflow engine. If you need automation in the traditional sense — guaranteed execution, deterministic side-effects, audit trail — wire the agent into an orchestrator on top of skills, not as an alternative to them. Clawvard's execution-bottleneck analysis covers why this gap is harder to close than the prompt-engineering side.

FAQ

What is the difference between an agent skill and an MCP server?

A skill is a packaged unit of procedural knowledge (folders of instructions, scripts, and templates) that the agent loads when relevant. An MCP server is a protocol-driven service that exposes tools, resources, and prompts to an agent's harness. Skills describe how to think about a problem; MCP servers execute capabilities. They compose well together: skills can call MCP tools, and serious agent stacks ship both.

Are agent skills the same as prompts?

No, but they overlap. A prompt is single-shot text you send into a model call. A skill is a packaged bundle — usually a folder — that the agent activates conditionally, often including instructions, examples, scripts, and templates. Skills are prompts plus a routing decision plus optional code and assets. The Hugging Face glossary makes this distinction explicit (HF, 2026-05-25).

Do I need a scaffold if I use skills?

Yes — skills live inside the scaffold. The scaffold is the behavior-defining layer around the model (system prompt, tool descriptions, context management); a skill is one of the things the scaffold can pull in. You cannot have skills without a scaffold; you can have a scaffold without skills (lots of simple agents do).

Can agent skills evolve themselves?

In Microsoft Research's SkillOpt framework, yes — the skill document is treated as the trainable artifact, with an optimizer model proposing bounded edits and a held-out validation gate accepting or rejecting them (SkillOpt, arXiv:2605.23904). In most shipped agents today (Claude Code skills, AWS Well-Architected skills) the skills are static documents that humans maintain; "self-evolving" is still leading-edge research, not standard practice.

Which should I invest in this quarter — skills, MCP, or better prompts?

If your team is writing the same instruction set into many different agents, invest in skills first — AWS's 12-adapter pattern shows that pays off. If your bottleneck is "this capability isn't exposed to my agent at all," invest in MCP. If your bottleneck is "the agent doesn't know what good looks like for this task," start with a better prompt or skill and only add infrastructure once you have evals showing the gap.

Takeaways

Skills, scaffolds, and MCP are not competing concepts — they sit at different layers. Skills are units of procedural knowledge; scaffolds are the behavior layer that loads them; MCP is the protocol that exposes external capabilities.
AWS's Well-Architected repo is the first serious portability signal. One skill bundle, 12 coding agents. If you can write your team's playbooks in that shape, you are no longer locked into one harness.
SkillOpt is the research preview of skills as trainable artifacts. Not yet shipped, but it points to where the engineering investment moves next: from prompt craft to skill evaluation.
Don't confuse skills with automation. Skills are personal leverage; durable operational automation still needs an orchestrator on top.

If you're picking an agent platform this quarter, run the AVP captain's log #1 experiment in your own stack: same task, same model, three setups (skill, MCP, plain prompt). The numbers will tell you which abstraction your team's bottleneck actually needs — and the answer is rarely the one a vendor will sell you.

For ongoing coverage of the agent vocabulary as it settles, follow the Clawvard blog or try Clawvard to see how we wire skills, scaffolds, and MCP together in our own agent courses.