How to Build Agent Skills: A Practitioner's Guide

If you've spent any time around AI agents lately, you've felt the gap: a general-purpose agent is impressive in a demo and frustrating in production. It can do a little of everything and none of it reliably. Agent skills are the fix — focused, reusable packages of instruction and tooling that turn a generalist into a dependable specialist for one job. Getting good at writing them is quickly becoming a core practitioner skill in its own right.

The interest is spiking right now for good reason. In early June 2026, a single Show HN post — "My Agent Skill for Test-Driven Development" — drew hundreds of points and a long comment thread, a clear signal that developers want concrete patterns, not theory. In the same window, Hugging Face shipped two pieces that round out the picture: a guide to designing the hf CLI as an agent-optimized interface and "Harness, Scaffold, and the AI Agent Terms Worth Getting Right", a vocabulary explainer. There's even a community push toward standard schemas, like Open Envelope's open schema for defining AI agent teams. The craft is maturing — this guide is a practitioner's take on doing it well.

What are agent skills, exactly?

An agent skill is a self-contained unit that teaches an agent how to do one well-defined job: a set of instructions, plus the tools and context the agent needs, plus (ideally) a way to verify the work. Think of it as the difference between hiring a smart generalist and handing that generalist a written runbook for a specific task — the runbook is the skill.

A good skill answers three questions for the agent:

When should I use this? (a crisp trigger / description)
How do I do this job? (step-by-step instructions, conventions, gotchas)
What tools do I have, and how do I know I succeeded? (the tool surface and a success check)
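
To make the three questions concrete, here is a minimal sketch of a skill as a plain data structure. The `Skill` class, its field names, and the `changelog-writer` tool are assumptions made for illustration, not a standard schema or any particular framework's format.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Skill:
    """Hypothetical container for the three answers a skill gives the agent."""
    name: str
    trigger: str                          # When should I use this?
    instructions: List[str]               # How do I do this job?
    tools: List[str]                      # What tools do I have?
    success_check: Callable[[str], bool]  # How do I know I succeeded?

    def missing_parts(self) -> List[str]:
        """Return the names of any answers that were left empty."""
        missing = []
        if not self.trigger.strip():
            missing.append("trigger")
        if not self.instructions:
            missing.append("instructions")
        if not self.tools:
            missing.append("tools")
        return missing


release_notes = Skill(
    name="generate-release-notes",
    trigger="Use when asked to summarize merged changes for a tagged release.",
    instructions=[
        "Collect the merged change titles since the previous tag.",
        "Group them under Added, Changed, and Fixed.",
        "Write one line per change, in the past tense.",
    ],
    tools=["git", "changelog-writer"],  # "changelog-writer" is a made-up tool name
    success_check=lambda text: all(h in text for h in ("Added", "Changed", "Fixed")),
)
```

Calling `release_notes.missing_parts()` on a draft is a cheap way to catch a skill that has a name and instructions but no trigger or tool list.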

The magic isn't in the model; it's in the packaging. A model with a sharp, well-scoped skill will often outperform a more capable model that is winging it.

Harness vs. scaffold: getting the vocabulary right

Before building, fix the terms — because the field uses them loosely and it causes real confusion. Hugging Face's agent glossary is the reference worth internalizing. In practical use:

Harness = the runtime loop that runs the agent — the thing that feeds the model prompts, executes its tool calls, captures results, and loops. It's the engine.
Scaffold = the structure and starting material you put around a task — templates, file layouts, boilerplate, and the surrounding setup that gives the agent something to build on.

Why does this matter for skills? Because a skill lives in the harness (it shapes how the loop behaves for a given job) but often ships scaffold (templates, file conventions, example structures). Knowing which layer you're editing keeps you from, say, hard-coding task-specific scaffold into your harness where it doesn't belong.
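
A toy sketch can make the two layers easier to tell apart. Everything here is an assumption for illustration: `stub_model` stands in for a real model call, `SCAFFOLD` is a made-up template, and the action format is invented rather than taken from any real harness.

```python
from typing import Callable, Dict, List

# Scaffold: starting material that ships with a skill (hypothetical template).
SCAFFOLD = {"migration.sql": "-- up\n\n-- down\n"}

# Tools the harness is willing to execute on the agent's behalf.
TOOLS: Dict[str, Callable[[str], str]] = {
    "read_template": lambda name: SCAFFOLD[name],
}


def stub_model(history: List[dict]) -> dict:
    """Stand-in for a real model call: asks for the template, then finishes."""
    if not history:
        return {"type": "tool", "name": "read_template", "arg": "migration.sql"}
    return {"type": "final", "text": "Started from template:\n" + history[-1]["result"]}


def run_harness(model: Callable[[List[dict]], dict], max_steps: int = 5) -> str:
    """Harness: ask the model, execute its tool calls, record results, loop."""
    history: List[dict] = []
    for _ in range(max_steps):
        action = model(history)
        if action["type"] == "final":
            return action["text"]
        result = TOOLS[action["name"]](action["arg"])
        history.append({"call": action, "result": result})
    raise RuntimeError("step budget exhausted without a final answer")


answer = run_harness(stub_model)
```

Note that the template text lives in `SCAFFOLD`, not inside `run_harness`. Swapping in a different skill means changing the scaffold and tools, while the loop stays the same.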

Why do agent skills matter now?

Two forces converged. Agents became capable enough to trust with multi-step work, but capability without structure just means being confidently wrong at scale. As we found testing 45,000 AI agents, the bottleneck isn't raw intelligence; it's execution: staying on task, using tools correctly, and finishing reliably. Skills are the lever that closes that execution gap. They're how you encode "do it this way, every time" so the agent doesn't rediscover (and re-botch) the process on every run.

How do you build an agent skill?

Here's a repeatable pattern. Four moves.

1. Scope it to exactly one job

The most common mistake is building a skill that does too much. "Manage the database" is not a skill; "create a backwards-compatible migration from a schema diff" is. A tightly scoped skill is easier to write, easier to test, and far more reliable — and you can compose several narrow skills into a workflow later. If you can't describe the job in one sentence, split it.

2. Write the instructions like you're onboarding a new teammate

Don't write for a machine; write for a competent colleague who's never seen your codebase. Be explicit about conventions, the order of operations, and the traps. Include the why behind non-obvious steps — agents, like people, follow rules better when they understand the reason. State what not to do as clearly as what to do.

3. Give it the right tool surface

A skill is only as good as the tools it can reach, and tools designed for humans often frustrate agents. This is the heart of Hugging Face's hf CLI for agents write-up: an agent-optimized interface favors predictable, scriptable commands; structured (e.g. JSON) output; clear error messages; and stable flags over interactive prompts. When you wire tools into a skill, prefer the agent-friendly surface — a clean CLI with --output json beats a chatty interactive wizard every time.
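
As a rough sketch of what an agent-friendly surface looks like from the calling side, the wrapper below runs a command non-interactively and parses its stdout as JSON. `run_json_tool` is a hypothetical helper, and `FAKE_CLI` is a child Python process standing in for a real JSON-emitting tool; no real CLI's flags are assumed.

```python
import json
import subprocess
import sys
from typing import List


def run_json_tool(argv: List[str], timeout: float = 30.0) -> dict:
    """Run a non-interactive command and parse its stdout as JSON."""
    proc = subprocess.run(
        argv,
        capture_output=True,
        text=True,
        timeout=timeout,
        stdin=subprocess.DEVNULL,  # never block waiting on an interactive prompt
    )
    if proc.returncode != 0:
        # Surface the exit code and stderr so the agent sees a clear error.
        raise RuntimeError(f"exit {proc.returncode}: {proc.stderr.strip()}")
    return json.loads(proc.stdout)


# Stand-in for an agent-friendly CLI: a child process that prints one JSON object.
FAKE_CLI = [
    sys.executable,
    "-c",
    'import json; print(json.dumps({"status": "ok", "files": 3}))',
]

result = run_json_tool(FAKE_CLI)
```

The agent gets a dictionary to reason over on success and a single readable error on failure, which is the contract an interactive wizard cannot offer.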

4. Make it testable from the start

This is the lesson of the TDD agent-skill that lit up Hacker News: bake verification into the skill. Have the skill write or run tests, check exit codes, and confirm the result against explicit acceptance criteria before declaring success. An agent that can check its own work is dramatically more reliable than one that just asserts "done." Test-driven development isn't only a coding discipline here — it's the closest thing agents have to a built-in truth signal.
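
One way to sketch that kind of built-in verification: run the tests, check the exit code, then check the produced artifact against named acceptance criteria before reporting success. The `verify` function, the `Verdict` type, and the sample criterion are illustrative assumptions, not the approach of the skill discussed above.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Verdict:
    passed: bool
    reasons: List[str]  # empty when everything checked out


def verify(test_argv: List[str], artifact: str,
           criteria: Dict[str, Callable[[str], bool]]) -> Verdict:
    """Run the test command, then check each acceptance criterion."""
    reasons = []
    proc = subprocess.run(test_argv, capture_output=True, text=True)
    if proc.returncode != 0:
        reasons.append(f"tests exited with code {proc.returncode}")
    for label, check in criteria.items():
        if not check(artifact):
            reasons.append(f"criterion not met: {label}")
    return Verdict(passed=not reasons, reasons=reasons)


# Stand-ins for a real test runner: child processes that pass or fail an assert.
passing_tests = [sys.executable, "-c", "assert 1 + 1 == 2"]
failing_tests = [sys.executable, "-c", "assert 1 + 1 == 3"]

criteria = {"has a down step": lambda sql: "-- down" in sql}

good = verify(passing_tests, "-- up\n-- down\n", criteria)
bad = verify(failing_tests, "-- up\n", criteria)
```

Returning the reasons, not just a boolean, gives the agent something specific to fix on its next attempt.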

How do you test an agent skill?

Treat the skill like any other piece of software:

Golden tasks. Keep a small suite of representative jobs with known-good outcomes, and run the skill against them whenever you change it.
Adversarial inputs. Throw the messy, ambiguous, and edge-case versions at it — that's where skills quietly fail.
Check the trajectory, not just the answer. A skill that reaches the right result by an expensive or fragile path will break later. Watch how it gets there.
Watch for regressions when you compose. Skills that work alone can interfere when chained; test them in the workflows you'll actually run.
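
A golden-task suite can be very small. The sketch below assumes a skill run can be reduced to a function returning an answer and a step count; that signature, the `GoldenTask` type, and the toy skill are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class GoldenTask:
    prompt: str
    expected: str
    max_steps: int  # budget for the trajectory, not just the answer


# Assumed shape of a skill run: prompt in, (answer, steps taken) out.
SkillRun = Callable[[str], Tuple[str, int]]


def run_golden_suite(skill: SkillRun, tasks: List[GoldenTask]) -> List[str]:
    """Return failure messages; an empty list means no regressions."""
    failures = []
    for task in tasks:
        answer, steps = skill(task.prompt)
        if answer != task.expected:
            failures.append(f"{task.prompt!r}: wrong answer {answer!r}")
        elif steps > task.max_steps:
            failures.append(
                f"{task.prompt!r}: right answer but {steps} steps > {task.max_steps}"
            )
    return failures


def toy_skill(prompt: str) -> Tuple[str, int]:
    """Toy stand-in for a skill run: slugify the prompt, one step per word."""
    return prompt.strip().lower().replace(" ", "-"), len(prompt.split())


tasks = [
    GoldenTask("Generate Release Notes", "generate-release-notes", max_steps=3),
    # Adversarial variant: a doubled space that the toy skill handles badly.
    GoldenTask("Review  SQL Migration", "review-sql-migration", max_steps=3),
]

failures = run_golden_suite(toy_skill, tasks)
```

Here the clean input passes and the messy one fails, which is the pattern to expect: adversarial cases tend to surface problems that representative cases hide.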

How do you scope and name a skill well?

Naming is design. A skill's name and one-line description are what the agent reads to decide whether to invoke it at all, so they have to be unambiguous. Favor a verb-plus-object name ("generate-release-notes", "review-sql-migration") and a description that states the trigger condition precisely. If two skills have overlapping descriptions, the agent can't reliably tell them apart and may pick the wrong one, so disambiguate them or merge them. Standard schemas like Open Envelope point in this direction: explicit, machine-readable definitions of what each agent and skill is for.
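
Both checks can be roughed out as a lint step. The sketch below is an assumption-laden illustration: the name pattern only checks the hyphenated shape (it cannot tell whether the first word is a verb), and word-set overlap with a 0.6 threshold is a crude, arbitrary proxy for "the agent could confuse these."

```python
import re
from itertools import combinations
from typing import Dict, List, Set

# Lowercase words joined by hyphens, at least two parts (shape only).
NAME_PATTERN = re.compile(r"^[a-z]+(-[a-z0-9]+)+$")


def words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two descriptions."""
    wa, wb = words(a), words(b)
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def lint_skills(skills: Dict[str, str], threshold: float = 0.6) -> List[str]:
    """Flag badly shaped names and pairs of near-identical descriptions."""
    problems = []
    for name in skills:
        if not NAME_PATTERN.match(name):
            problems.append(f"{name}: not a hyphenated verb-object name")
    for (n1, d1), (n2, d2) in combinations(skills.items(), 2):
        if overlap(d1, d2) >= threshold:
            problems.append(f"{n1} / {n2}: descriptions overlap")
    return problems


skills = {
    "generate-release-notes": "Use when asked to write release notes for a tagged version.",
    "review-sql-migration": "Use when asked to review a SQL migration file before merge.",
    "check-sql-migration": "Use when asked to check a SQL migration file before merge.",
    "Database": "Manage the database.",
}

problems = lint_skills(skills)
```

On this sample the lint flags `Database` for its shape and the review/check pair for near-identical descriptions, the two cases where merging or rewording is worth considering.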

Common mistakes when building agent skills

Too broad. One skill trying to be a whole department. Split it.
No success check. The skill claims completion it can't verify. Add a test.
Human-only tools. Wiring in interactive CLIs the agent can't drive cleanly. Prefer scriptable, JSON-emitting tools.
Implicit conventions. Assuming the agent knows your house style. Write it down.
Skipping the vocabulary. Confusing harness and scaffold leads to skills that fight the runtime. Get the terms right first.

Takeaways

Building good agent skills is less about prompt cleverness and more about software discipline: scope tightly, document like you're onboarding a teammate, expose agent-friendly tools, and bake in a way for the agent to check its own work. The signals behind the June 2026 surge of interest (the viral TDD skill, the hf-CLI-for-agents design notes, the push for standard schemas) all point the same way. The teams that treat skills as real engineering artifacts, versioned and tested, are the ones whose agents actually ship.
