AI Tutorials

Agent Harness vs Scaffold vs Skill: A Practical 2026 Glossary

May 27, 2026·10 min read·Updated May 27, 2026
Agent Harness vs Scaffold vs Skill: A Practical 2026 Glossary

Agent Harness vs Scaffold vs Skill: A Practical 2026 Glossary

Three engineers walk into a planning meeting and all three say the word "harness." None of them mean the same thing. That meeting, multiplied across every AI infrastructure team in 2026, is why the vocabulary keeps slipping — and why Hugging Face's recent agent glossary post (published 2026-05-25) was so widely shared.

The HF post is a good starting point, but the words are not yet settled, and every major vendor uses them slightly differently. This is the page you can send to the colleague who keeps confusing "scaffold" with "framework," or to the new hire who is trying to read OpenAI, Anthropic, and Google DeepMind docs in the same week and is quietly losing it.

Takeaways

  • Model = the weights. Harness = the runtime that turns the weights into an actor. Scaffold = the static skeleton that constrains how the actor thinks. Skill = a packaged capability the actor can call on. Tool = a single function the actor can call. Agent = the whole assembled thing the user talks to.
  • The five terms are not interchangeable, even when vendor marketing implies they are.
  • A useful mental model is the Harness × Scaffold × Skill triangle: harness is how, scaffold is what shape, skill is what it can do.
  • When in doubt, ask: "Does this thing run at every step (harness), shape the plan once (scaffold), or get called on demand (skill/tool)?" That single test resolves most arguments.
  • For a worked example of why this vocabulary matters operationally, see our ITBench-AA explainer — every failure mode in that benchmark is really a failure of one of these three.

Why the vocabulary keeps slipping

Three forces are pulling at the words at the same time:

  1. Marketing. Every vendor wants their product to be "the agent" — not "the scaffold for the agent" — because "agent" is what executives are buying.
  2. Engineering reality. The actual systems are layered. There is a model. There is a runtime around it. There is a skeleton inside the runtime. There are capabilities the runtime can invoke. Each layer needs a name, and the names were borrowed from software engineering, where they already had other meanings.
  3. Speed. Names are being coined faster than they are being adopted. Three months ago "skills" barely meant anything outside Alexa-era memory. Today Anthropic ships Claude Skills, Hugging Face writes a glossary about them, and the term means something specific.

The result: every team has a working dialect, and most teams do not realize their dialect differs from the next team's until they try to integrate.

Why does "agent" mean three different things this year?

Because three communities are using it at once:

  • The research community uses "agent" the way it has since the 1990s — a system that perceives and acts. Anything with a while loop and a tool call counts.
  • The product community uses "agent" to mean a user-facing assistant that completes whole tasks: Devin, Codex, Claude Code. The bar is "you can hand it a goal and walk away."
  • The platform community uses "agent" to mean a specific deployable artifact: a configured model + harness + scaffold + skills bundle that you can run in production.

All three are defensible. The trick is to notice which one your conversation partner is using and adjust accordingly.

A short history — from "chatbot" to "harness"

Tracing the vocabulary roughly:

  • 2022–2023: "chatbot" and "assistant." The model is the product; everything else is glue.
  • 2023–2024: "AI agent" gets adopted from research, mostly for tool-using systems. "Tools," "function calling," and "ReAct" become the shared vocabulary.
  • 2024–2025: "Framework" enters via LangChain, AutoGen, LangGraph, CrewAI. The framework is the thing you import; the agent is what you build with it.
  • 2025–2026: "Harness" and "scaffold" emerge to describe what the framework actually provides. Simon Willison's running commentary on Datasette Agent and on the current Anthropic/OpenAI product-market fit moment both lean on "harness" as a load-bearing word. Hugging Face publishes the glossary post.
  • 2026: "Skill" stabilizes as a named, distributable capability and starts to displace "tool" for higher-level reusable units.

We are about halfway through the cycle that took "container" from confusing to canonical.

The five terms that actually matter

These are the five you cannot get away without. Vendor docs will introduce more, but everything else is a specialization of one of these.

Model — the weights and nothing else

The model is the trained parameter file. It cannot take an action. It cannot read your filesystem. It cannot remember what you said yesterday. It can only produce the next token, conditioned on whatever you put in front of it. Everything else is added by the layers above.

One-line takeaway: if you can name a checkpoint or a snapshot, it is the model.

What is an agent harness?

A harness is the runtime that wraps a model and turns it into something that can act in a loop. The harness owns the inference call, the tool invocations, the planning loop, the memory writes, the audit trail. When the agent "tries something, sees the result, tries something else," the something is the model, but the tries is the harness.

Concretely, a harness typically includes:

  • The orchestration loop that decides when to call the model vs when to call a tool.
  • The tool-execution sandbox (subprocess, container, MCP server).
  • The state container — what gets remembered between steps, and where it lives.
  • The observability and audit hooks.

One-line takeaway: the harness runs at every step. If it stops, the agent stops.

What is an agent scaffold, and how is it different?

A scaffold is the static skeleton that shapes how the agent approaches the task. It is closer to a template than to a runtime. Where a harness is the engine, a scaffold is the chassis — the planner architecture, the prompt structure, the role decomposition, the policy that says "first reason, then act."

A scaffold answers questions like:

  • Is this a single-agent or multi-agent topology?
  • Does the agent plan once and execute, or replan after every step?
  • What is the prompt template that frames each turn?
  • Which roles exist (planner / executor / critic), and how do they hand off?

Crucially, a scaffold does not run on its own. It is configuration that the harness consumes. Swap a scaffold and the agent behaves differently. Swap a harness and the agent runs on a different substrate.

One-line takeaway: the scaffold shapes the shape of the agent's behavior; the harness executes it.

What's the difference between a skill and a tool?

A tool is a single callable — a function the agent can invoke with structured arguments. read_file(path), kubectl_apply(yaml), search_web(query). Tools are small, fast to describe, and usually maintained by whoever owns the underlying API.

A skill is a packaged capability that bundles instructions, tools, and often state into a higher-level unit. Anthropic's framing of "skills" is the cleanest current example: a skill ships a description of when to use it, a set of files and tools, and the conventions for invoking them. From the model's perspective, a skill feels like a tool — but from the platform's perspective, a skill is a distributable artifact you can install, version, and govern.

One-line takeaway: every skill is built out of tools; not every tool is a skill.

Agent — the assembled thing the user actually talks to

An agent is what you get when you put a model into a harness, give it a scaffold, attach some skills/tools, and point it at a goal. It is the deployable unit. It is what an enterprise buyer is signing a contract for. It is what shows up on the org chart with a name like "Codex Enterprise" or "Datasette Agent."

One-line takeaway: if it has a name on a marketing page, it is an agent. If it has a name in your codebase, it is probably a harness, scaffold, or skill.

How major platforms use these terms (and where they conflict)

Vendor vocabulary is converging, but not converged. Here is how to read each.

OpenAI's vocabulary (Codex, Agents API)

OpenAI talks about agents (the product), the Agents API (their harness-as-a-service), tools (function-calling), and increasingly skills in the broader sense. "Scaffold" is rarely a load-bearing word in their docs; the scaffolding is implicit in the SDK shape. Their enterprise drumbeat — Codex pilots at Virgin Atlantic, Codex code review at Ramp, the Dell + Codex partnership, and being named a Leader in Gartner's 2026 agentic-coding report — frames "agent" as the user-facing assistant, with the harness mostly invisible.

Anthropic's vocabulary (Claude Code, Skills)

Anthropic ships "Skills" as a named primitive, which makes "skill" much more concrete in their world than in most others. Claude Code is the agent; the harness sits behind it; "scaffold" is again mostly implicit. When Anthropic docs say "skill," they mean the distributable packaged capability — closer to "Anthropic's notion of a plugin" than to "a verb the model can do."

Google DeepMind & Antigravity

Google's I/O 2026 rebrand — captured in Sundar Pichai's keynote post — leaned into "the agentic Gemini era." Antigravity, the new platform layer, is closer to a harness-plus-scaffold than to an agent in the product sense. Google's docs tend to use "agent" the research-community way (anything that perceives and acts), so expect to translate when reading them next to Anthropic's.

Open source — LangGraph, AutoGen, Datasette Agent

Open source is where "harness" got load-bearing. LangGraph and AutoGen explicitly distinguish the graph/runtime from the roles and prompts. Simon Willison's Datasette Agent is a useful real-world reference because it makes the layering legible: the model is swappable; the harness is the Datasette plugin; the scaffold is the prompt+role design; the skills are the data-exploration tools. If you want to see all five terms cleanly separated in a small, readable codebase, this is the one to read.

A reusable mental model: the Harness × Scaffold × Skill triangle

When the words start sliding around in a meeting, draw three corners:

  • Harness (runtime — what runs at every step).
  • Scaffold (skeleton — what shape the behavior takes).
  • Skill (capability — what the agent can do).

Every architectural question is really asking which corner you are modifying. "Should we add memory?" — that is a harness change. "Should we switch from a single planner to a planner+critic?" — that is a scaffold change. "Should we let the agent file Jira tickets?" — that is a skill change.

The corners interact: a new skill may require harness changes to sandbox it, and a new scaffold may require new skills to fill in roles. But naming the corner first makes the dependency conversation tractable.

Which one am I actually building?

Useful gut check:

  • If you are writing code that runs at every step of the agent's loop — you are building the harness.
  • If you are writing prompts, role templates, or planner/executor topologies — you are building the scaffold.
  • If you are wrapping an API or a small workflow so the agent can call it on demand — you are building a skill (or, if it is one function, a tool).
  • If you are putting all of those together and shipping the result with a product name — you are building an agent.

Picking the right term in three common scenarios

You're writing a job description for an "agent engineer"

"Agent engineer" is too broad to hire against. Decide whether you want someone who:

  • Builds the runtime (harness engineer — strong systems / distributed background).
  • Designs the topology and prompt structure (scaffold engineer — strong evals / cognitive-architecture instinct).
  • Builds and maintains skills (skill engineer — strong API integration and security background).

The hiring pipeline, the interview rubric, and the team they will sit on are all different.

You're scoping an internal agent platform

If your scoping doc says "the platform team will build the agent," push back. The platform team builds the harness (and probably the skill registry). Product teams build the agents. Conflating the two is how platform roadmaps end up with twenty bespoke agents nobody owns and a harness nobody trusts.

You're reading a vendor SOC2 / security review

Read every "agent" in the document and mentally substitute one of: model / harness / scaffold / skill / tool. If the substitution changes the meaning of the control — and it usually does — flag it for clarification. Most agent-platform security gaps live in the harness layer (sandboxing, audit, blast radius), not in the model.

FAQ

Is a "harness" the same as a "runtime"?

Close, but a harness is the agent-specific runtime — the loop, the tool invocation, the state container. A general-purpose runtime (a Python interpreter, a container scheduler) is a substrate the harness runs on. "Harness" implies the agent-aware parts; "runtime" is the more generic word.

Are MCP servers tools or skills?

Both, depending on shape. A small MCP server that exposes one or two functions is closer to a tool. A larger MCP server that ships instructions, configuration, and a coherent capability bundle is closer to a skill. The distinction is not about the protocol; it is about whether the package includes its own usage convention.

What's a "scaffold" in agent context vs in software engineering generally?

In general software engineering, "scaffolding" means generated boilerplate you fill in (think rails generate). In agent context, the scaffold is the static structure the harness operates inside — closer in spirit to an architectural pattern than to a CLI command. Same metaphor, different layer.

Will these terms still mean the same thing in 2027?

Probably the five we listed will stabilize. Expect "skill" and "scaffold" to keep drifting for another twelve months. Expect "agent" to keep meaning three different things forever — that one is not going to settle, so build the muscle of asking which meaning your interlocutor has in mind.


Keep reading

  • Why frontier AI agents still fail enterprise IT — lessons from ITBench-AA — the worked example: each of the four failure modes is a different harness/scaffold/skill problem.
  • Try Clawvard if you want to build an agent without re-inventing every layer of this stack from scratch.
  • If this glossary clarified an argument on your team, share it — that's the highest-leverage thing you can do with a reference page.

Editor notes / uncertainty

  • I deliberately did not invent vendor doc URLs. Vendor sections (OpenAI, Anthropic, Google) describe the conceptual vocabulary based on the Scout digest's public sources only. If the Editor wants direct vendor-doc citations inline (Claude Skills page, OpenAI Agents API page, Antigravity launch page), please confirm canonical URLs and I can wire them in.
  • The phrase "Anthropic's framing of 'skills'" intentionally avoids quoting a launch page I can't verify; happy to upgrade to a direct cite once a canonical URL is confirmed.
  • Reciprocal internal link to /blog/agentic-enterprise-it-benchmark-itbench-aa is wired in two places (Takeaways and Keep reading), as briefed.

Related Articles