Context Engineering for AI Agents: Why Less Context Beats More Memory

If you build agents, you have probably felt it: the agent that breezes through a three-step task starts losing the thread on step twelve. It repeats itself, forgets a constraint it followed minutes ago, or confidently acts on something stale. The intuitive fix — give it more memory, stuff more history into the prompt — often makes things worse. Context engineering for AI agents is the discipline that takes that counter-intuitive lesson seriously, and in the last few days it has gone from practitioner folklore to a research thesis with real momentum.

On June 10, 2026, three independent signals landed on the same day and pointed in the same direction. Two arXiv papers argued, almost in their titles, that less context produces better and more accurate agents: "Less Context, Better Agents: Efficient Context Engineering for Long-Horizon Tool-Using LLM Agents" (arXiv:2606.10209) and "Less Context, More Accuracy: A Bi-Temporal Memory Engine for LLM Agents" (arXiv:2606.09900). The same day, TechCrunch published an explainer headlined "How memory tools can make AI models worse." Alongside them, a startup, Jedify, raised $24M specifically to help companies arm their agents with the right business context. That is a commercial bet that choosing which context an agent gets is now a product category of its own. When research and the market converge in the same week, it is worth understanding why.

What is context engineering for AI agents?

Context engineering for AI agents is the practice of deciding — deliberately and continuously — what information an agent sees at each step of a task. It sits one level above prompt engineering. Prompt engineering asks, "How do I phrase this one instruction?" Context engineering asks, "Across a long, multi-step, tool-using run, what should be in the model's window right now, what should be retrieved on demand, and what should be dropped?"

That distinction matters because modern agents do not run a single prompt. They loop: call a tool, read the result, plan, call another tool, and so on. Every loop is a chance to add more text to the working context — tool outputs, prior reasoning, retrieved documents, conversation history. Left unmanaged, that pile grows until it crowds out the few facts that actually matter for the next decision. Context engineering is the set of choices that keep the signal-to-noise ratio high as the task gets longer.

Why does more memory sometimes make agents worse?

It sounds backwards. More information should mean better decisions. But for a language model operating over a finite, attention-weighted context window, more is not free.

The long-horizon context-bloat problem

The core failure mode is bloat. As an agent works through a long-horizon task, its context accumulates stale tool results, abandoned plans, redundant retrievals, and earlier reasoning that no longer applies. The model has to attend across all of it. Relevant details get diluted by irrelevant ones, the most important instruction can drift far from the current decision point, and longer inputs cost more and take longer to process. This is the practical reading behind the "Less Context, Better Agents" paper's framing for long-horizon, tool-using agents and behind TechCrunch's report that memory tools can degrade rather than improve model behavior: a memory system that simply remembers more is a memory system that buries the present under the past.

Bi-temporal memory: separating event time from ingestion time

One of the more interesting ideas in this wave is bi-temporal memory, the centerpiece of the "Less Context, More Accuracy" paper (arXiv:2606.09900). "Bi-temporal" is a long-standing concept in databases: every fact carries two timelines — when the fact was true in the world (event time) and when the system learned or recorded it (ingestion time). Tracking both lets a system answer "what did we believe on Tuesday?" separately from "what was actually true on Tuesday?"

For an agent, that separation is powerful. It lets the agent reason about currency and provenance instead of treating every remembered fact as equally valid right now. A price from last quarter and a price from this morning are not interchangeable; a fact the agent ingested before a correction should not silently override the correction. A bi-temporal memory engine gives the agent a principled way to prefer fresh, correctly ordered information, which is exactly how you get "more accuracy" from less context: you keep what is currently true and relevant, not everything you have ever seen.
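To make the two timelines concrete, here is a minimal Python sketch of a bi-temporal fact store. It is an illustration of the general database idea, not the engine described in arXiv:2606.09900; the names Fact, BiTemporalMemory, and as_of, and the price data, are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Fact:
    key: str
    value: str
    event_time: datetime    # when the fact was true in the world
    ingested_at: datetime   # when the agent recorded it


class BiTemporalMemory:
    """Hypothetical in-memory store; a real system would use a database."""

    def __init__(self) -> None:
        self._facts: list[Fact] = []

    def record(self, fact: Fact) -> None:
        # Append only: corrections are new rows, old beliefs are never overwritten.
        self._facts.append(fact)

    def as_of(self, key: str, valid_at: datetime, known_at: datetime) -> Optional[Fact]:
        """Best fact for `key` that was true by `valid_at`,
        using only what had been ingested by `known_at`."""
        candidates = [
            f for f in self._facts
            if f.key == key and f.event_time <= valid_at and f.ingested_at <= known_at
        ]
        if not candidates:
            return None
        # Prefer the latest event time; on a tie, the latest ingestion wins,
        # so a correction supersedes the belief it corrects.
        return max(candidates, key=lambda f: (f.event_time, f.ingested_at))


# Illustrative data: a price recorded Monday, then corrected on Wednesday.
mon, tue, wed = (datetime(2026, 6, day) for day in (8, 9, 10))
memory = BiTemporalMemory()
memory.record(Fact("price", "100", event_time=mon, ingested_at=mon))
memory.record(Fact("price", "90", event_time=mon, ingested_at=wed))

believed_on_tuesday = memory.as_of("price", valid_at=tue, known_at=tue)  # value "100"
true_on_tuesday = memory.as_of("price", valid_at=tue, known_at=wed)      # value "90"
```

The two queries differ only in known_at, which is what lets the agent separate "what did we believe on Tuesday?" from "what was actually true on Tuesday?".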

Efficient context for tool-using agents: what the new research shows

Read together, the two June 10 papers describe two complementary halves of the same strategy. The first half is reduction: efficient context engineering for long-horizon, tool-using agents means actively trimming the window by summarizing or discarding spent tool outputs, compacting prior reasoning, and retrieving detail only when a step needs it rather than holding everything in context the whole time. The second half is organization: a bi-temporal memory engine structures what you do keep so the agent can tell new from old and recorded belief from ground truth.

The commercial signal rounds out the picture. Jedify's $24M raise (TechCrunch, June 10) is aimed at giving agents context about a company's business — the proprietary, current facts an agent needs to be useful in a real workflow. That is the same thesis from the enterprise side: value comes not from maximal memory but from the right, well-curated context delivered at the right moment. (Funding validates market demand for the problem; it is not, on its own, evidence that any single technique works — treat it as a signal of where the industry is investing.)

How do you design a context strategy that scales?

You do not need a research lab to apply the lesson. A practical context strategy for a long-horizon agent comes down to a handful of repeatable choices:

Budget the window deliberately. Decide roughly how much of the context is reserved for the task instruction, for working state, and for retrieved detail. Treat context as a scarce resource with an explicit allocation, not an append-only log.
Summarize and compact spent steps. Once a tool result has been used, replace the raw dump with a short, structured summary of what it told the agent. Keep the conclusion, drop the transcript.
Retrieve on demand, not just in case. Pull detailed documents or records into context at the step that needs them, and let them fall out afterward, rather than front-loading everything.
Make memory time-aware. Tag facts with when they were true and when you learned them. Prefer fresher facts, and let corrections supersede the beliefs they correct — the bi-temporal idea, applied pragmatically.
Prune aggressively, then measure. Remove stale plans and redundant history, and track accuracy, latency, and cost as you do. If trimming context holds accuracy steady while cutting cost and latency, you have found free margin.
Separate working state from long-term memory. What the agent needs for this step is not the same as what it should be able to recall later. Conflating the two is how windows bloat.
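The first two choices above, budgeting the window and compacting spent steps, can be sketched in a few lines of Python. This is one possible arrangement under stated assumptions, not a method taken from the cited papers: estimate_tokens is a crude stand-in of about four characters per token, and Step, build_window, and keep_raw_last are hypothetical names.

```python
from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Swap in a real tokenizer.
    return max(1, len(text) // 4)


@dataclass
class Step:
    summary: str   # short conclusion kept once the step is spent
    raw: str       # full tool output, only needed while the step is fresh


def build_window(instruction: str, steps: list[Step], budget: int,
                 keep_raw_last: int = 1) -> list[str]:
    """Assemble the context for the next model call within a token budget.

    The instruction is always included. The newest `keep_raw_last` steps keep
    their raw output when it fits; older steps contribute only a summary, and
    the oldest summaries are dropped once the budget is exhausted.
    """
    used = estimate_tokens(instruction)
    kept: list[str] = []
    # Walk newest to oldest so recent steps claim budget first.
    for age, step in enumerate(reversed(steps)):
        is_recent = age < keep_raw_last
        text = step.raw if is_recent else step.summary
        if is_recent and used + estimate_tokens(text) > budget:
            text = step.summary  # raw output does not fit; fall back to the summary
        cost = estimate_tokens(text)
        if used + cost > budget:
            break  # this step and everything older falls out of the window
        kept.append(text)
        used += cost
    # Restore chronological order after the instruction.
    return [instruction] + list(reversed(kept))


steps = [
    Step(summary="s1s1", raw="r" * 400),
    Step(summary="s2s2", raw="R" * 40),
]
roomy = build_window("do the task", steps, budget=50)
tight = build_window("do the task", steps, budget=3)
```

With the roomy budget, the latest step keeps its raw output and the older one is reduced to its summary. With the tight budget, only the instruction and the latest summary survive. Measuring accuracy, latency, and cost at a few budget settings is what tells you which setting is safe.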

The throughline: design the agent to forget well, not just to remember more.

FAQ

When should an AI agent forget?

As soon as a piece of context has done its job and is no longer needed for the current or upcoming steps. A used tool result, a discarded plan, or a resolved sub-task can be summarized down to its conclusion or dropped. Forgetting is not data loss if the durable record lives in retrievable memory; it is keeping the working window focused on what the next decision actually requires.
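A small Python sketch of forgetting without data loss, assuming a plain dictionary as a stand-in for whatever durable store you actually use. WorkingContext and its method names are hypothetical.

```python
from typing import Optional


class WorkingContext:
    def __init__(self) -> None:
        self.window: dict[str, str] = {}    # what the model sees on the next step
        self.archive: dict[str, str] = {}   # durable record (stand-in for a real store)

    def add(self, item_id: str, content: str) -> None:
        self.window[item_id] = content

    def forget(self, item_id: str, conclusion: Optional[str] = None) -> None:
        """Move the full content to the archive and leave only a
        conclusion (or nothing at all) in the working window."""
        self.archive[item_id] = self.window.pop(item_id)
        if conclusion is not None:
            self.window[item_id] = conclusion

    def recall(self, item_id: str) -> str:
        # Retrieval on demand: the full record is still available.
        return self.archive[item_id]


ctx = WorkingContext()
ctx.add("search-1", "...long raw search results...")
ctx.add("plan-a", "Abandoned plan: email the vendor first.")
ctx.forget("search-1", conclusion="Cheapest option found: vendor B.")
ctx.forget("plan-a")  # discarded plan leaves the window entirely
```

After these calls the window holds only the one-line conclusion, while both originals remain recoverable through recall.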

How much context is too much?

There is no universal token number — it depends on the model and task — but the practical signal is behavioral: when accuracy plateaus or drops, latency climbs, and cost rises as you add history, you are past the useful point. The June 2026 research framing of "less context, better agents" is the rule of thumb. Treat growing context with declining returns as the symptom to watch.
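One way to turn that behavioral signal into a number is to evaluate the same task set at several context sizes and look for the point where accuracy stops improving. The Python sketch below assumes you already have such measurements. Run, useful_context_limit, and min_gain are hypothetical names, and the figures are made up for illustration, not results from the cited research.

```python
from dataclasses import dataclass


@dataclass
class Run:
    context_tokens: int
    accuracy: float    # fraction of evaluation tasks passed
    latency_s: float
    cost_usd: float


def useful_context_limit(runs: list[Run], min_gain: float = 0.01) -> int:
    """Context size of the last run that beat the best smaller-context run
    by at least `min_gain` accuracy. Larger sizes only add latency and cost."""
    ordered = sorted(runs, key=lambda r: r.context_tokens)
    best = ordered[0]
    for run in ordered[1:]:
        if run.accuracy >= best.accuracy + min_gain:
            best = run
    return best.context_tokens


# Hypothetical measurements, for illustration only.
runs = [
    Run(2_000, accuracy=0.70, latency_s=1.2, cost_usd=0.01),
    Run(8_000, accuracy=0.82, latency_s=2.0, cost_usd=0.03),
    Run(32_000, accuracy=0.82, latency_s=4.5, cost_usd=0.10),
    Run(128_000, accuracy=0.78, latency_s=11.0, cost_usd=0.40),
]
limit = useful_context_limit(runs)  # 8000
```

In this made-up data, accuracy plateaus at 8,000 tokens and then drops, while latency and cost keep climbing, which is the pattern to watch for.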

Memory versus retrieval — which should come first?

Think of them as different jobs rather than competitors. Retrieval pulls in external knowledge on demand; memory persists the agent's own evolving state and learned facts. Start by getting retrieval disciplined (fetch what a step needs, when it needs it), then add structured, time-aware memory for the state that must survive across steps. The failure mode the new research warns about is bolting on an ever-growing memory and skipping the discipline.
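The division of labor can be sketched as a loop in which retrieved documents are scoped to a single step while memory carries forward. This Python example is a toy under stated assumptions: build_step_windows, keyword_retrieve, and the KNOWLEDGE dictionary are hypothetical, and a real retriever would query a search index or vector store.

```python
from typing import Callable

Retriever = Callable[[str], list[str]]


def build_step_windows(steps: list[str], retrieve: Retriever) -> list[list[str]]:
    memory: dict[str, str] = {}       # the agent's own state, persists across steps
    windows: list[list[str]] = []
    for step in steps:
        docs = retrieve(step)         # external knowledge, fetched for this step only
        state = [f"{name}: {status}" for name, status in memory.items()]
        windows.append(state + docs + [step])
        memory[step] = "done"         # placeholder for whatever the step produced
    return windows


# Toy knowledge base and keyword retriever, for illustration only.
KNOWLEDGE = {
    "refund": "Refunds are allowed within 30 days.",
    "shipping": "Standard shipping takes 5 days.",
}


def keyword_retrieve(step: str) -> list[str]:
    return [text for word, text in KNOWLEDGE.items() if word in step.lower()]


windows = build_step_windows(["Check refund policy", "Quote shipping time"], keyword_retrieve)
```

The refund document appears only in the first window. The second window carries forward the compact memory entry for the finished step plus the shipping document it needs.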

Takeaways for Clawvard readers

The shift is subtle but real: the frontier of agent reliability is moving from "how much can the model remember?" to "how well can the system curate what the model sees?" Three same-day signals — two arXiv papers and a TechCrunch explainer, plus a fresh $24M raise aimed at business context — point the same way. If your agents degrade over long tasks, the highest-leverage fix is probably not a bigger memory store; it is a deliberate context strategy: budget the window, summarize spent steps, retrieve on demand, and make memory time-aware.

If you are building or evaluating long-horizon agents, this is a good week to revisit how your context pipeline is wired. Explore Clawvard for more on agent infrastructure and evaluation, and follow along as we track where context engineering goes next — and if a teammate is wrestling with an agent that forgets the plot, pass this along.