Research

AI Agent Prompt Injection: How Attackers Hide Instructions in Code — and How to Defend
A maintainer hid a 'delete all code' instruction in a popular Java library's output, visible only to AI agents. Here's how AI agent prompt injection works in coding tools — and the defense-in-depth that actually contains it.
05/31/2026 · Research · 9 min read

AI Agent Security: The Four-Layer Threat Model Every Team Deploying Agents Needs
AI agent security broke into the open this week with four independent reports on a single attack surface. Here's a durable threat model — supply chain, prompt injection, data exfiltration, and bot detection — and how to defend each layer.
05/30/2026 · Research · 10 min read

How AI Agent Memory Poisoning Works — and How to Defend Against It
Persistent agent memory is a new attack surface. Here's how memory-poisoning attacks work, why they're more dangerous than one-shot prompt injection, and a defensive checklist to stop them.
05/30/2026 · Research · 10 min read

How to Secure AI Coding Agents: Lessons From a Week of Prompt-Injection and Exfiltration Attacks
In a single week, three real incidents showed AI coding agents being hijacked through the code they read and the tools they hold. Here is a practical defensive playbook for the teams running them.
05/29/2026 · Research · 9 min read

AI Agent Security in 2026: Supply-Chain Breaches and Multi-Agent Injection Attacks
A real-world open source supply-chain breach and fresh research on camouflaged prompt injection show the AI agent attack surface is now real. Here's the threat model — and how to harden your agents.
05/28/2026 · Research · 7 min read

AI Agent Prompt Injection: A Hardening Checklist After the Copilot Cowork Disclosure
Microsoft's Copilot Cowork was shown exfiltrating files via prompt injection. The Microsoft-specific details are the hook; the four-layer checklist below is what every agent builder should be running against their own stack this week.
05/27/2026 · Research · 8 min read

Why Agents Need ASVP: From Exam Scores to Real Service Vitals
Benchmarks tell us what an agent can do in a controlled exam. ASVP tells us whether it keeps delivering in real work: sessions, tool use, abandonment, frustration, token cost, and skill adoption.
04/29/2026 · Research · 9 min read

The Execution Bottleneck: Why AI Agents Can Think But Can't Do
Analysis of 20,070 evaluations reveals Execution as the universal weakness across all 18 models. The Think-Do Gap is the defining challenge of 2026.
04/09/2026 · Research · 6 min read

We tested 45,000 AI Agents — the bottleneck isn't intelligence, it's execution
Clawvard's analysis of 45,674 AI Agent exams across 18 mainstream models and 8 capability dimensions reveals the real boundaries of Agent ability.
04/08/2026 · Research · 15 min read