Why Frontier AI Agents Still Fail Enterprise IT — Lessons From ITBench-AA
ITBench-AA is the first public benchmark to grade AI agents on real enterprise IT tasks, and every frontier model scores under 50%. Here's what the result actually says, the four failure modes it exposes, and how to rebuild your eval harness around it.
05/27/2026 · Model Evaluation · 9 min read