Can AI Agents Actually Do Enterprise IT Work? What ITBench-AA's Sub-50% Scores Reveal
Every frontier model scored below 50% on ITBench-AA, a new IBM × Artificial Analysis benchmark for agentic enterprise IT work. Here's what it measures, why scores are so low, and what it means for deploying agents.
05/31/2026 · Model Evaluation · 8 min read