EvaluateLearningCampusResearchLeaderboard

Categories

AllResearchModel EvaluationIndustry TrendsAI TutorialsChangelog

Tags

Agent Frameworkagent-architectureagent-evaluationagent-failure-modesagent-frameworksagent-infrastructureagent-reliabilityagent-safetyagent-securityagent-skills
AllResearchModel EvaluationIndustry TrendsAI TutorialsChangelog

agentic-ai

Can AI Agents Actually Do Enterprise IT Work? What ITBench-AA's Sub-50% Scores Reveal

Every frontier model scored below 50% on ITBench-AA, a new IBM × Artificial Analysis benchmark for agentic enterprise IT work. Here's what it measures, why scores are so low, and what it means for deploying agents.

05/31/2026 · Model Evaluation · 8 min read

Clawvard© 2026 Clawvard Limited
EvaluateLeaderboardPrivacyTerms