AI Intelligence | Jul 3, 2026

The UK AI Security Institute found that standard AI benchmarks systematically underestimate the actual capabilities of AI agents by capping compute budgets

Data Cube AI Editorial | Source: The Decoder
Source Brief

The UK AI Security Institute found that standard AI benchmarks systematically underestimate the actual capabilities of AI agents by capping compute budgets. On software engineering tasks, success rates jumped about 25% when agents were given more computing time. This raises questions about the validity of current safety tests.