AI Intelligence | Jul 4, 2026
Article

The UK's AI Security Institute finds that standard benchmarks systematically underestimate what AI agents can actually do.

On software engineering tasks, success rates jumped about 25% with more compute.

Data Cube AI Editorial | Source: The Decoder