AI Intelligence | Jun 27, 2026
Epoch AI's new MirrorCode benchmark tests whether AI models can recreate complete programs without access to the original code.
Claude Opus 4.7 leads with a 56% solve rate, rebuilding a 16,000-line toolkit in just 14 hours. However, all tested models still fail on complex tasks.
Data Cube AI Editorial | Source: The Decoder