AI Intelligence · Jun 27, 2026

Epoch AI's new MirrorCode benchmark tests whether AI models can recreate complete programs without access to the original code.

Claude Opus 4.7 leads with a 56% solve rate, rebuilding a 16,000-line toolkit in just 14 hours. However, all tested models still fail on complex tasks.

Data Cube AI Editorial · Source: The Decoder