AI Intelligence | Feb 23, 2026
OpenAI calls for the retirement of the popular AI coding benchmark SWE-bench Verified.
The company argues that most of the benchmark's tasks are flawed and that leading AI models have likely already seen the answers during training. As a result, it says, the benchmark measures memorization rather than real coding ability.
Data Cube AI Editorial | Source: The Decoder
01
Source Brief
OpenAI calls for the retirement of the popular AI coding benchmark SWE-bench Verified. The company argues that most of the benchmark's tasks are flawed and that leading AI models have likely already seen the answers during training. As a result, it says, the benchmark measures memorization rather than real coding ability.
02