AI Intelligence | Feb 23, 2026

OpenAI calls for the retirement of the popular AI coding benchmark SWE-bench Verified.

The company argues that most of the benchmark's tasks are flawed and that leading AI models have likely already seen the answers during training. As a result, OpenAI says, the benchmark measures memorization rather than real coding ability.

Data Cube AI Editorial | Source: The Decoder