Your AI Code Reviewer Can't Be Your AI Code Generator
How Anthropic's April 23 post-mortem made the case against same-vendor verification.
Alexander Theruviparambil · Founder · 9 min read
Last week, Anthropic published a detailed post-mortem explaining three product-level bugs that silently degraded Claude Code from March 4 to April 20, 2026. Most of the coverage focused on the obvious story: three bugs, seven weeks, an apology, a usage-limits reset. But there is a single sentence in the document, buried in the middle of the second incident's investigation, that does more architectural work than the rest of the post-mortem combined.
"When we used Code Review against the offending pull requests, Opus 4.7 found the bug, while Opus 4.6 didn't."
Anthropic, An Update on Recent Claude Code Quality Reports (April 23, 2026)
Read it slowly. Notice what it concedes.
Anthropic, the company that wrote the bug-introducing code and runs some of the most rigorous internal AI evaluation processes in the industry, used their own code review product on the offending pull requests to figure out what had happened. The result depended on which model the verifier ran. Opus 4.7 surfaced the bug. Opus 4.6, the same model running in production at the time in Claude Code itself, did not.
The system that built the bug, run by the same vendor using its contemporary production model, did not catch its own bug. A different model from the same vendor, on the same code, did.
This is the post-mortem's most overlooked admission. The narrower reading is fair on its face: Opus 4.7 is simply more capable than 4.6, and the result tells us something about capability, not about same-vendor priors. The 4.6 vs 4.7 datapoint alone does not prove correlated blind spots; it is consistent with general capability gain. But the post-mortem doesn't stop there. The remediation list Anthropic publishes a few sections later (more on which below) reads as a checklist of practices that import outside perspective, not as "swap to a smarter model." The 4.6 vs 4.7 sentence is the empirical hook; the remediation list is the structural admission. Together they make the broader case.
Anthropic, the company most equipped to catch its own quality regressions, needed a different model in retrospect to surface what its own production model had missed, and concluded the deeper fix lay further outside still.
That observation generalizes. It is the architectural case for external, cross-vendor verification, made empirically in Anthropic's own words.
What the post-mortem actually says
The three regressions, briefly:
March 4: Anthropic lowered Claude Code's default reasoning effort from high to medium to fix UI latency. "This was the wrong tradeoff," they later wrote. Reverted April 7.
March 26: A caching optimization, intended to clear stale extended-thinking history from idle sessions, instead cleared thinking history every turn. Models became forgetful and repetitive. Fixed April 10.
April 16: A system prompt instruction designed to keep Opus 4.7 concise, "keep text between tool calls to 25 words or fewer," produced a 3% accuracy drop in ablation testing. Reverted April 20.
The bugs lived in Claude Code's product layer. The post-mortem is explicit on this point: "The API was not impacted." The inference layer wasn't either. What broke was system prompts, caching, and reasoning-effort defaults: the wrapping around the model, not the model.
What is striking is not that bugs shipped. Bugs ship. What is striking is what the post-mortem says about why they went undetected for so long:
"Issues were challenging to distinguish from normal variation in user feedback at first, and neither our internal usage nor evals initially reproduced the issues."
Anthropic, An Update on Recent Claude Code Quality Reports (April 23, 2026)
Two distinct admissions in one sentence. The team's daily use of Claude Code didn't surface the regression. The automated evaluations they ran against it didn't reproduce it either. Anthropic's published remediation list reads, almost line by line, like a checklist of external-monitoring practices: increase internal staff usage of public Claude Code builds, add ablation testing protocols, and establish soak periods, gradual rollouts, and broader testing for intelligence-affecting changes. The fix Anthropic landed on, in their own words, is to import more outside perspective.
The architectural pattern
The day the post-mortem published, an essay of mine went live arguing that frontier language models cannot reliably check themselves: they have no metacognition, fluent confidence is decoupled from accuracy, and the missing layer cannot be added by scaling the existing one. That essay was about the model.
The post-mortem is about the layer above. The system that builds, prompts, and ships the model, at a lab whose internal evaluation practice is as developed as any in the industry, also could not reliably check itself. The eval suite missed the regression. Internal dogfooding missed it. Multiple human and automated code reviews missed it. The Claude Code Review product, run on Opus 4.6 (the production model of the moment), missed it. Only Opus 4.7, a different model, caught it on retrospective review.
The signal that broke the loop, in production, came from outside the system entirely: users posting reproducible examples publicly, sending feedback through the /feedback command, escalating quality complaints across forums.
The first essay said the check has to live outside the model. The post-mortem extends the principle: the check has to live outside the system. Same intuitions, same priors, same training distribution, all the way up.
Why this generalizes across vendors
The post-mortem is a gift to an essay like this one. Anthropic, in their own words, makes the argument I want to make. But the architecture they describe is not Anthropic-specific. It is structural. Any organization that both builds a generator and operates a verification product for that generator's output has the same architectural problem.
Consider the major same-vendor verification surfaces for AI-generated code as of April 2026.
Anthropic's Claude Code Review launched March 9, 2026, five days into the regression window. The post-mortem reports Anthropic's back-test against the bug-introducing PRs: Opus 4.7 found the bugs; Opus 4.6 did not. The choice of which model the verifier runs is, by Anthropic's own measurement, determinative. The team that builds Anthropic's models, prompts, and rollouts is the team that builds Anthropic's verification product. Same priors, same blind spots, and the post-mortem is the evidence.
Cursor Bugbot is the strongest case. Cursor is the IDE that authored the code being reviewed. Bugbot's self-improving rules engine learns from Cursor users, which is to say, the verifier's notion of "what looks like a bug" is shaped by the same data distribution that shapes Cursor's notion of "what looks like correct code." For the model-mediated portion of that verifier, the bias loop is closed by construction.
GitHub Copilot Code Review is the same vendor as GitHub Copilot, the most widely deployed AI code-generation surface in enterprise. Whatever systematic blind spots exist in Copilot's training and prompts, the Copilot-architected reviewer is more likely to inherit than to surface them. Not because the team is sloppy. Because the assumptions that shape the reviewer are the same assumptions that shape the generator.
Qodo's case is different. They raised $70M in March 2026 with explicit "code verification" language. The company began as CodiumAI, an AI tool for test generation, adjacent to code generation, not identical with it. The same-vendor argument applies in a softer form: the team's intuitions about what AI-authored code looks like are shaped by their prior work as builders of AI for code, not as detached observers. A change of marketing language is not a change of architecture. The conflict softens; it does not disappear.
Figure 2 · Same-vendor review: an AI generator writes the pull request, and an AI reviewer from the same vendor, trained on the same distribution with the same bias, grades it. The model asked to grade the work is a sibling of the model that wrote it. That isn't an independent check.
The pattern across all four: the team that built the verification surface has training, expectations, and operational priors deeply correlated with the team that built the generator. Sometimes it is the same team. That correlation is the failure mode. The verifier doesn't need to be the same code as the generator; it just needs to have been built by people with overlapping priors.
That is what same-vendor verification actually means. It is harder to escape than the marketing suggests, and Anthropic's post-mortem is the first major piece of public evidence that even rigorous internal practice cannot break out of it without reaching for a different model or, ideally, a different system entirely.
What independent looks like in practice
Naming the failure mode is easier than building around it. What does independent actually look like, architecturally? Four things, at minimum.
Different model routing. The post-mortem already settled this empirically. "Opus 4.7 found the bug, while Opus 4.6 didn't." If the generator runs Sonnet, the verifier should not run Sonnet. Route within a single vendor to a stronger model for cross-check escalation, or, more rigorously, route to a different vendor entirely (GPT-4, Gemini, an open model) for outside-vendor perspective on critical findings. The same model called twice with the same prompt is correlated noise; different models, with different prompts, give an at least partially independent measurement. Anthropic's own data is the proof.
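To make the routing rule concrete, here is a minimal sketch in Python. The mapping and model names are illustrative placeholders, not Veriva's actual routing table; the only load-bearing idea is that the verifier is never drawn from the generator's own family.

```python
# Hypothetical routing table: generator family -> verifier candidates,
# with cross-vendor options listed first.
VERIFIER_CANDIDATES = {
    "claude": ["gpt-4", "gemini"],
    "gpt":    ["claude-opus", "gemini"],
    "gemini": ["claude-opus", "gpt-4"],
    "unknown": ["claude-opus", "gpt-4"],  # human-authored or untagged PRs
}

def pick_verifiers(generator_family: str, critical: bool = False) -> list[str]:
    """Choose verifier models outside the generator's family.

    Critical findings escalate to a second, independently trained model so the
    measurement is at least partially decorrelated.
    """
    candidates = VERIFIER_CANDIDATES.get(generator_family, VERIFIER_CANDIDATES["unknown"])
    return candidates if critical else candidates[:1]

print(pick_verifiers("claude"))                 # ['gpt-4']
print(pick_verifiers("claude", critical=True))  # ['gpt-4', 'gemini']
```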
Different prompts. Even within a single model family, the verification prompt should be designed by people with no incentive to make the generator look good. The structure of "review this AI-authored code" is not the same structure as "generate code that passes review." Conflating them is a way to hide bugs.
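For illustration only, the two prompt shapes that paragraph contrasts might look like this; neither string is drawn from any vendor's product, and the wording is hypothetical.

```python
# A generation prompt is aligned with the intent behind the change.
GENERATION_PROMPT = (
    "Implement the requested change so that it passes code review:\n\n{intent}"
)

# A verification prompt is adversarial toward the diff, not toward the intent.
VERIFICATION_PROMPT = (
    "You did not write this code and have no stake in whether it ships.\n"
    "The diff below was authored by an AI coding agent. List concrete defects\n"
    "with file and line references. If you find none, say so plainly rather\n"
    "than summarizing or praising the change.\n\n{diff}"
)
```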
Training-distribution awareness. The verifier should know which agent authored the code under review (Claude Code, Cursor, Copilot, or a human) and treat each authorship class as a different distribution. A patch that looks fine in aggregate may carry a class-specific failure pattern (Cursor's tendencies in one surface, Copilot's in another, Claude Code's in a third) that only becomes visible when you condition on authorship. Per-agent identity tracking is what makes this concrete in a product, not just a manifesto.
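A sketch of what conditioning on authorship means mechanically, with hypothetical record types: findings are counted per authorship class, so a class-specific failure pattern stops averaging out across the whole review stream.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Finding:
    pr_id: str
    author_class: str  # "claude-code" | "cursor" | "copilot" | "human"
    rule: str          # e.g. "stale-cache-invalidation" (illustrative rule name)

def finding_rates_by_author(findings: list[Finding]) -> dict[str, dict[str, int]]:
    """Count findings per rule, conditioned on which agent authored the code."""
    table: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for f in findings:
        table[f.author_class][f.rule] += 1
    return {author: dict(rules) for author, rules in table.items()}
```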
A contractual commitment to never become the generator. This is the brand-level version. The same-vendor argument loses force the moment a verifier starts shipping a code-generation product. Independence is not a property you have at one moment; it is a posture you maintain over time. The way to make it credible is to bind yourself to it publicly, contractually, in every customer agreement. Anything less is a marketing claim with an expiration date.
Each of these is replicable in months by a competitor. None of them, individually, is a moat. The combination, and the contractual commitment that anchors it, is.
What we're building
That's what we're building at Veriva. The pipeline routes verification through different models than whatever generated the PR. We track which agent (Claude Code, Cursor, Copilot, or a human) actually authored the code, and we treat their outputs as different distributions. We've committed, in every customer agreement, to never ship a code-generation product, with one carve-out written into the contract.
The carve-out is AUTO-FIX. The pipeline stage that proposes patches generates code, but only as governance-bound remediation: every patch is a targeted response to a finding the verifier already surfaced. The verifier identifies what's wrong; AUTO-FIX writes the targeted fix for that specific wrong. The generator is the verifier's tool, not the lead actor. The line is between code-generation-from-intent (which we will never ship) and code-remediation-from-verification (which is what AUTO-FIX is).
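One way to picture the carve-out is as a data constraint rather than a policy statement. The types below are hypothetical, but the invariant is the one described above: a patch is only admissible if it cites a finding the verifier has already surfaced, which keeps remediation strictly downstream of verification.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    id: str
    description: str

@dataclass
class RemediationPatch:
    finding_id: str  # the finding this patch answers; no finding, no patch
    diff: str

def admissible(patch: RemediationPatch, open_findings: dict[str, Finding]) -> bool:
    """Reject any patch that is not a targeted response to a surfaced finding."""
    return patch.finding_id in open_findings
```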
We don't generate code from intent. We verify code, and where verification surfaces a fix, we write that fix. That is the role; the price is that we cannot also occupy the lead-creator one.
Don't grade your own homework
When Anthropic publishes a post-mortem like the one on April 23, the temptation is to read it as a story about an incident: three bugs, seven weeks, a fix shipped, lessons learned. That reading is correct as far as it goes. It also misses the structural thing the document is teaching.
The bugs were not the whole failure. The larger failure was that the system built to catch the bugs could not catch them, in part because it shared too much architecture with the system it was checking. "Opus 4.7 found the bug, while Opus 4.6 didn't" is the empirical hook. Even a clean internal verification surface, run by a lab with serious evaluation discipline, can fail when it shares too much architecture with what it verifies. The fix is not a different incident-tracking process. The fix is a different system.
Don't grade your own homework. The principle applies to fifth graders. To law students writing their own essays. To AI labs evaluating their own models. And to the products built on top of those models. The check has to come from a different system, not just outside the model, but outside the building. Independence is structural. It isn't a feature you ship later.