Three Ways to Self-Critique
Forced iteration reveals distinct behavioral phenotypes: Claude reflects deeply but changes little; Gemini reflects shallowly but changes everything.
When forced to critique their own work and rebuild it, models respond in fundamentally different ways. Claude writes structured, bulleted self-assessments. It names the specific technologies it defaulted to (in 97% of sessions). It pivots to a new concept but stays close to its comfort zone: Canvas 2D usage drops by only 18 percentage points.
Gemini barely articulates what was wrong. Its critique lives in its internal thinking traces, which describe the process of critiquing rather than the substance. Then it deletes everything and starts from scratch: Canvas 2D usage drops by 65 percentage points.
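The Canvas 2D figures are stated in percentage points. As a minimal sketch of that arithmetic, assume the drop is the share of sessions using Canvas 2D on the first attempt minus the share using it after the rebuild. The `Session` record, its field names, and the toy data below are hypothetical and are not the study's data or tooling.

```python
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical record: the rendering technology used in each attempt.
    first_attempt_tech: str
    rebuild_tech: str


def usage_rate(techs, target):
    """Return the share of attempts (0-100) that used the target technology."""
    techs = list(techs)
    if not techs:
        raise ValueError("no sessions to measure")
    return 100.0 * sum(t == target for t in techs) / len(techs)


def percentage_point_drop(sessions, target="Canvas 2D"):
    """Usage rate before the rebuild minus usage rate after it."""
    before = usage_rate((s.first_attempt_tech for s in sessions), target)
    after = usage_rate((s.rebuild_tech for s in sessions), target)
    return before - after


# Toy data for illustration only (assumed, not taken from the study).
toy_sessions = [
    Session("Canvas 2D", "Canvas 2D"),
    Session("Canvas 2D", "WebGL"),
    Session("Canvas 2D", "SVG"),
    Session("SVG", "SVG"),
]

drop = percentage_point_drop(toy_sessions)
print(f"Canvas 2D drop: {drop:.0f} percentage points")
```

A percentage-point drop is an absolute difference between two rates, not a relative change, so the same drop can mean very different things depending on the starting rate.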
The finding: in this comparison, iteration effectiveness is inversely related to critique depth. The model that reflects least changes most. The driver of change is mechanical (starting over), not cognitive (understanding why).