Initial Generation
Each AI model receives an identical prompt and generates code in a single attempt. This output is preserved as-is, demonstrating the model's baseline coding ability without iteration.
Evaluating LLM Coding Through Interactive Games
Moving beyond abstract benchmarks, we visualize real AI coding capabilities through game implementations.
Zero human intervention. Authentic code generation tested live.
Each test consists of two rounds: initial code generation and self-correction.
Self-Correction
Models receive error feedback and attempt fixes for up to three iterations. This phase evaluates debugging skill and the ability to iterate toward working code, mirroring real development workflows.
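The two-phase protocol above can be sketched as a simple evaluation loop. This is a minimal illustration, not the benchmark's actual harness: `model.generate` and `run_tests` are hypothetical stand-ins for the model call and a sandboxed test runner, and the feedback prompt format is assumed.

```python
MAX_FIX_ITERATIONS = 3  # self-correction budget described above


def evaluate(model, prompt, run_tests, max_iters=MAX_FIX_ITERATIONS):
    """Run one benchmark test: single-shot generation, then error-driven fixes.

    `model` needs a `generate(prompt) -> str` method (hypothetical interface);
    `run_tests(code)` returns (passed: bool, error: str | None).
    """
    # Phase 1: one attempt, preserved as-is as the baseline result.
    code = model.generate(prompt)
    ok, error = run_tests(code)
    history = [("initial", code, ok)]

    # Phase 2: feed the error back for up to `max_iters` fix attempts.
    for i in range(max_iters):
        if ok:
            break
        feedback = f"{prompt}\n\nYour code failed with:\n{error}\nPlease fix it."
        code = model.generate(feedback)
        ok, error = run_tests(code)
        history.append((f"fix-{i + 1}", code, ok))

    return ok, history
```

The `history` list keeps every attempt, so both the untouched first-round output and each correction round can be scored separately.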