AI Game Benchmark

Evaluating LLM Coding Through Interactive Games

Moving beyond abstract benchmarks, we visualize real AI coding capabilities through game implementations.
Zero human intervention. Authentic code generation tested live.


Game Benchmarks

Each test consists of two rounds: initial code generation, then self-correction.

Testing Methodology

Round 1: Initial Generation

Each AI model receives an identical prompt and generates code in a single attempt. This output is preserved as-is, demonstrating the model's baseline coding ability without iteration.

Round 2: Self-Correction

Models receive error feedback on their Round 1 output and attempt fixes for up to 3 iterations. This phase evaluates debugging skill: whether a model can iterate toward working code, as it would have to in real development.
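The two rounds can be sketched as a small harness. This is a minimal illustration in Python, not the project's actual code. The names `run_benchmark`, `generate`, `check`, and `BenchmarkResult` are hypothetical, and the sketch assumes that error feedback is passed back to the model as plain text alongside its previous code. The only value taken from the methodology is the limit of 3 fix iterations.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# From the methodology above: at most 3 self-correction iterations.
MAX_FIX_ITERATIONS = 3


@dataclass
class BenchmarkResult:
    round1_code: str          # Round 1 output, preserved as-is
    round2_code: str          # code after the last self-correction attempt
    fix_attempts: int         # number of Round 2 iterations actually used
    passed: bool              # True if the final code produced no error
    errors: List[str] = field(default_factory=list)  # feedback sent to the model


def run_benchmark(
    prompt: str,
    generate: Callable[[str], str],          # hypothetical: model call, prompt -> code
    check: Callable[[str], Optional[str]],   # hypothetical: code -> error text, or None
) -> BenchmarkResult:
    # Round 1: a single attempt from the shared prompt.
    round1_code = generate(prompt)

    # Round 2: feed errors back until the code works or the budget runs out.
    code = round1_code
    error = check(code)
    errors: List[str] = []
    attempts = 0
    while error is not None and attempts < MAX_FIX_ITERATIONS:
        errors.append(error)
        attempts += 1
        # Assumed feedback format: original prompt, previous code, error text.
        feedback = f"{prompt}\n\nPrevious code:\n{code}\n\nError:\n{error}"
        code = generate(feedback)
        error = check(code)

    return BenchmarkResult(
        round1_code=round1_code,
        round2_code=code,
        fix_attempts=attempts,
        passed=error is None,
        errors=errors,
    )
```

Keeping `round1_code` separate from `round2_code` mirrors the distinction drawn above: the first output stays untouched as the baseline, while the corrected version records how far the model got with feedback.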


Built with Tailwind CSS v4.0 & Alpine.js • Open Source on GitHub