LLM benchmark · 300 puzzles · 6 models
Evaluating LLM performance on chess puzzles across difficulty tiers and mate types