item 44796161

Show HN: I made a tiny, playable benchmark where LLMs compete head-to-head

2 points | yz-yu | 6 months ago | llm-fighter.com

TL;DR: LLM Fighter is a small, open-source, playable benchmark for agentic behavior. You bring an OpenAI-compatible API; the demo runs in the browser. It creates head-to-head “battles” that stress tools, planning, and efficiency, and shows step-by-step logs you can download.

What it does well: it gives you a quick, honest feel for how different agents behave under the same rules. What it's not: a formal academic benchmark, and it doesn't reduce everything to a single "score". Why I built it: I wanted something you can play in minutes and still learn from.
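Since the demo just takes any OpenAI-compatible endpoint, here's a minimal sketch of the request shape such an endpoint accepts. The base URL, model name, and messages below are placeholders for illustration, not LLM Fighter's actual configuration:

```python
import json

# Hypothetical example: the chat-completion request shape that any
# OpenAI-compatible server accepts. URL and model name are placeholders.
BASE_URL = "http://localhost:8000/v1"  # your own endpoint

payload = {
    "model": "my-model",  # whatever model id your server exposes
    "messages": [
        {"role": "system", "content": "You are an agent in a battle."},
        {"role": "user", "content": "Plan your next move."},
    ],
    "tools": [],  # tool definitions the agent is allowed to call
}

# The demo would POST this JSON body to f"{BASE_URL}/chat/completions".
body = json.dumps(payload)
```

Anything that speaks this protocol (a local server, a hosted provider, a proxy) should work.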


No comments yet.