

zone411 | 1 month ago

For people interested in these kinds of benchmarks, I have two multiplayer, multi-round games:

- Elimination Game Benchmark: Social Reasoning, Strategy, and Deception in Multi-Agent LLM Dynamics at https://github.com/lechmazur/elimination_game/

- Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure at https://github.com/lechmazur/step_game/

