bhu8 | 13 days ago

This is amazing. I checked some games and the blunders make me think that the LLMs are not really great at forecasting what happens if they play X on Y.

Could you build that lookahead into the decision-making itself? That is, you would:

1. Have the LLM come up with N candidate actions

2. Run XMage in parallel, once per candidate action, and record the outcome of each

3. Revert XMage to the original state

4. Provide the LLM with the different outcomes and have it choose an action/outcome pair rather than just an action

This would let the LLM compare the counterfactual outcomes directly instead of guessing them, and should prevent most of the blunders.
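
To make the four steps concrete, here is a minimal sketch on a toy game rather than XMage. Everything in it is an assumption for illustration: `ToyState`, `apply_action`, `propose_stub` and `choose_stub` are hypothetical stand-ins, not XMage or LLM APIs. The "revert" step is handled by simulating on a deep copy, so the original state is never touched.

```python
import copy
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class ToyState:
    """Hypothetical stand-in for a real game state (not an XMage type)."""
    my_life: int
    opp_life: int


def apply_action(state, action):
    """Simulate `action` on a copy, so the caller's state stays as it was."""
    nxt = copy.deepcopy(state)
    if action == "attack":
        nxt.opp_life -= 3
        nxt.my_life -= 2      # toy model of the opponent's counterattack
    elif action == "all_in":
        nxt.opp_life -= 5
        nxt.my_life -= 6
    elif action != "hold":
        raise ValueError(f"unknown action: {action}")
    return nxt


def propose_stub(state):
    """Stand-in for step 1: the LLM proposing N candidate actions."""
    return ["attack", "all_in", "hold"]


def choose_stub(pairs):
    """Stand-in for step 4: the LLM picking an (action, outcome) pair."""
    survivable = [p for p in pairs if p[1].my_life > 0] or pairs
    return max(survivable, key=lambda p: p[1].my_life - p[1].opp_life)


def choose_with_lookahead(state, propose, simulate, choose):
    candidates = propose(state)                                  # step 1
    with ThreadPoolExecutor() as pool:                           # step 2
        outcomes = list(pool.map(lambda a: simulate(state, a), candidates))
    # Step 3 is implicit: each simulation ran on its own copy of `state`.
    return choose(list(zip(candidates, outcomes)))               # step 4


state = ToyState(my_life=5, opp_life=10)
action, outcome = choose_with_lookahead(
    state, propose_stub, apply_action, choose_stub
)
print(action, outcome)
```

In this toy setup the chooser sees that "all_in" would leave it dead and picks "attack" instead, which is the kind of blunder the lookahead is meant to catch. With a real engine, the copy would presumably be replaced by whatever save/restore mechanism the engine offers.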

If you happen to be token rich, you could even do this in an MCTS manner and have the LLM search several moves deep.
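
For reference, a rough sketch of what the MCTS loop looks like, again on a toy game (Nim: take 1 to 3 stones, and whoever takes the last stone wins) with random rollouts. All names here are hypothetical. In the setup described above, the expansion step would presumably come from LLM-proposed actions and the rollouts from XMage, which this sketch does not attempt.

```python
import math
import random


class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones      # stones left for the player to move
        self.parent = parent
        self.move = move          # move that led from parent to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.visits = 0
        self.wins = 0.0           # wins for the player who made `move`


def ucb1(child, parent_visits, c=1.4):
    """Average win rate plus an exploration bonus for rarely tried moves."""
    explore = math.sqrt(math.log(parent_visits) / child.visits)
    return child.wins / child.visits + c * explore


def rollout(stones, rng):
    """Play random moves; True if the player to move at the start wins."""
    mover_is_start = True
    while stones > 0:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return mover_is_start
        mover_is_start = not mover_is_start
    return False  # no stones left: the previous player took the last one


def mcts_best_move(stones, iterations=2000, seed=0):
    if stones < 1:
        raise ValueError("need at least one stone")
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
        # 2. Expansion: add one untried move as a new child.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new node.
        to_move_wins = rollout(node.stones, rng)
        # 4. Backpropagation: flip the winner's perspective at each level.
        mover_won = not to_move_wins
        while node is not None:
            node.visits += 1
            if mover_won:
                node.wins += 1
            mover_won = not mover_won
            node = node.parent
    # The most-visited root move is the usual robust choice.
    return max(root.children, key=lambda ch: ch.visits).move


print(mcts_best_move(3))
```

With 3 stones the search settles on taking all 3, the immediately winning move. The token cost scales with `iterations`, since each iteration would need at least one engine rollout and, in the LLM version, possibly one model call per expansion.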