bhu8 | 13 days ago
Can you actually introduce that into the decision making? That is, you would:
1. Have the LLM come up with N candidate actions
2. Run XMage in parallel, once per candidate, and record the outcome of each action
3. Revert XMage to the original state
4. Show the LLM the different outcomes and have it choose an action/outcome pair rather than just an action
This would help the LLM analyze the counterfactual outcomes more effectively, and I'd expect it to prevent the vast majority of blunders.
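To make the loop above concrete, here is a rough Python sketch. Everything in it is a hypothetical stand-in: `ToyState`, `EFFECTS`, `propose_actions`, `simulate` and `choose` are made-up names, none of this is XMage's actual API, and the two LLM calls are replaced by trivial placeholders. Each branch works on its own copy of the state, so "reverting" just means the original is never touched.

```python
import copy
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class ToyState:
    """Hypothetical stand-in for a game snapshot; not an XMage type."""
    my_life: int
    opp_life: int


# Hypothetical action effects: (change to my life, change to opponent's life).
EFFECTS = {"attack": (-2, -5), "block": (0, 0), "all_in": (-20, -6)}


def propose_actions(state, n):
    """Placeholder for the LLM call that proposes n candidate actions."""
    return list(EFFECTS)[:n]


def simulate(state, action):
    """Apply the action to a private copy, leaving the original untouched."""
    branch = copy.deepcopy(state)
    d_me, d_opp = EFFECTS[action]
    branch.my_life += d_me
    branch.opp_life += d_opp
    return branch


def choose(pairs):
    """Placeholder for the LLM call that picks an (action, outcome) pair."""
    def score(pair):
        _, outcome = pair
        if outcome.my_life <= 0:          # dying is the blunder to avoid
            return float("-inf")
        return outcome.my_life - outcome.opp_life
    return max(pairs, key=score)


def decide(state, n=3):
    actions = propose_actions(state, n)                 # step 1
    with ThreadPoolExecutor() as pool:                  # step 2
        outcomes = list(pool.map(lambda a: simulate(state, a), actions))
    # Step 3 is implicit: every branch ran on a copy of the state.
    return choose(list(zip(actions, outcomes)))         # step 4
```

With `ToyState(my_life=10, opp_life=10)`, the "all_in" candidate shows up with a lethal outcome attached, which is exactly the kind of information the chooser would not have if it only saw the action names.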
If you happen to be token-rich, you could even do this MCTS-style (Monte Carlo tree search) and have it think several moves deep.
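For anyone unfamiliar with MCTS, here is a minimal sketch of the four phases (selection, expansion, simulation, backpropagation) on a made-up toy game. The game, `ACTIONS`, `reward` and the UCB1 constant are all assumptions for illustration. In the setup described above, the random rollout would presumably be replaced by an XMage playout, and the LLM could supply the candidate actions at each node.

```python
import math
import random

# Hypothetical toy game standing in for a real simulator: two moves,
# each adding 1 or 3 to a running total; the reward is total / 6.
ACTIONS = (1, 3)
MAX_MOVES = 2


def is_terminal(state):
    return len(state) == MAX_MOVES


def reward(state):
    return sum(state) / 6


class Node:
    def __init__(self, state, parent=None):
        self.state = state        # tuple of actions taken so far
        self.parent = parent
        self.children = {}        # action -> Node
        self.visits = 0
        self.total = 0.0          # sum of rewards backed up through this node

    def untried(self):
        if is_terminal(self.state):
            return []
        return [a for a in ACTIONS if a not in self.children]


def ucb1(child, parent_visits, c=1.4):
    """Mean reward plus an exploration bonus for rarely visited children."""
    mean = child.total / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def mcts(iterations=1000, seed=0):
    rng = random.Random(seed)
    root = Node(())
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes using UCB1.
        while not node.untried() and node.children:
            node = max(node.children.values(),
                       key=lambda ch: ucb1(ch, node.visits))
        # 2. Expansion: add one untried child, if the node has any.
        options = node.untried()
        if options:
            action = rng.choice(options)
            child = Node(node.state + (action,), parent=node)
            node.children[action] = child
            node = child
        # 3. Simulation: random rollout to the end of the game.
        state = node.state
        while not is_terminal(state):
            state += (rng.choice(ACTIONS),)
        value = reward(state)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total += value
            node = node.parent
    # Recommend the most-visited first move.
    return max(root.children, key=lambda a: root.children[a].visits)
```

The catch is cost: every iteration is a full simulated playout, and if an LLM is in the loop at each node, the token bill grows with the number of iterations, hence "token-rich".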