I've wondered about such things, and it feels like the 17 Lands dataset might be a good place to scrape play-by-play game data between human players. Feels like it could be adapted to a format usable by this structure, and used as a fine-tuning dataset.
GregorStocks|14 days ago
(I also thought about pointing it at my personal game logs, but unfortunately there aren't that many, because I'm too busy writing analysis tools to actually play the game.)
HanClinto|13 days ago
It would be a lot of work to go through and use computer vision and some measure of reasoning to create these datasets, but some players do an excellent job of narrating the reasoning behind their plays (thinking of players like Cheon or LSV), so it would be fascinating.
Caleb Gannon [0] is one such streamer who does a good job of narrating his plays, and he's also a computer scientist who is very interested in machine-learning projects (he's done several of his own). If you contacted him, I could definitely see him being willing to consent to his videos being used as a fine-tuning dataset for such purposes.
I would be willing to help with creating this dataset if you helped me understand what you would like to see in the final output format.
[0] - https://www.youtube.com/watch?v=YmAAK3V13b0
HanClinto|13 days ago
It obviously wouldn't be the full set of games (because not everyone uses 17 Lands), but it would certainly be a nonzero dataset.
perfect_wave|13 days ago