HanClinto | 14 days ago

I've wondered about this too, and the 17 Lands dataset seems like a good place to scrape play-by-play game data between human players. It could probably be adapted to a format usable by this structure and used as a fine-tuning dataset.

GregorStocks|14 days ago

Oh, fascinating - I didn't realize they released actual replay data publicly. It doesn't look like it's quite as rich as I'd like, though - it only captures one row per turn, so I don't think you can deduce things like targeting, the order in which spells are cast, etc.

(I also thought about pointing it at my personal game logs, but unfortunately there aren't that many, because I'm too busy writing analysis tools to actually play the game.)

HanClinto|13 days ago

Another thing I've thought about doing is using some sort of computer vision to watch streamers of online games, plus speech-to-text (STT), to capture not just play datasets but also datasets of their narrated reasoning about why they play what they play.

It would be a lot of work to go through and use computer vision and some measure of reasoning to create these datasets, but some players do an excellent job of narrating the reasoning behind their plays (thinking of players like Cheon or LSV), so it would be fascinating.
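To make the idea a bit more concrete, here is a rough sketch of the alignment step only, assuming the computer-vision pass has already produced timestamped plays and the STT pass has produced timestamped speech segments. The field names (`t`, `start`, `text`, `action`), the 10-second window, and the sample data are all made up for illustration; nothing here comes from a real tool or dataset.

```python
import bisect

def attach_narration(plays, segments, window=10.0):
    """Attach to each play the speech segments that start within
    `window` seconds before the play (hypothetical field names)."""
    segments = sorted(segments, key=lambda s: s["start"])
    starts = [s["start"] for s in segments]
    out = []
    for play in plays:
        # Index range of segments starting in [play time - window, play time].
        lo = bisect.bisect_left(starts, play["t"] - window)
        hi = bisect.bisect_right(starts, play["t"])
        text = " ".join(s["text"] for s in segments[lo:hi])
        out.append({**play, "narration": text})
    return out

# Invented sample data, standing in for CV and STT output.
segments = [
    {"start": 3.0, "text": "I think we hold the removal"},
    {"start": 12.0, "text": "so I'll just play the land"},
    {"start": 40.0, "text": "they have nothing"},
]
plays = [
    {"t": 14.0, "action": "play land"},
    {"t": 45.0, "action": "attack"},
]
aligned = attach_narration(plays, segments)
```

A fixed time window is the crudest possible heuristic; streamers often explain a play well before or after making it, so a real pipeline would likely need something smarter.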

Caleb Gannon [0] is one such streamer who does a good job of narrating his plays, and he's also a computer scientist who is very interested in machine-learning projects (he's done several of his own). If you contacted him, I could definitely see him being willing to consent to his videos being used as a fine-tuning dataset for such purposes.

I would be willing to help with creating this dataset if you helped me understand what you would like to see in the final output format.

[0] - https://www.youtube.com/watch?v=YmAAK3V13b0

HanClinto|13 days ago

I believe it's even possible to match up game IDs so that (hypothetically) if both players are using 17 Lands, then you can match up a game from both sides and get full information re: the hands of each player as well.

It obviously wouldn't be the full set of games (because not everyone uses 17 Lands), but it would certainly be a nonzero dataset.
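As a sketch of what that matching might look like, assuming each uploaded record carries a shared game ID and a player identifier: the field names `game_id`, `player`, and `opening_hand` and the sample rows below are hypothetical, and I haven't verified that the public data actually exposes IDs that line up across both players.

```python
from collections import defaultdict

def pair_perspectives(rows):
    """Group per-player rows by game ID and keep only games
    that were recorded from both sides (hypothetical fields)."""
    by_game = defaultdict(dict)
    for row in rows:
        by_game[row["game_id"]][row["player"]] = row
    return {gid: sides for gid, sides in by_game.items() if len(sides) == 2}

# Invented sample rows: g1 was logged by both players, g2 by only one.
rows = [
    {"game_id": "g1", "player": "alice", "opening_hand": ["Island", "Opt"]},
    {"game_id": "g1", "player": "bob", "opening_hand": ["Mountain", "Shock"]},
    {"game_id": "g2", "player": "carol", "opening_hand": ["Forest"]},
]
paired = pair_perspectives(rows)
```

Only `g1` survives the filter here, which is the "nonzero but partial" dataset: games where just one player was logging drop out.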

perfect_wave|13 days ago

Ryan Saxe did exactly this a number of years ago: https://github.com/RyanSaxe/mtg