top | item 43609552


ks1723 | 10 months ago

In my view, the 'exactly' is crucial here. They do implicitly tell the model what to do by encoding it in the reward function:

In Minecraft, the team used a protocol that gave Dreamer a ‘plus one’ reward every time it completed one of 12 progressive steps involved in diamond collection — including creating planks and a furnace, mining iron and forging an iron pickaxe.

This is also why I think the title of the article is slightly misleading.
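
To make the reward scheme in the quote concrete, here is a minimal sketch of what a 'plus one' per milestone reward could look like. It is not the team's actual code: the milestone names, the `MilestoneReward` class and the inventory-dict interface are all assumptions for illustration (the article names only a few of the 12 steps), and I'm assuming each milestone pays out only the first time it is reached in an episode.

```python
# Hypothetical milestone list: the quoted article names only planks, furnace,
# iron, iron pickaxe and diamond; the remaining names are illustrative guesses.
MILESTONES = [
    "log", "planks", "stick", "crafting_table", "wooden_pickaxe",
    "cobblestone", "stone_pickaxe", "iron_ore", "furnace",
    "iron_ingot", "iron_pickaxe", "diamond",
]


class MilestoneReward:
    """Pays +1 the first time each milestone item appears in the inventory."""

    def __init__(self, milestones=MILESTONES):
        self.milestones = list(milestones)
        self.reached = set()

    def reset(self):
        # Call at the start of every episode so milestones can pay out again.
        self.reached.clear()

    def step(self, inventory):
        # inventory: dict mapping item name -> count (an assumed interface).
        reward = 0
        for item in self.milestones:
            if item not in self.reached and inventory.get(item, 0) > 0:
                self.reached.add(item)
                reward += 1
        return reward
```

The agent is never told "craft planks, then a furnace", but the list of milestones is exactly where that knowledge sits: change the list and you change what the agent learns to do.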


wongarsu|10 months ago

It's kind of fair; humans also get rewarded for those steps when they learn Minecraft.

xwolfi|10 months ago

But they don't learn that way at all. My 7-year-old learns by watching YouTubers. There's a whole network of people teaching each other the game, and that's almost more fun than playing it alone.