danijar's comments
danijar | 11 months ago | on: DeepMind program finds diamonds in Minecraft without being taught
The tools are admittedly really hard to see in the videos because of the timelapse, and the MP4 compression struggles a bit at the low resolution, but they are there :)
danijar | 11 months ago | on: DeepMind program finds diamonds in Minecraft without being taught
But for an algorithm learning from scratch in Minecraft, it's more like having to guess the cheat code for a helicopter in GTA; it's not something you'd stumble upon unless you have prior knowledge or experience.
Obviously, pretraining world models for common-sense knowledge is another important research frontier, but that's for another paper.
danijar | 11 months ago | on: DeepMind program finds diamonds in Minecraft without being taught
It only gets a +1 for the first iron pickaxe it makes in each world (and the same for all other items), so it can't hack rewards by repeating a milestone.
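The first-time-only milestone logic described above can be sketched roughly as follows. This is a hypothetical illustration, not the paper's actual code; the `MILESTONES` tuple and the inventory-dict interface are assumptions.

```python
# Hypothetical sketch of first-time-only milestone rewards: each item
# yields +1 only the first time it appears in a given world, so repeating
# a milestone earns nothing. Item names here are illustrative guesses.
MILESTONES = ("log", "plank", "stick", "crafting_table", "wooden_pickaxe",
              "cobblestone", "stone_pickaxe", "furnace", "iron_ore",
              "iron_ingot", "iron_pickaxe", "diamond")

class MilestoneReward:
    def __init__(self):
        self.collected = set()

    def reset(self):
        # Called when a new world starts, so milestones can be earned again.
        self.collected.clear()

    def __call__(self, inventory):
        # inventory: mapping from item name to count at the current step.
        reward = 0.0
        for item in MILESTONES:
            if item not in self.collected and inventory.get(item, 0) > 0:
                self.collected.add(item)  # +1 only the first time per world
                reward += 1.0
        return reward
```

In an environment wrapper, `reset()` would be tied to episode boundaries, which is what prevents reward hacking by crafting the same item repeatedly.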
Yeah, it's surprising that it works from such sparse rewards. I think imagining many scenarios in parallel using the world model does some of the heavy lifting here.
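To make "imagining many scenarios in parallel" concrete, here is a toy sketch of rolling out a learned latent dynamics model from a batch of start states, which is the general shape of world-model imagination; the `dynamics`, `policy`, and `reward_head` callables are hypothetical stand-ins, not the actual agent's interfaces.

```python
import numpy as np

# Toy sketch of parallel imagination: roll a learned dynamics model forward
# from B latent start states for H steps, acting with the current policy,
# and collect predicted rewards. The policy can then be trained on these
# imagined returns without touching the real environment.
def imagine(dynamics, policy, reward_head, start_states, horizon=15):
    states = start_states                  # shape (B, latent_dim)
    rewards = []
    for _ in range(horizon):
        actions = policy(states)           # act in imagination only
        states = dynamics(states, actions) # predicted next latent states
        rewards.append(reward_head(states))
    return np.stack(rewards)               # shape (horizon, B)
```

Because the rollout is batched over B start states, one forward pass per step explores B futures at once, which helps a lot when real-environment rewards are this sparse.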
danijar | 11 months ago | on: DeepMind program finds diamonds in Minecraft without being taught
It gets a sparse reward of +1 for each of the 12 items that lead to the diamond, so there is a lot it needs to discover by itself. Fig. 5 in the paper shows the progression: https://www.nature.com/articles/s41586-025-08744-2
danijar | 3 years ago | on: Two weeks in, the Webb Space Telescope is reshaping astronomy
danijar | 5 years ago | on: Brython – A Python 3 implementation for client-side web programming
danijar | 7 years ago | on: PlaNet: A Deep Planning Network for Reinforcement Learning
Regarding computational efficiency, we match D4PG, a top model-free agent that uses experience replay among other techniques (actor-critic, distributional loss, n-step returns, prioritized replay, distributed experience collection).
Your point about exposure bias is interesting, and applies equally to agents that do not learn a model. Personally, I think we need reliable uncertainty estimates in neural networks to make progress on this research question, so the agent can know what it doesn't know.
Hindsight experience replay doesn't apply to tasks where the inputs are images because it requires knowledge of a meaningful goal space with a distance function (e.g. 2D coordinates of goal positions).
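To illustrate why hindsight experience replay depends on a goal space with a distance function, here is a minimal sketch of HER-style relabeling in a 2D goal space (the "final" relabeling strategy). The transition format and threshold are assumptions for the example; with raw image inputs there is no such metric to decide whether a goal was reached.

```python
import math

# Minimal sketch of hindsight relabeling in a 2D goal space: pretend the
# position actually reached at the end of the episode was the desired goal,
# and recompute rewards via a distance threshold. This only works because
# goals live in a space with a meaningful distance (here, Euclidean in 2D).
def relabel(transitions, threshold=0.5):
    # transitions: list of (state, action, achieved_xy, desired_xy) tuples.
    hindsight_goal = transitions[-1][2]  # final achieved position
    relabeled = []
    for state, action, achieved, _ in transitions:
        dist = math.dist(achieved, hindsight_goal)
        reward = 1.0 if dist <= threshold else 0.0
        relabeled.append((state, action, achieved, hindsight_goal, reward))
    return relabeled
```

The distance check is exactly the piece that has no obvious analogue for pixels: two images can be pixel-wise far apart while depicting the same goal state.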
danijar | 9 years ago | on: Probabilistic Data Structure Showdown: Cuckoo Filters vs. Bloom Filters
danijar | 9 years ago | on: Show HN: Mindpark - Playing Video Games with Deep Learning in Python
danijar | 10 years ago | on: Show HN: Layered – Neural Networks in Python 3
danijar | 10 years ago | on: What Can AI Get from Neuroscience? (2007) [pdf]
For a more general system, you can annotate videos with text descriptions of all the tasks that have been accomplished and when, then train a reward model on those annotations to run RL against later.
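The annotate-then-train pipeline above can be sketched as follows. This is a deliberately simplified stand-in (a nearest-mean scorer over frame features, not a neural network), and all names are hypothetical; the point is only the shape of the pipeline: labeled frames in, scalar reward function out.

```python
import numpy as np

# Hypothetical sketch of learning a reward model from annotated trajectory
# frames: frames labeled 1 mark moments where an annotated task was
# accomplished. A trivial nearest-mean scorer stands in for a real learned
# model so the example stays self-contained.
def train_reward_model(frames, labels):
    # frames: (N, D) feature vectors; labels: (N,) with 1 = task accomplished.
    pos = frames[labels == 1].mean(axis=0)
    neg = frames[labels == 0].mean(axis=0)

    def reward(frame):
        # Higher reward the closer the frame is to annotated successes,
        # relative to its distance from annotated non-successes.
        return float(np.linalg.norm(frame - neg) - np.linalg.norm(frame - pos))

    return reward
```

An RL agent would then maximize this learned `reward` instead of a hand-coded one, which is what makes the approach generalize beyond fixed milestone lists.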