danijar's comments

danijar | 11 months ago | on: DeepMind program finds diamonds in Minecraft without being taught

For a lot of things, VLMs are good enough already to provide rewards. Give them the recent images and a text description of the task and ask whether the task was accomplished or not.
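A minimal sketch of that idea, with `query_vlm` as a hypothetical stand-in for a real vision-language model call (here stubbed out so the surrounding logic runs):

```python
def query_vlm(images, prompt):
    # Placeholder: a real implementation would send the frames and
    # prompt to a VLM (e.g. via an API) and return its text answer.
    return "yes"

def vlm_reward(recent_images, task_description):
    """Ask the VLM whether the task was accomplished; map its
    yes/no answer to a scalar reward."""
    prompt = (
        f"Task: {task_description}\n"
        "Based on the recent frames, was the task accomplished? "
        "Answer yes or no."
    )
    answer = query_vlm(recent_images, prompt).strip().lower()
    return 1.0 if answer.startswith("yes") else 0.0

print(vlm_reward([], "chop down a tree"))  # -> 1.0 with the stub above
```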

For a more general system, you can annotate videos with text descriptions of all the tasks that have been accomplished and when, then train a reward model on those to later RL against.
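The annotation step could be sketched like this; the function name and the one-hot target format are illustrative assumptions, not from any particular paper:

```python
def make_reward_targets(num_frames, annotations):
    """Turn timestamped task annotations into per-task training
    targets for a reward model.

    annotations: list of (task, frame_index) pairs marking when each
    task was accomplished in the video. Returns, per task, a 0/1
    target sequence with a 1 at the completion frame.
    """
    targets = {}
    for task, frame in annotations:
        seq = [0.0] * num_frames
        seq[frame] = 1.0
        targets[task] = seq
    return targets

targets = make_reward_targets(5, [("mine iron ore", 3)])
# targets["mine iron ore"] == [0.0, 0.0, 0.0, 1.0, 0.0]
```

A reward model trained on such (frames, task, target) triples can then score new frames for any annotated task at RL time.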

danijar | 11 months ago | on: DeepMind program finds diamonds in Minecraft without being taught

I think learning to hold a button down in itself isn't too hard for a human or robot that's been interacting with the physical world for a while and has learned all kinds of skills in that environment.

But for an algorithm learning from scratch in Minecraft, it's more like having to guess the cheat code for a helicopter in GTA; it's not something you'd stumble upon unless you have prior knowledge/experience.

Obviously, pretraining world models for common-sense knowledge is another important research frontier, but that's for another paper.

danijar | 11 months ago | on: DeepMind program finds diamonds in Minecraft without being taught

When it dies it loses all items and the world resets to a new random seed. It learns to stay alive quite well but sometimes falls into lava or gets killed by monsters.

It only gets a +1 for the first iron pickaxe it makes in each world (same for all other items), so it can't hack rewards by repeating a milestone.
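That first-time-only milestone scheme could look roughly like this (a sketch, not the paper's actual implementation):

```python
class MilestoneRewards:
    """Give +1 only the first time each milestone item is obtained;
    repeats pay nothing, and the set resets with the world."""

    def __init__(self, milestones):
        self.milestones = set(milestones)
        self.reset()

    def reset(self):
        # Called when the agent dies and the world is regenerated.
        self.achieved = set()

    def reward(self, item):
        if item in self.milestones and item not in self.achieved:
            self.achieved.add(item)
            return 1.0
        return 0.0

r = MilestoneRewards(["iron_pickaxe", "diamond"])
print(r.reward("iron_pickaxe"))  # 1.0 the first time
print(r.reward("iron_pickaxe"))  # 0.0 on repeats
```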

Yeah, it's surprising that it works from such sparse rewards. I think imagining a lot of scenarios in parallel using the world model does some of the heavy lifting here.
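A toy illustration of imagining many scenarios in parallel: roll a learned dynamics model forward from the current state under different action sequences and score them, without touching the real environment. The dynamics and reward functions below are made-up stand-ins, not Dreamer's actual model:

```python
import random

def dynamics(state, action):
    # Stand-in for a learned latent transition model.
    return state + action

def reward_model(state):
    # Stand-in for a learned reward predictor; sparse like the task.
    return 1.0 if state >= 3 else 0.0

def imagine(state, horizon=5, num_rollouts=8):
    """Imagine num_rollouts trajectories of length horizon from one
    real state and return each trajectory's predicted return."""
    returns = []
    for _ in range(num_rollouts):
        s, total = state, 0.0
        for _ in range(horizon):
            a = random.choice([-1, 0, 1])
            s = dynamics(s, a)
            total += reward_model(s)
        returns.append(total)
    return returns

# Most imagined trajectories see zero reward, but a few discover
# the sparse +1 region, which gives the policy something to learn from.
print(imagine(0))
```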

danijar | 11 months ago | on: DeepMind program finds diamonds in Minecraft without being taught

Hi, author here! Dreamer learns to find diamonds from scratch by interacting with the environment, without access to external data. So there are no explainer videos or internet text here.

It gets a sparse reward of +1 for each of the 12 items that lead to the diamond, so there is a lot it needs to discover by itself. Fig. 5 in the paper shows the progression: https://www.nature.com/articles/s41586-025-08744-2

danijar | 7 years ago | on: PlaNet: A Deep Planning Network for Reinforcement Learning

Author here. First of all, I'd like to clarify that the data efficiency gain over D4PG is 5000% or 50x.

Regarding computational efficiency, we match D4PG, a top model-free agent that uses experience replay among other techniques (actor critic, distributional loss, n-step returns, prioritized replay, distributed experience collection).

Your point about exposure bias is interesting, and applies equally to agents that do not learn a model. Personally, I think we need reliable uncertainty estimates in neural networks to make progress on this research question, so the agent can know what it doesn't know.

Hindsight experience replay doesn't apply to tasks where the inputs are images because it requires knowledge of a meaningful goal space with a distance function (e.g. 2D coordinates of goal positions).
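To see why, here's HER's relabeling step in miniature: recomputing the reward for a relabeled transition needs a distance function over goals, which is well-defined for 2D positions but unavailable for raw image observations. Names below are illustrative:

```python
import math

def goal_distance(g1, g2):
    # A metric over goals; easy for 2D coordinates like these,
    # but there is no obvious analogue for raw pixels.
    return math.dist(g1, g2)

def her_relabel(state, action, achieved_goal, eps=0.5):
    """Pretend the achieved goal was the desired one all along and
    recompute the reward, which requires goal_distance."""
    new_goal = achieved_goal
    reward = 1.0 if goal_distance(achieved_goal, new_goal) <= eps else 0.0
    return state, action, reward, new_goal

# Relabeling trivially turns the transition into a success.
print(her_relabel("s0", "a0", (1.0, 2.0)))  # reward is 1.0
```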
