Deep Dreams with Caffe

[+] karpathy|10 years ago|reply

One thing people might not realize (I'm not sure how obvious it is) is that these renders depend strongly on the statistics of the training data used for the ConvNet. In particular, you're seeing a lot of dog faces because there is a large number of dog classes in the ImageNet dataset (roughly 120 of the 1000 classes are dog breeds), so the ConvNet allocates a lot of its capacity to worrying about their fine-grained features.

If you train ConvNets on other data, you will get very different hallucinations. It might be interesting to train (or even fine-tune) the networks on different data and see how the results vary. For example, different medical datasets, or datasets made entirely of faces (e.g. Labeled Faces in the Wild), galaxies, etc.

It's also possible to take Image Captioning models and use the same idea to hallucinate images that are very likely for some specific sentence. There are a lot of fun ideas to play with.
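As a rough illustration of why the renders follow the training data, here is a minimal sketch of gradient ascent on a layer's activation. It is a toy, not the notebook's code: the one-row `weights` matrices are hypothetical stand-ins for trained ConvNet features, and `dream`, `dog_filter`, and `galaxy_filter` are names made up for this example. The step is scaled by the mean absolute gradient, which is an assumption about how one might keep the step size stable.

```python
import numpy as np

def dream(x, weights, steps=50, step_size=0.1):
    """Gradient ascent on 0.5 * ||weights @ x||^2 (toy stand-in for a ConvNet layer)."""
    x = x.copy()
    for _ in range(steps):
        activation = weights @ x       # forward pass through the toy "layer"
        grad = weights.T @ activation  # gradient of the objective w.r.t. the input
        # Normalized ascent step: amplify whatever the layer already responds to.
        x += step_size * grad / (np.abs(grad).mean() + 1e-8)
    return x

rng = np.random.default_rng(0)
start = rng.normal(size=8)  # the same starting "image" for both networks

# Two hypothetical "networks" whose single feature responds to different inputs.
dog_filter = np.zeros((1, 8))
dog_filter[0, 2] = 1.0
galaxy_filter = np.zeros((1, 8))
galaxy_filter[0, 5] = 1.0

dog_dream = dream(start, dog_filter)
galaxy_dream = dream(start, galaxy_filter)

print(np.argmax(np.abs(dog_dream)), np.argmax(np.abs(galaxy_dream)))
```

Starting from the same input, each set of weights amplifies only the pattern it was "trained" to detect, which is the toy version of getting dogs from ImageNet and something else from another dataset.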

[+] amelius|10 years ago|reply

So how much computational effort would it take to train with a different set of images, to reach the same level of training as this existing data?

Would it be possible on a simple commercial computer?

[+] zan2434|10 years ago|reply

I spun up a simple web server so you can try your own images: http://deepdreams.zainshah.net Please be gentle :)

[+] bemmu|10 years ago|reply

I tried to run the same image in it a few times. http://imgur.com/5Hsuoiy

Since you have this all set up, can you make some feedback loop animations, for example with zooming? Or apply this to each frame of a movie? For example, something famous like Charlie Bit My Finger. Hopefully using the deeper, more horrifying setting.
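A zooming feedback loop of the kind asked about here could be sketched roughly as follows. This is only an outline under stated assumptions: `dream_step` is a hypothetical callable standing in for one pass of the dreaming code, `identity_step` is a placeholder so the sketch runs on its own, and the zoom is a plain NumPy crop with nearest-neighbour resizing rather than anything from the actual notebook.

```python
import numpy as np

def zoom_center(frame, margin=2):
    """Crop `margin` pixels from each edge, then resize back with nearest-neighbour sampling."""
    h, w = frame.shape[:2]
    cropped = frame[margin:h - margin, margin:w - margin]
    # Map every output row/column to its nearest source row/column in the crop.
    rows = np.arange(h) * cropped.shape[0] // h
    cols = np.arange(w) * cropped.shape[1] // w
    return cropped[rows][:, cols]

def feedback_frames(image, dream_step, n_frames=10):
    """Feed each dreamed frame back in, slightly zoomed, as the next input."""
    frames = []
    frame = image
    for _ in range(n_frames):
        frame = dream_step(frame)   # hypothetical: one dreaming pass over the frame
        frames.append(frame)
        frame = zoom_center(frame)  # zoom in before the next pass
    return frames

def identity_step(frame):
    """Placeholder for the real dreaming step; returns the frame unchanged."""
    return frame

frames = feedback_frames(np.random.rand(32, 32, 3), identity_step, n_frames=5)
print(len(frames), frames[0].shape)
```

Swapping `identity_step` for a real dreaming function and writing `frames` out as images would give the animation; applying the same step to each frame of a movie is the same loop without the zoom.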

[+] colah3|10 years ago|reply

Nice! You might consider adding support for one of the MIT Places networks (http://places.csail.mit.edu/downloadCNN.html). That's how we got a lot of the pictures we used in the original blog post. For example, these were made that way: http://1.bp.blogspot.com/-XZ0i0zXOhQk/VYIXdyIL9kI/AAAAAAAAAm...

[+] codezero|10 years ago|reply

awesome! I was just going to suggest that someone do this!

Check out the tiger, and super weird tiger: https://twitter.com/radiofreejohn/status/616490624095621120 :) :)

[+] mikedmiked|10 years ago|reply

Looks great, thanks for this.

Some of us are going to be putting our own on http://reddit.com/r/deepdream

[+] analogmind|10 years ago|reply

Maybe add some fields to manipulate the other parameters :-)

[+] IgorPartola|10 years ago|reply

Looks like you got slashdotted. Did you happen to create any kind of packaged installation? I'd like to try it over the weekend.

[+] wormley|10 years ago|reply

did the site crash or something? I'm not getting anything back

[+] anantzoid|10 years ago|reply

Nooo:( Was just starting to do that!

[+] Liquix|10 years ago|reply

The visuals generated by the neural network remind me of visuals experienced under the influence of psilocybin or LSD. I wonder if I am making an unjust leap or if there is a similar organic process (searching for familiar patterns) taking place in the mind? Fascinating, thanks for sharing.

[+] sabalaba|10 years ago|reply

No hypothesis is unjust! It could also be related to some of the experiences people have in sensory deprivation tanks. Your brain attempts to see structure in noise and hallucinates. One hypothesis would be that on LSD, and other psychoactive substances, this feedback loop is somehow enhanced. There might be a few doctorates to be earned in testing these hypotheses.

[+] tim333|10 years ago|reply

It would make sense if the brain did use similar mechanisms to search for patterns.

[+] hellbanner|10 years ago|reply

"Be careful running the code above, it can bring you into very strange realms!"

Reminds me of Charlie Stross's new novel:

"A brief recap: magic is the name given to the practice of manipulating the ultrastructure of reality by carrying out mathematical operations. We live in a multiverse, and certain operators trigger echoes in the Platonic realm of mathematical truth, echoes which can be amplified and fed back into our (and other) realities. Computers, being machines for executing mathematical operations at very high speed, are useful to us as occult engines. Likewise, some of us have the ability to carry out magical operations in our own heads, albeit at terrible cost."

http://www.tor.com/2015/06/30/excerpt-the-annihilation-score...

[+] thirdtruck|10 years ago|reply

You might also like Shadowfist (http://shadowfist.com), a card game that used to have the Purists, a playable faction powered by esoteric, math-centric magic.

[+] kordless|10 years ago|reply

Stross definitely has the sight. As for the Platonic realm, well, that's just the hypervisor he's referring to. :)

[+] malkia|10 years ago|reply

Here are some images I've done - https://www.facebook.com/media/set/?set=a.720197931442169.10...

[+] MrBuddyCasino|10 years ago|reply

Those Teletubbies are perfect. Best I've seen yet.

[+] rsp1984|10 years ago|reply

Wow, these are way better than the Google Inceptionism originals.

[+] saintcorp|10 years ago|reply

Those are great! Which model did you use for the Mona Lisa picture? Thanks.

[+] malkia|10 years ago|reply

Time to #DeepDream some minecraft texture packs :)

[+] cing|10 years ago|reply

Great, I got the dependencies installed on OSX and I'm already monsterifying a head shot for LinkedIn. Now, to find a way to get this working in real time with a webcam...

[+] sciencerobot|10 years ago|reply

I'm stuck at compiling Caffe :\

[+] benanne|10 years ago|reply

We sort of reverse-engineered this last week and set up a stream with live interactive "hallucinations": http://www.twitch.tv/317070

You can suggest what objects the network should dream about (combinations of two are also possible).

Our code will be published on GitHub later today!

[+] majora2007|10 years ago|reply

I'm very excited to see the code. I read your blog post earlier this week and am very intrigued.

[+] pierrec|10 years ago|reply

Amazing that it runs easily on consumer hardware. This dispels suspicions that a Google cluster was necessary for these results.

I'm wondering if it's possible to use this with a model that was trained on a database without labels, just pictures. Is such a thing even possible? For this particular application, labeling and categories are ultimately superfluous, but are they required in order to get there?

[+] nicklovescode|10 years ago|reply

Can someone please create a SaaS interface to play with it? Would love to send this to family/friends who can't easily spin up the code.

[+] zan2434|10 years ago|reply

http://deepdreams.zainshah.net/?q=1

[+] spot|10 years ago|reply

A simpler version of this idea (making an image A out of matching pieces of a set of images B) was implemented in the early 90s and released as open source: http://draves.org/fuse/

[+] rsp1984|10 years ago|reply

I always wonder why sometimes the system finds faces and other elements in essentially untextured / homogeneous parts of images. Wouldn't there be some sort of "data term" in the energy functional that would suppress these results and/or move them to other parts of the image?

Perhaps this is working entirely differently and I'm thinking too much in the classical computer vision realm. Would love some explanation though.

[+] wodenokoto|10 years ago|reply

I imagine the chance of an input that would result in zero confidence in all output nodes is damn near zero.

There will basically always be an output node with the highest confidence, no matter how low.
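The point about there always being a highest-confidence output can be checked with a small sketch. It assumes a classifier that ends in a softmax over 1000 classes; the near-uniform `logits` are made-up values standing in for what a featureless image patch might produce, not measurements from any real network.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax: shift by the max before exponentiating."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

rng = np.random.default_rng(0)
# Hypothetical logits for an almost featureless patch: nearly identical for all classes.
logits = rng.normal(scale=0.01, size=1000)

probs = softmax(logits)
best = int(np.argmax(probs))

# Every class gets roughly 1/1000, yet one class is still (barely) the winner.
print(best, probs[best])
```

Softmax outputs are strictly positive and sum to one, so no input gives zero confidence everywhere. Amplifying whichever class wins, however marginally, is one plausible reason faces can appear in flat regions.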

[+] rayalez|10 years ago|reply

This is really cool. I wonder what it would look like applied to video.

Also, I didn't know that GitHub displays .ipynb files, that's pretty awesome.

[+] jrabek|10 years ago|reply

This should be combined with the Oculus, with a camera on the front.

[+] johnwatson11218|10 years ago|reply

Does anyone know if this technique can be used to slurp up a database and produce "typical" records for populating a test database? This is a problem that I struggled with a few years ago and still haven't found a good automated solution.

[+] sova|10 years ago|reply

Could you refine your question? This is a post about image processing via neural network. Do you mean take an existing database, learn via neural network, and populate a fresh one with "learned" attributes?

[+] taliesinb|10 years ago|reply

The dogs, eyes, and Dali-like bird-dogs are really cool. I've seen some insects, too, but not very often.

Are there any other flavors of hallucination? Why all the dogs? I suppose ImageNet has a lot of dog varieties in its category list.

[+] malkia|10 years ago|reply

A Trip To The Moon - http://imgur.com/a/EkAkv

[+] sova|10 years ago|reply

So awesomely trippy, love it.

[+] llSourcell|10 years ago|reply

ugh so annoying to compile can someone make this easier

[+] grrowl|10 years ago|reply

http://ryankennedy.io/running-the-deep-dream/ installs a Docker container, couldn't be easier.

[+] Nordavind|10 years ago|reply

But... How do you do this?

[+] armab|10 years ago|reply

Things like this are the reason I like the science-friendly Python community.

55 comments