
Inceptionism: Going Deeper into Neural Networks

797 points | neurologic | 10 years ago | googleresearch.blogspot.com

156 comments

[+] davedx|10 years ago|reply
Worth reading the comments too.

One from Vincent Vanhoucke: "This is the most fun we've had in the office in a while. We've even made some of those 'Inceptionistic' art pieces into giant posters. Beyond the eye candy, there is actually something deeply interesting in this line of work: neural networks have a bad reputation for being strange black boxes that are opaque to inspection. I have never understood those charges: any other model (GMM, SVM, Random Forests) of any sufficient complexity for a real task is completely opaque for very fundamental reasons: their non-linear structure makes it hard to project back the function they represent into their input space and make sense of it. Not so with backprop, as this blog post shows eloquently: you can query the model and ask what it believes it is seeing or 'wants' to see simply by following gradients. This 'guided hallucination' technique is very powerful and the gorgeous visualizations it generates are very evocative of what's really going on in the network."

[+] svantana|10 years ago|reply
That's not really fair though, since any deterministic function can be "back-propagated" using the chain rule (or even automatic differentiation), even though it's not really necessary for simpler models such as GMM and SVM since there are much easier ways of inspecting them. Also, I don't feel single input/output pairs really describe the function itself -- knowing cos(0) = 1 doesn't reveal much about the cosine function, even though it's a local maximum. Maybe one could extend the technique to show transitions (morphing) between classes as video?
[+] pault|10 years ago|reply
I can't really put my finger on it, but this is the most exciting thing that I've seen on HN in years.
[+] rndn|10 years ago|reply
Perhaps the argument should be steelmanned in that we should generally avoid using algorithms which are so complex that they aren't glass boxes. I doubt the idea to "simply follow gradients" can prove neural networks to be glass boxes because the output of that is still too complex. And we are clearly onto something here. If we can generate artificially hallucinated pictures today, it is not unreasonable to assume that computers will be able to hallucinate entire action sequences (including motor programs and all kinds of modalities) in a decade or two. Combining such a hallucination technique with reinforcement learning might be a key to general intelligence. I think it is highly unethical that there is almost no democratic control over what is being developed at Google, Facebook et al. in secrecy. The most recent XKCD comic is quite relevant: http://xkcd.com/1539/
[+] philipn|10 years ago|reply
The reason they look so 'fractal-like' (e.g. trippy!) is because they actually are fractals!

In the same way a normal fractal is a recursive application of some drawing function, this is a recursive application of different generation or "recognition -> generation" drawing functions built on top of the CNN.

So I believe that, given a random noise image, these networks don't generate the crazy trippy fractal patterns directly. Instead, that happens by feeding the generated image back to the network over and over again (with e.g. zooming in between).

Think of it a bit like a Rorschach test. But instead of ink blots, we'd use random noise and an artificial neural network. And instead of switching to the next Rorschach card after someone thinks they see a pattern, you continuously move the ink blot around until it looks more and more like the image the person thinks they see.

But because we're dealing with ink, and we're just randomly scattering it around, you'd start to see more and more of your original guess, or other recognized patterns, throughout the different parts of the scattered ink. Repeat this over and over again and you have these amazing fractals!
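The recursive loop described above can be sketched directly. Here `enhance` is a hypothetical stand-in for one "recognize and amplify" pass through the network, and the zoom is a crude center-crop-and-resize; this is just one plausible shape for the feedback loop, not the actual pipeline:

```python
import numpy as np

def zoom(img, factor=1.05):
    """Crop the center and scale back up (nearest-neighbour) -- a cheap
    stand-in for the 'zooming in between' step."""
    h, w = img.shape[:2]
    ch, cw = int(h / factor), int(w / factor)
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = img[top:top + ch, left:left + cw]
    rows = np.arange(h) * ch // h   # nearest-neighbour row indices
    cols = np.arange(w) * cw // w   # nearest-neighbour column indices
    return crop[rows][:, cols]

def iterate_dream(img, enhance, steps=10, zoom_factor=1.05):
    """The recursive 'recognition -> generation' loop: amplify whatever the
    network thinks it sees, zoom in a little, and feed the result back in."""
    for _ in range(steps):
        img = enhance(img)
        img = zoom(img, zoom_factor)
    return img
```

Each pass reinforces whatever the previous pass hallucinated, and the zoom keeps introducing the result at a new scale, which is where the fractal self-similarity comes from.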

[+] chestervonwinch|10 years ago|reply
Functional iteration is actually a fun way to draw fractal images in the plane. It goes like this (for anyone interested):

1) Pick a function f: R^2 -> R^2

2) Pick a region of R^2 (this could be the unit square for instance).

3) For each point in the region do the following:

  a) Plug the point into f. Then plug f(x) into f. Then plug f(f(x)) into f, etc....

  b) The norm of f(f(...f(x)...)) will either run off to infinity or stay bounded.

  c) Record for the original point, x, how many iterations it took the process to run off to infinity (or the maximum if the sequence stayed bounded).
4) Paint by number after assigning a unique color to each possible number of iterations.

Here's the result of this process for the function:

f(x,y) = ( exp(x) * cos(y), exp(x) * sin(y) )

http://i.imgur.com/LZKavio.png
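For anyone who wants to try it, here is one plausible escape-time implementation of the recipe above using that same f (the viewing window, iteration cap, and escape bound are my own choices, not the commenter's):

```python
import numpy as np

def f(x, y):
    # f(x, y) = (exp(x) * cos(y), exp(x) * sin(y))
    ex = np.exp(x)
    return ex * np.cos(y), ex * np.sin(y)

def escape_counts(xmin, xmax, ymin, ymax, n=200, max_iter=30, bound=1e6):
    """For each grid point, count how many iterations of f it takes for the
    norm to exceed `bound`; points that stay bounded keep the value max_iter."""
    xs, ys = np.meshgrid(np.linspace(xmin, xmax, n), np.linspace(ymin, ymax, n))
    counts = np.full(xs.shape, max_iter)
    x, y = xs.copy(), ys.copy()
    active = np.ones(xs.shape, dtype=bool)   # points that have not yet escaped
    for k in range(max_iter):
        with np.errstate(over="ignore", invalid="ignore"):
            x[active], y[active] = f(x[active], y[active])
            escaped = active & (x * x + y * y > bound * bound)
        counts[escaped] = k
        active &= ~escaped
    return counts
```

Step 4 ("paint by number") is then just rendering the count grid with any colormap, e.g. matplotlib's `imshow(escape_counts(-2, 2, -3, 3))`.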

[+] tripzilch|10 years ago|reply
> The reason they look so 'fractal-like' (e.g. trippy!) is because they actually are fractals!

While I agree with your idea about fractals (though you're a bit vague on the math details to know for sure), I also believe that a large reason the images look so "trippy" is because there is some local contrasting effect at work, generating high-saturation rainbow fringes at the edges of details and features. You get loads of that on psychedelics as well.

I bet there's a pretty straightforward reason to explain these rainbow fringes, if one were to dig into it, though.

Another (unrelated) observation I had was the feeling that the neural net seemed to be reproducing JPEG-artifact type fringes in the images? Though it could be that I was just looking at scaled versions of already JPEG-compressed output images, the article doesn't provide details (if only they had been PNGs ...).

[+] barbs|10 years ago|reply
That's really cool.

The trippiness is further compounded by the rainbow-ish colour effect produced by the recursive function, which mimics the "shimmering" rainbow effect you commonly get around lights when tripping on LSD.

And also, when under the influence of various drugs you tend to see patterns, particularly faces, where there aren't any.

[+] DanBC|10 years ago|reply
> The reason they look so 'fractal-like' (e.g. trippy!) is because they actually are fractals!

Do they exhibit self-similarity at different zoom levels?

[+] rndn|10 years ago|reply
These hair features that get re-interpreted as legs are particularly uncanny.
[+] meemoo|10 years ago|reply
Tweak image urls for bigger images:

Ibis: http://3.bp.blogspot.com/-4Uj3hPFupok/VYIT6s_c9OI/AAAAAAAAAl... Seurat: http://4.bp.blogspot.com/-PK_bEYY91cw/VYIVBYw63uI/AAAAAAAAAl... Clouds: http://4.bp.blogspot.com/-FPDgxlc-WPU/VYIV1bK50HI/AAAAAAAAAl... Buildings: http://1.bp.blogspot.com/-XZ0i0zXOhQk/VYIXdyIL9kI/AAAAAAAAAm...

I'd love to experiment with this and video. I predict a nerdy music video soon, and a pop video appropriation soon after.

[+] agumonkey|10 years ago|reply
Had me sitting down. Felt mesmerizing, like a weird resonance with my mind. This is how I imagined my brain working, patching bits of stimulus to recreate complex shapes fractally... Seeing it in pictures is ... just amazing.
[+] murbard2|10 years ago|reply
Two remarks

1) Captain obvious says: the "trippiness" of these images is hardly coincidental, these networks are inspired by the visual cortex.

2) They had to put a prior on the low level pixels to get some sort of image out. This is because the system is trained as a discriminative classifier, and it never needed to learn this structure, since it was always present in the training set. This also means that the algorithm is going to be ignoring all sorts of structures which are relevant to generation, but not relevant for discrimination, like the precise count and positioning of body parts for instance.

This makes for some cool nightmarish animals, but fully generative training could yield even more impressive results.

[+] ghoul2|10 years ago|reply
This is brilliant! I did something similar when I was trying to learn about neural networks a long long time ago. The results were fascinating.

I was writing a neural network trainer - to recognize simple 2D images. This was on a 300MHz desktop PC(!) so the network had to be pretty small. Which implied that the input images were just compositions of simple geometric shapes - a circle within a rectangle, two circles intersecting, etc.

When I tried "recalling" the learnt image every few epochs of training, I noticed the neural network was "inventing" more complex curves to better fit the image. Initially, only random dots would show up. Then it would have invented straight lines and would try to compose the target image out of one or more straight lines.

What was absolute fun to watch was, at some point, it would stop trying to compose a circle with multiple lines and just invent the circle. And then proceed to deform the circle as needed.

During different runs, I could even see how it got stuck into various local minima. To compose a rectangle, mostly the net would create four lines - but having the lines terminate was obviously difficult. As an alternative, sometimes the net would instead try a circle, which it would gradually elongate, straighten out the circumference, slowly to look more and more like a rectangle.

I was only an undergrad then, and was mostly doing this for fun - I do believe I should have written it up then. I do not even have the code anymore.

But good to know googlers do the same kinda goofy stuff :-)

[+] pault|10 years ago|reply
I would love to see what would come out of a network trained to recognize pornographic images using this technique. :)
[+] gradys|10 years ago|reply
Does anyone have a good sense of what exactly they mean here:

>Instead of exactly prescribing which feature we want the network to amplify, we can also let the network make that decision. In this case we simply feed the network an arbitrary image or photo and let the network analyze the picture. We then pick a layer and ask the network to enhance whatever it detected. Each layer of the network deals with features at a different level of abstraction, so the complexity of features we generate depends on which layer we choose to enhance. For example, lower layers tend to produce strokes or simple ornament-like patterns, because those layers are sensitive to basic features such as edges and their orientations.

Specifically, what does "we then pick a layer and ask the network to enhance whatever it detected" mean?

I understand that different layers deal with features at different levels of abstraction and how that corresponds with the different kinds of hallucinations shown, but how does it actually work? You choose the output of one layer, but what does it mean to ask the network to enhance it?

[+] murbard2|10 years ago|reply
The detection layer will detect very faint random signals. For example, if you have a unit that's supposed to detect dogs, it might be very faintly activated if by random chance there is a doggish quality to some part of the image. What they do is pick up that faint, random signal and amplify it.

They say: oh you think that cloud is a tiny bit dog-like? Ok, well then find me a small modification to the image that would make it a little more dog like, then a little more, and so on.

Think of it as semantic contrast enhancement.
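A toy illustration of that amplification (entirely made-up numbers; a single linear unit stands in for the "dog detector", and we follow the gradient of its activation with respect to the image, never the weights):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=1000)           # hypothetical detector weights
x = rng.normal(size=1000) * 0.01    # the "cloud": noise with, at most, a faint doggish signal

def amplify(x, w, lr=0.1, steps=50):
    """Repeatedly nudge the image in the direction that raises the activation.
    The gradient of the activation (w . x) with respect to x is simply w."""
    for _ in range(steps):
        x = x + lr * w
    return x

before = w @ x               # faint initial activation
after = w @ amplify(x, w)    # far stronger after the "guided hallucination"
```

In the real networks the detector is deeply non-linear, so the gradient direction changes at every step, but the principle is the same: keep asking "what small change makes this look a little more dog-like?".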

[+] discardorama|10 years ago|reply
My understanding: when you're doing standard gradient descent, you push the error down through the layers, modifying the weights at each layer. Now, in "normal" NN training you stop at the input layer; it makes no sense to tweak the error at the input layer, right?

But what if you did the following: flow the error down from the outputs to the layer you're interested in, but don't modify the weights of any of the layers above it; just modify the values of this layer in accordance with the error gradient.

Added later: I think we should wait till @akarpathy comes along and ELI5's it to us.
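A minimal numpy sketch of that idea, under the same reading: a made-up one-layer "network" whose weights stay frozen while gradient ascent updates only the input values:

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(8, 16)) / 4   # frozen weights: input -> hidden layer

def hidden(x):
    return np.maximum(0.0, x @ W1)  # ReLU hidden layer we choose to "enhance"

def enhance_input(x, lr=0.05, steps=100):
    """Gradient ascent on 0.5 * ||hidden(x)||^2 with respect to the input.
    W1 is never modified -- only the "image" x moves along the gradient."""
    for _ in range(steps):
        h = hidden(x)
        grad = h @ W1.T  # the ReLU mask is implicit: h is zero where units are inactive
        x = x + lr * grad
    return x
```

That is the whole twist relative to ordinary training: flow the gradient down past the chosen layer, but apply the update to the pixels instead of the weights.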

[+] ebetica|10 years ago|reply
My thought is that they basically run gradient descent on the image where the loss is the magnitude of one output plane in one of the layers of the neural network. Probably using gradient descent to push up the magnitude of one of the output plane layers or something like that.
[+] simonster|10 years ago|reply
The fractal nature of many of the "hallucinated" images is kind of fascinating. The parallels to psychedelic drug-induced hallucinations are striking.
[+] intjk|10 years ago|reply
I'll repeat what I posted on facebook because I thought it was clever: "Yes, but only if we tell them to dream about electric sheep."

So, tell the machine to think about bananas, and it will conjure up a mental image of bananas. Tell it to imagine a fish-dog and it'll do its best. What happens if/when we have enough storage to supply it a 24/7 video feed (aka eyes), give a robot some navigational logic (or strap it to someone's head), and give it the ability to ask questions, say, below some confidence interval (and us the ability to supply it answers)? What would this represent? What would come out on the other side? A fraction of a human being? Or perhaps just an artificial representation of "the human experience".

...what if we fed it books?

[+] sp332|10 years ago|reply
It would have some kind of intelligence, at least able to recall information and form associations between things. But there's no reason to think that it would come out looking human. I mean you can show a dog lots and lots of images and it doesn't turn human.
[+] fizixer|10 years ago|reply
Some comments seem to be appreciating (or getting disgusted by) the aesthetics but I think the "inceptionism" part should not be ignored:

We're essentially peeking inside a very rudimentary form of consciousness: a consciousness that is very fragile, very dependent, very underdeveloped, and full of "genetic errors". Once you have a functioning deep learning neural network, you have the assembly language of consciousness. Then you start playing with it (as this paper did), you create a hello world program, you solve the factorial function recursively, and so on. Somewhere in that universe of possible programs, is hidden a program (or a set of programs) that will be able to perform the thinking process a lot more accurately.

[+] romaniv|10 years ago|reply
> We're essentially peeking inside a very rudimentary form of consciousness

Blatant sensationalism. There is absolutely nothing here that would suggest consciousness. If you have a mask for matching images, you can reverse that mask and imprint it as an image. What we're seeing here is a more complicated version of the same process. Heck, look more closely. Some of those "building" images have obvious chunks of pedestrians embedded, probably because the algorithm was trained on tourist photos.

Is it interesting? Yes, from algorithmic point of view. Cool as hell. However, this has nothing to do with consciousness.

If anything, some of those images are just a more elaborate version of a kaleidoscope. It's not like they run a network and got a drawing. They were looking for a particular result, did post processing, did pre-processing and tweaked the intermediate steps (by running them multiple times until the image looked interesting). Finally, we as viewers do our share of pattern matching, similar to how we see patterns in Rorschach inkblots. And there are captions that frame what we see and "guide" us to recognizing the right objects.

[+] davesque|10 years ago|reply
This is one of the most astounding things I've ever seen. Some of these images look positively like art. And not just art, but good art.
[+] mortenjorck|10 years ago|reply
This may sound ridiculous, but I think this has the potential to be a development as foundation-shaking as Modernism itself. There has been plenty of algorithmically-derived art over the past 30 years, but generative pieces inevitably look like math – they are interesting curiosities, sometimes quite beautiful, but they don’t challenge the mind like any of the major movements of the past 150 years.

This is different because, while still just math, it’s modeled on the processes of human perception. And when successfully executed, it plays on human perception in ways that were formerly the exclusive domain of humans – Chagall, de Chirico, Picasso – gifted with some sort of insight into that perception.

Future iterations of this kind of processing, with even higher-order symbol management could get really weird, really fast.

[+] isp|10 years ago|reply
I'm blown away by this "guided hallucination" technique. It's not a big oversimplification to describe it to the layperson as: enter images into a neural network; receive as output artwork representing the essence of the images.
[+] calebm|10 years ago|reply
I felt the same. I think the main aspect of these images that makes me like them is how everything feels connected, which is what the AI is trying to find: connections. Honestly, can anyone tell me where I could order large prints of some of these?
[+] return0|10 years ago|reply
> And not just art, but good art

Makes me wonder what passes as good art nowadays. But yeah some of the renderings were particularly aesthetic.

[+] anigbrowl|10 years ago|reply
These images are remarkably similar to chemically-enhanced mammalian neural processing in both form and content. I feel comfortable saying that this is the Real Deal and Google has made a scientifically and historically significant discovery here. I'm also getting an intense burst of nostalgia.
[+] joeyspn|10 years ago|reply
The level of resemblance to a psychotropic trip is simply fascinating. It's definitely really close to how our brain reacts when it is flooded with dopamine + serotonin.

I wonder if the engineers at Google can do the same experiment with audio... It'll be funny to listen to the results.

[+] tripzilch|10 years ago|reply
Might be interesting, although I've always liked the visuals* of psychedelia a lot more than the audio effects (which in my experience mostly tend to make sounds be perceived really "loud" and "close", rather than "trippy"--unless that's what you associate with "trippy" audio, of course). Dunno if my experience is typical, obviously.

* also the particular mind-altering effects, which are hard to describe

[+] fortyeight|10 years ago|reply
I'd be interested to see if the results end up looking something like DIPT, which is the only known primarily auditory hallucinogen.
[+] guelo|10 years ago|reply
I'm starting to come around to sama's way of thinking on AI. This stuff is going to be scary powerful in 5-10 years. And it will continue to get more powerful at an exponential rate.
[+] johnconner|10 years ago|reply
You are not alone in your fears. Others have been ringing the alarm for some time. Nick Bostrom's Superintelligence is a good reference.
[+] gojomo|10 years ago|reply
Facial-recognition neural nets can also generate creepy spectral faces. For example:

https://www.youtube.com/watch?v=XNZIN7Jh3Sg

https://www.youtube.com/watch?v=ogBPFG6qGLM

(Or if you want to put them full-screen on infinite loop in a darkened room: http://www.infinitelooper.com/?v=XNZIN7Jh3Sg&p=n http://www.infinitelooper.com/?v=ogBPFG6qGLM&p=n )

The code for the 1st is available in a Gist linked from its comments; the creator of the 2nd has a few other videos animating grid 'fantasies' of digit-recognition neural-nets.

[+] IanCal|10 years ago|reply
The one generated after looking at completely random noise on the bottom row, second from the right:

http://googleresearch.blogspot.co.uk/2015/06/inceptionism-go...

Reminds me very heavily of The Starry Night https://www.google.com/culturalinstitute/asset-viewer/the-st...

Lovely imagery.

I never had much luck with generative networks. I did some work putting RBMs on a GPU, partly because I'd seen a Hinton talk showing starting with a low level description and feeding it forwards, but I always ended up with highly unstable networks myself.

[+] henryl|10 years ago|reply
I'll be the first to say it. It looks like an acid/shroom trip.
[+] jarboot|10 years ago|reply
Maybe there's something to do with how our brains interpret information differently when under the influence of psychoactive drugs.

I've been looking at Aldous Huxley's "Doors of Perception" and other psychonautic works recently and he hypothesizes that these sorts of drugs filter out the usual signals from the CNS that shut out the parts of perception that are not important for you to receive for survival.

It might be some great leap of armchair psychology, but I think we're due for another psychedelic revival, especially considering the new advances in synthetic psychedelics, legalization of more harmless recreational drugs, new tests in medical research using MDMA/LSD/Psilocybin, and the cultural shift away from the 'War on drugs'.

[+] frankosaurus|10 years ago|reply
Really cool. You could generate all kinds of interesting art with this.

I can't help but think of people who report seeing faces in their toast. Humans are biased towards seeing faces in randomness. A neural network trained on millions of puppy pictures will see dogs in clouds.

[+] fixermark|10 years ago|reply
That's essentially what's happening here. You can see in the different pictures where different sets of training data were used---buildings, faces, animals.

Give the machine millions of reference images to work from and then tell it to find those images in noise, and it will succeed (because it literally can't "imagine" anything else for the noise to be).

[+] djfm|10 years ago|reply
Now I'm thinking about all those google cars, quietly resting in dark garages, dreaming about streets.