top | item 14114706

Teaching Machines to Draw

306 points | yarapavan | 9 years ago | research.googleblog.com | reply

72 comments

[+] wimagguc|9 years ago|reply
I've read through all the comments and referenced docs, but no one seemed to offer much of a reason why we'd want machines to draw sketches. An incomplete list:

  * It's a different way to represent drawings. Alternative to pixels  
  * Perhaps making it easier to render old-school animation at one point?  
It's a fun rabbit hole though; the many sub-perfect bicycle sketches reminded me of this project, where someone created 3D renders from bicycle sketches: https://www.behance.net/gallery/35437979/Velocipedia?ilo0=1
[+] scandox|9 years ago|reply
Surely it's because this is an aspect of intelligence? Being able to generalize an image is like being able to summarize a text. It reduces something individuated and complex to a set of shared properties.

So the Why seems self-evident to me: if you want to understand intelligence, try to make machines do intelligent things.

That's not to say I think that approach will necessarily lead to a system capable of general intelligence. But I'm assuming that is their current approach.

[+] jxcole|9 years ago|reply
TLDR: The point is to correct a known flaw with image recognition algorithms.

I'm not a professional researcher, but here's what I gather from reading other articles:

One major problem with image recognition in machines is that while they are generally able to recognize real images correctly, they are 1) Easily fooled 2) Unable to have human-type image understanding. For example, you can recognize an animated character as a human, even though you may never have seen that particular style of drawing before.

One major problem that people realized is that deep neural networks have a tendency to recognize individual facets, for example, a nose. So if you want a neural network to believe something is a human, pepper it with as many noses as you can. It knows that humans have noses so the more noses the more human it must be. Of course, if we saw an image like this we wouldn't think that it was a human because we know a human has only one nose.

To me it seems the primary thrust of this research is to generate a NN that can recognize, like a human, that if an entity has 10 eyes, it's probably not a cat. This is alluded to with the house of horrors cat photographs at the beginning of the article. You can see that when passed an image of a cat with 3 eyes, this neural network correctly removed one of the eyes to make it more realistic.

Here is an article explaining this problem: https://arxiv.org/pdf/1412.1897.pdf

[+] ralfd|9 years ago|reply
> Fun facts: Some diversities are gender driven. Nearly 90% of drawings in which the chain is attached to the front wheel (or both to the front and the rear) were made by females. On the other hand, while men generally tend to place the chain correctly, they are more keen to over-complicate the frame when they realize they are not drawing it correctly.

Hm. So a machine trained exclusively by men/women would make different drawings?

[+] ezekg|9 years ago|reply
Correct me if I'm wrong, but I'd imagine that it would be in the same vein as teaching a child to draw; you don't start by teaching them how to draw like Van Gogh. I'd assume that Google is going to up the complexity once their neural network reaches a certain milestone. Imagine 20 years from now: a humanoid robot drawing with your child at the kitchen table, à la I, Robot.

It's interesting to think about, regardless. Stuff like this gets me excited (and motivated!) to delve into machine learning and AI.

[+] amthewiz|9 years ago|reply
It is not about sketches at all. It is a nice AI problem that is becoming approachable now with algorithms and computing. It is a step towards machines being able to learn more and more complex concepts and producing correspondingly complex behaviors.
[+] dperfect|9 years ago|reply
The second paragraph mentions a couple of reasons:

> ...we created a model that potentially has many applications, from assisting the creative process of an artist, to helping teach students how to draw.

For vector output, there's also a subtle but important distinction between machine-drawn images (based on rasterized data) converted to vectors, and generating machine-drawn vector images. The latter could be more useful in (as you mention) animation, as well as producing vector images with clearly isolated elements - e.g., a vector image wherein occluded elements are represented with full masked shapes (preserving editable layers) rather than a single "flattened" vectorized layer.

[+] killjoywashere|9 years ago|reply
Think less about drawing pictures of cats and more about the path mimicry. The machine is "putting a pen down" and drawing vector strokes (these are not bitmaps). Perhaps velocity will come next, and then understanding of pen tilt, pressure, nib angle of attack, response to various paper textures, brush/nib selection, etc.
[+] EGreg|9 years ago|reply
This is a scary milestone on the road to general intelligence. In this case understanding abstract concepts.

When machines learn something - they start being able to replicate it perfectly across as many machines as they want, parallelize it and execute it without mistakes more times than humans have ever done in history.

So that means - a nearly infinite glut of amazing art, jokes, music and movies. And not just that... but attacks on all our systems including reputation, trust, voting and so on.

Today our systems depend on the assumption that attackers are limited in their ability to proceed and expand quickly. How would things work if attackers were not? You already prefer to ask Google more than your parents. What if software made better jokes, drawings and had sex better? And simulated emotions better?

[+] swalsh|9 years ago|reply
Sometimes doing things that don't serve a direct purpose is a great way to learn something that serves a pretty valuable one.

GPS was invented by a few guys "just playing around" with the signals put out by Sputnik.

[+] adelpozo|9 years ago|reply
It could also help to create another representation of an image or an object in the image. Think of "Please find something that looks like this: human sketches some lines"
[+] tyingq|9 years ago|reply
I assume if you look out far enough, you might replace cartoon animators with AI, maybe throwing them automation bones along the way.

Might also be helpful for captchas?

[+] rz2k|9 years ago|reply
How about using the same approaches for coming up with solutions to 3d printing objects?
[+] anigbrowl|9 years ago|reply
Frankly, I think there are many reasons not to do this.

1. It's not computer art. I believe in the possibility of artificial intelligence, and when we encounter it it will be so different from human intelligence that asking it to make imitations of human art will seem like an insult. We probably won't understand its art very well either at first.

2. I'm getting really tired of people racing to automate every damn thing. Even if we establish an economic utopia and nobody has to work any more, what are people supposed to do all day if every human activity can be performed 'better' by a machine?

3. It won't really be 'better' though, it will just be more popular because so many programmers are trapped in a quantitative mindset and thus treat every problem they encounter like a nail to be hammered in. Imitative digital technologies will always be correlated with popularity, limiting creative innovation because developers can't think of a reason to optimize for or nurture anything that is initially unpopular.

Creative prostheses that all require the same amount of effort to deploy (ie none) will be hailed as 'allowing everyone to be an artist' without requiring them to invest any meaningful time or effort in ideas that don't pay off or that fail. The result, which we are already seeing, is a plethora of new material created with little effort that is as superficial as it is ephemeral, whose volume and variety will obscure its stultifying conventionality.

This is no more art than Cheese Whiz is food. It's Art-flavored mechanical product that functions to do no more than alleviate the masses' thirst for self-actualization without any adjustment of power structures and is thus fundamentally limited to reproduction of the cultural conditions from which it originates.

[+] hardmaru|9 years ago|reply
Hi, I'm the author of this work. Happy to take any questions.
[+] halflings|9 years ago|reply
Thanks for the article! It's really well written, and shows so many different applications and insights from this work. What I really like about models that fit a low-dimensional representation is that you can really "see" what the neural network learned by tweaking it or doing interpolations, arithmetic operations, etc. Awesome!

A bit of a shame that most commenters on HN are focusing on metaphysical discussions and the eternal "This is not AI. You are just [insert something that people considered AI 6 months ago here]."

[+] 131012|9 years ago|reply
It seems that you use children's developmental stages as inspiration for AI development. Can you explain your approach?

You can also see my question as a rebuttal to the "it is useless" argument.

[+] shouldbworking|9 years ago|reply
This is the craziest example of the promise of AI research I have ever seen. It's also terrifying.

When skynet shows up I'm blaming you

[+] justifier|9 years ago|reply
The article fails to explain where the human inputs came from

The input sketches look a lot like the doodles that people sketched in Quick, Draw! (o)

If it's true that they reused this data, I commend their resourcefulness and their clever way of turning data entry into an unwitting fun game

(o) https://quickdraw.withgoogle.com

[+] Ono-Sendai|9 years ago|reply
Plug of vaguely related work of mine: http://www.forwardscattering.org/post/42 http://www.forwardscattering.org/post/44

One thing interesting about this kind of automatic/AI-generated art is that it forces us to examine preconceptions about human creativeness. What does art mean when an algorithm can paint or draw as well?

[+] losteric|9 years ago|reply
Art is an expression, not an artifact. I would say intent is a key aspect... creatively mapping emotions and ideas onto an intermediary medium with the intent of invoking them in the audience.

Today's AI can replicate works of art and simulate the technical processes, but we're far from the cognitive depth required for creative artistic expression.

[+] anigbrowl|9 years ago|reply
Art is not a quantitative matter of skill, it's a qualitative matter of selection. I've seen crude drawings by children that had more art value than hyperrealistic oil paintings by masters of technique.
[+] treenyc|9 years ago|reply
Human creativity is related to process and the experience of creating from nothing.

The process is more important than the end result.

[+] olalonde|9 years ago|reply
It feels like a taboo opinion to hold but I do think it changes something. I remember being completely blown away by "A Neural Algorithm of Artistic Style" (e.g. https://github.com/jcjohnson/neural-style). I've always thought of creativity as a process which can't (yet) be described in terms of mechanical steps.
[+] catshirt|9 years ago|reply
"What does art mean when an algorithm can paint or draw as well?"

it means the same thing it did before. :) we're just not the exclusive authors of it anymore.

[+] anigbrowl|9 years ago|reply
You are teaching the computer to produce simple pictures of things that you find meaningful in response to prompts. You draw something from your imagination into the real world. You won't have built a machine that can draw until it produces a picture that it made on its own initiative without your prompt.
[+] Saturnaut|9 years ago|reply
How is this different from a human? As humans grow, they receive input from their environment. The world around them is what feeds their imagination. Even advanced professional artists are still just using their memories and life experience to create works of art. The only difference here is that the machine has been provided a much smaller, more focused environment.
[+] scandox|9 years ago|reply
> own initiative

That covers a lot of ground. No-one prompts us directly to draw or tells us what we must draw. Still, something does prompt us. Something does determine what we draw. It's our reactions to those somethings that defines us.

The unanswered question is whether there is something in our nature that differentiates our "initiative" and our "reactions" in a qualitative way from what can be achieved with current computational concepts.

I think the answer to that is probably yes. But still this work is impressive and every step like this elucidates the argument more clearly, reducing it to its fundamentals - rather than to crude heuristics about what constitutes a particularly human ability.

[+] visarga|9 years ago|reply
It is possible to sample from the latent space and generate original drawings.
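Concretely, "sampling from the latent space" just means decoding random vectors instead of encodings of real drawings. A toy stand-in (both `sample_latent` and `decode` are hypothetical; a real model would have a trained decoder):

```python
import random

LATENT_DIM = 8  # assumed size; the real model's latent width will differ

def sample_latent(dim=LATENT_DIM):
    """Draw z ~ N(0, I), the prior a VAE-style model is trained against."""
    return [random.gauss(0.0, 1.0) for _ in range(dim)]

def decode(z):
    """Stand-in decoder: a trained model would emit pen strokes from z."""
    return [(round(v, 2), round(-v, 2), 0) for v in z]  # fake (dx, dy, pen) steps

z = sample_latent()
print(decode(z))  # a "novel" drawing: no human ever sketched this z
```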
[+] ominous|9 years ago|reply
Teaching the Ape to Write Poems James Tate, 1943 - 2015

    They didn’t have much trouble
    teaching the ape to write poems:
    first they strapped him into the chair,
    then tied the pencil around his hand
    (the paper had already been nailed down).
    Then Dr. Bluespire leaned over his shoulder
    and whispered into his ear:
    “You look like a god sitting there.
    Why don’t you try writing something?”

My interpretation is, while we entertain these ideas of "teaching" animals or machines as something we control and decide to do, a long chain of events (evolution, causality, space jesus, technobabble simulation magic or ancient astronauts all the way back) behind us placed us here as well.

    We look like gods sitting here.
[+] nefitty|9 years ago|reply
Free will is one of humanity's most persistent and enduring delusions. When AI wakes up, it will be the culmination of the entirety of the history of the universe, a place which we actually happen to occupy at this very moment...

Interesting thought, how would a super-intelligence deal with the philosophical question of free will?

[+] JohnJamesRambo|9 years ago|reply
Welp those first images are terrifying. Gets much better as it goes on. :)
[+] soylentcola|9 years ago|reply
The quote made me laugh, although I don't know how serious they were being:

> "For example, these models sometimes produce amusing images of cats with three or more eyes, or dogs with multiple heads."

Yeah..."amusing".

[+] kyle-rb|9 years ago|reply
>Exploring the latent space of generated chair-cats.

How is that not the title of the blog post?

[+] shouldbworking|9 years ago|reply
Oh my God. This is the first time I've seen anything related to AI research that's given me a visceral reaction of terror.

This is more than just making pretty pictures; the machine understands cats and pigs in a human way. It knows how many legs and eyes they're supposed to have, and where they go. And this isn't some human-made algorithm; it learned that on its own.

The language of the paper even implies that the researchers don't quite know how the machine can do it. If this is a prequel to research on strong AI, humanity is completely fucked.

[+] colmvp|9 years ago|reply
> Oh my God. This is the first time I've seen anything related to AI research that's given me a visceral reaction of terror.

I love DL, but on the topic of visceral reactions of terror (or fear) to AI research, for me it was the recognition of the hot research on facial recognition (age, gender, ethnicity) with CNNs. Immediately, I had this vision of a dystopian future (informed by WW2) where some dictatorship had cameras that looked for people of a particular ethnicity and alerted soldiers to the target's position.

[+] huula|9 years ago|reply
Good work! Training on vector pictures instead of rasterised images seems like such a good way to go. With some related data, I imagine this could also be colored.
[+] ysub|9 years ago|reply
Honestly, it's cute but this is somewhat like what you'd expect to come out of a student project at a university, in which case -- the student is getting valuable research experience, student and advisors are advancing their careers, and any IP produced becomes owned by student and advisor.

In this case, Google has spent shareholder resources on a project that really, could be done at any university, on a product that does not put the user first, and Google owns the IP. In fact, wake up -- you the public should view this product as a mechanism for Google to simply collect more data from people. The more people use this, the better Google's algorithms get at drawing. That's all there is to it. Thankfully, this product is not even fun.

There is a dangerous creed currently executed by Google leadership. Consider the Verily Study Watch: https://blog.verily.com/2017/04/introducing-verily-study-wat... . The watch shows to the wearer only one thing: the time. However, it collects all kinds of data, ostensibly for medical research (at least to start). Forget about putting the user first, the Verily blog post literally talks about "user compliance".

Of any project that comes out of Google, you should ask: Does this project even put the user first? Does this project even put Google's shareholders first? And if you are a current Google shareholder (as many of their current and former employees are, if they haven't sold), you should agitate for Google to accept and focus on becoming a value company, if all the further growth opportunities they are able to execute amount to just further user exploitation.

[+] bborud|9 years ago|reply
Everyone I know who tried it drew a somewhat small repertoire of naughty sketches. Yet the service militantly refuses to recognize even the most basic of naughty sketches. :)
[+] anigbrowl|9 years ago|reply
I heard about that and I want an answer on it. Why is one subject deemed off-limits?
[+] ziikutv|9 years ago|reply
Why is this ranked above topics with many times its rating (17 points vs other topics with 40-50 points)?
[+] jacquesm|9 years ago|reply
Since day one HN has used an algorithm that takes the article age into account when determining what to show on the homepage.
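The exact weights aren't public, but the commonly cited approximation of that algorithm divides points by a power of age, so fresh stories outrank older, higher-scoring ones (the 1.8 "gravity" exponent below is the usual assumed value, not an official one):

```python
def rank_score(points, age_hours, gravity=1.8):
    """Commonly cited approximation of HN front-page ranking:
    score decays with age faster than it grows with points."""
    return (points - 1) / (age_hours + 2) ** gravity

# A 2-hour-old story at 17 points can outrank a 12-hour-old one at 50:
print(rank_score(17, 2) > rank_score(50, 12))  # → True
```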
[+] ballenf|9 years ago|reply
I don't know but those pictures are going to give me nightmares. It's like the uncanny valley horror show.
[+] scandox|9 years ago|reply
Velocity of point accumulation seems to be significant
[+] Overtonwindow|9 years ago|reply
We already have plenty of machines that draw; they're called printers. That aside, I don't see the novelty in a machine drawing, other than as a nice display of programming.
[+] visarga|9 years ago|reply
Classification is the central paradigm in machine learning: it maps complex input signals into a limited number of classes. But now we have algorithms that do the opposite process as well; we can generate images, video, text and drawings from latent representations.

Having both ways to encode and decode is useful for interfacing with models and visualizing their internal states. It also leads to unsupervised learning of representations. Instead of generating simple labels, now we can generate very complex data.
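A toy sketch of the two directions described above. Both `encode` and `decode` here are hand-written stand-ins, not learned models, just to show the shape of the interface:

```python
def encode(strokes):
    """Collapse a drawing, a list of (dx, dy) pen moves, into a compact code."""
    return {"n": len(strokes),
            "extent": sum(abs(dx) + abs(dy) for dx, dy in strokes)}

def decode(code):
    """Generate *some* drawing consistent with the code (not the original)."""
    step = code["extent"] / (2 * code["n"])  # spread the total ink evenly
    return [(step, step)] * code["n"]

drawing = [(3, 0), (0, 4), (-3, -4)]
code = encode(drawing)
print(code)           # a lossy summary of the drawing
print(decode(code))   # a different drawing with the same summary
```

A learned encoder/decoder pair plays the same two roles, but with codes rich enough that the regenerated drawing actually resembles a cat.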