top | item 29901566

liuru | 4 years ago

Hey HN, one of the team members here!

I hope you all enjoy playing with the new and improved generator! We've been hard at work improving the model quality since the last time the site was posted.[1]

As both a professional fantasy illustrator & software engineer, I find the concept of AI creativity so fascinating. On one hand, I know that mathematically, AI can only hallucinate images that fit within the distribution of things it has seen. But from an artist's perspective, the model's ability to blend two existing styles into something so distinctly new is incredible (not to mention commercially useful!)

Anyways, happy to answer any questions, thoughts, or concerns!

---

[1] https://news.ycombinator.com/item?id=20511459

wodenokoto|4 years ago

Naïvely I thought Waifu generator was just "some guy having a laugh" fine-tuning a model off of Hugging Face, but reading through the comments here, it is obviously a much, much bigger enterprise.

Can you talk a little about team size, work process, funding and revenue stream? I think the effort required for such an undertaking is vastly underestimated by readers.

Cixelyn|4 years ago

Right now it's a small team of 6 people, and we have a bit of funding + compute credits to train models. There's a bit of revenue from some past projects and AI consulting, but we're mostly betting big on our new AI-powered mobile title Arrowmancer[1].

> I think the effort required for such an undertaking is vastly underestimated by readers.

Haha, for sure. Hosting a real-time ML model for people to do sub-1-second inferences at HN-load scale is definitely nontrivial.

[1] https://arrowmancer.com

hansel_der|4 years ago

> Naïvely I thought Waifu generator was just “some guy having a laugh”

same here. what's naive about it?

not to badmouth the undertaking, but wtf is this doing on HN?

2bitencryption|4 years ago

Firstly, amazing work.

My question is, how do you figure out how to parameterize "Same character, different pose" / "Same character, different eyes" / "Same character, different gender" / etc?

My (super limited) understanding of GANs is that they slowly discover these features over time simply from observation in the data set, and not from any labels.

So how could you make, e.g., a slider for head position, style, pose, etc.? How do you look at the resulting model and figure out "these are the inputs we have to fiddle with to make it use a certain pose"?

You mention it a bit in this section, but I didn't fully understand: "By isolating the vectors that control certain features, we can create results like different pose, same character"

And I assume the same step needs to be done every time the model is retrained or fine-tuned, because possibly the vectors have shifted within the model since they are not fixed by design?

liuru|4 years ago

Yes, your understanding is correct!

You can think of it like coordinates on a many-dimensional vector grid.

We craft the functions that will illuminate sets of those points based on a combination of observation, what we know about our model architecture, and how our data is arranged.

And yes, when the model is retrained, we have to discover them again!
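To picture the kind of latent-space editing being described here, a toy numpy sketch (this is purely illustrative; the random projection `W` is a hypothetical stand-in for a trained generator, and `pose_direction` stands in for a direction the team would discover by observation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "generator": maps a 512-d latent vector to a flat 64x64 image.
# A real GAN generator is a deep network; this fixed random projection
# only illustrates the geometry of latent-space edits.
W = rng.normal(size=(512, 64 * 64))

def generate(z):
    return np.tanh(z @ W)

z = rng.normal(size=512)                 # a sampled character
pose_direction = rng.normal(size=512)    # hypothetical discovered direction
pose_direction /= np.linalg.norm(pose_direction)

# "Same character, different pose": move along one isolated direction
# while leaving the rest of the latent code untouched.
original = generate(z)
edited = generate(z + 3.0 * pose_direction)
```

Because the directions are properties of one particular trained weight set, they stop being meaningful once the model is retrained, which is why they have to be rediscovered each time.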

flor1s|4 years ago

Not affiliated with this project, but there are a gazillion different variations of GANs. Most just change the adversarial loss to improve the learning rate / quality, but others focus on architectural changes, such as StarGAN, Pix2pix (conditional GAN), CycleGAN, MUNIT, etc. It's really a fascinating field.
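The core trick behind the conditional variants mentioned above can be sketched in a few lines; everything here (the sizes, the random `W`) is a toy stand-in for a trained network:

```python
import numpy as np

rng = np.random.default_rng(1)

LATENT, N_CLASSES, PIXELS = 128, 10, 32 * 32

# Toy conditional generator: the class label is one-hot encoded and
# concatenated to the noise vector, so a single network can produce
# different categories on demand -- the basic idea of conditional GANs.
W = rng.normal(size=(LATENT + N_CLASSES, PIXELS))

def generate(z, label):
    onehot = np.zeros(N_CLASSES)
    onehot[label] = 1.0
    return np.tanh(np.concatenate([z, onehot]) @ W)

z = rng.normal(size=LATENT)
img_a = generate(z, label=3)   # same noise, different labels...
img_b = generate(z, label=7)   # ...yield different outputs
```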

thyrox|4 years ago

Roughly speaking, how much money did you invest in making this? Just curious if this is something an indie hacker can hope to do one day, or do you need deep pockets to make a site like this?

ridaj|4 years ago

Fascinating... Thanks for sharing

A couple questions:

1) I didn't really understand how you went about identifying which vectors of the latent space stand for various things, like pose or color. Did you train one of the AIs to that effect, or did you manually inspect a bunch of vectors, twiddling through them one by one to see what they did to the outcome?

2) If one were to train an AI to the same level using commodity cloud services, what's the order of magnitude cost that you would pay for the training? More like $100, $1,000, $10,000 or $100,000?

liuru|4 years ago

1) It was mostly manual, though AIs were useful in certain filtering tasks.

2) Depends on the quality you are seeking. If you only want one run of a similar, off-the-shelf model, something in the low thousands is enough. But at the number of iterations you have to run to build your own and improve results, you probably need about $100k.

To tackle this problem, we built our own supercomputer from parts we bought off eBay, though I can't say I recommend that route, because it now lives in our living room.

dimgl|4 years ago

You mention it took two weeks to get to the point that we see in the article.

Does this mean two weeks of development, or two weeks to generate the images we're seeing? Or did you maybe train the model for two weeks? That point just wasn't exactly clear to me.

liuru|4 years ago

2 weeks to train the model!

Development took on-and-off roughly 2 years to achieve the quality you see today.

kouteiheika|4 years ago

What are the terms of use for the images generated through your website? I'm guessing any commercial use is forbidden? It would be nice if you could formally spell it out on the website.

JetAlone|4 years ago

I don't think there's any effective way to stop people from generating one and then tracing over it to create their own linework, customizing things like the colouring and shading. The more broadly AI is able to create, the more niche and obfuscated the directions human co-creators could take its products in.

kregasaurusrex|4 years ago

I purchased a waifu from your vending machine (loved the blog post!) at Gen Con in 2019, but can't see the saved model in my account. Is there a way for me to get a v2 generation?

liuru|4 years ago

Welcome back!

We're currently working on the data migration from V1! As long as you are using the same email as you did in 2019, you'll be able to see the image again!

As for a V2 generation: sorry, because the models are different, you'll have to discover a similar image again if you want a V2 version!

rackjack|4 years ago

I LOVE that "horror". Reminds me of some of the art I've seen on album/single covers. Any chance of letting people access that kind of intermediate step? (Though I know it's a niche as hell use case).

liuru|4 years ago

Ah yes, the fine line between charming anime character and Lovecraftian horror.

There was such popular demand for these "horror" images that we made them part of the generation in V2! If you refresh enough on the webpage, you can find some horrors!

Cthulhu_|4 years ago

With the game you're building, are the character portraits generated once and that's it, or do you plan on making them dynamic or frequently updated?

I've seen a number of mobile games that just get flooded with characters; this tool looks like it could be used to automate that process. It could be combined with AI-generated character profiles as well, creating an 'infinite' character roster in video games.

hypertele-Xii|4 years ago

Why does stuff like this never come down from the web? I'd pay for a program I could download and use with my own image files.

Gigachad|4 years ago

They tend to require specific hardware like an NVIDIA GPU, as well as a large, ever-evolving model file that the authors will want to update frequently. Some tools certainly have had offline versions, but I guess not many people are interested in setting it all up and are happy with an instant web UI.

Afforess|4 years ago

Same reason the Coca-Cola recipe is not published nor made freely available by the Coca-Cola corporation.

zozbot234|4 years ago

You just need to code up your own model architecture and then train it on your data using some established ML framework. The first step is where well-chosen priors can make a real difference wrt. your end results.

simonebrunozzi|4 years ago

So neat! Where are you based? Boston, I assume?

Is there an email to reach out to you or someone in the team? ($HNusername @ gmail)

Cixelyn|4 years ago

San Francisco! Just sent over a ping!

GoblinSlayer|4 years ago

Would you try to create a new style? Train the discriminator on the score tag of the Danbooru dataset, then use it to rate the generator's style; this way it should be able to create a new style.

searchableguy|4 years ago

Do you plan to provide an API to generate waifus?

I think I could use this for a project.

liuru|4 years ago

In the future, perhaps! This is a popular request, so we are thinking about ways we can do this.

YeGoblynQueenne|4 years ago

Hello and thank you for answering questions. The following is a quote from your article:

>> It is interesting to note that from this process, the AI is not merely learning to copy the works it has seen, but forming high-level (shapes) and low-level (texture) features for constructing original pictures in its own mental representation.

Can you explain what you mean by "mental" representation? Does your system have a mind?

Also, why are you calling it "an AI"? Is it because you think it is an artificial intelligence, say like the robots in science fiction movies? Is it capable of anything else than generating images?

xg15|4 years ago

Not OP, but I wonder if the process would be in some way comparable to rigging a 3D model. There, you usually have some high-level input parameters, which influence joints on a predefined skeleton, which in turn determines the position of individual vertices in the 3D body. Finally, the 3D shape is used to render the actual pixels.

On each step, high-level parameters are combined with predefined weights to produce a more low-level output.

It seems a similar transformation is going on here, except that the weights and the structure are learned on their own.
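The rigging analogy can be sketched as a two-stage pipeline; this is just an illustrative numpy toy with random weights, not how any real generator or rig is wired:

```python
import numpy as np

rng = np.random.default_rng(2)

# Rigging analogy as a two-stage pipeline:
#   high-level controls -> "joint" parameters -> pixel values.
# In a rig, both weight matrices are authored by hand;
# in a GAN, the equivalent structure is learned from data.
W_controls_to_joints = rng.normal(size=(8, 32))      # e.g. pose sliders -> skeleton
W_joints_to_pixels = rng.normal(size=(32, 16 * 16))  # skeleton -> rendered pixels

def render(controls):
    joints = np.tanh(controls @ W_controls_to_joints)
    return np.tanh(joints @ W_joints_to_pixels)

pixels = render(rng.normal(size=8))  # a flat 16x16 "image"
```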

jacoblambda|4 years ago

Something I was wondering but couldn't find on the site: what is the license for works generated through the project?

tedmcory77|4 years ago

Who would someone speak with about licensing things made using waifu? My email contact is in my profile...

darkengine|4 years ago

Is the code or any of the models available to the public? I'd love to mess with this on a local GPU cluster.

unobatbayar|4 years ago

The quality and style are mindblowing! What data did you train on?

Bombthecat|4 years ago

The more interesting question: what about the sources the AI trains on? How are those artists paid? Do we need to pay them? Or if it's used by an AI as training data, do we just say: it's like a human learning?