_Nat_ | 2 years ago

> Wouldn’t something like isMale*P(male=.66) work fine?

It doesn't think like that.

If it did, they could've just done `P(hasFiveFingersPerHand)=0.99999`.

But it doesn't even necessarily draw what you ask it to. Instead, it applies a sequence of de-noising transforms that it has been trained to expect will lead toward what the prompt sounds like; whatever those transforms produce is then, hopefully, roughly what was requested.

Der_Einzige|2 years ago

Custom loss functions absolutely work and work basically the way described above.

https://colab.research.google.com/drive/1dlgggNa5Mz8sEAGU0wF...

You can see them define a custom color loss and apply it simultaneously with the regular diffusion loss. I've actually expanded this notebook to allow regional specification of the custom loss.
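To make the idea concrete, here is a minimal toy sketch of loss-guided sampling. It is not taken from the linked notebook: `toy_denoise_step` is a hypothetical stand-in for a real model's de-noising update, the "image" is just a list of RGB pixels, and `guidance_scale` is an assumed parameter name. The only point is the shape of the loop: after each de-noising step, nudge the sample down the gradient of a custom color loss.

```python
import random

def color_loss(pixels, target):
    """Mean squared distance between every pixel and the target color."""
    return sum((p[c] - target[c]) ** 2 for p in pixels for c in range(3)) / len(pixels)

def color_loss_grad(pixels, target):
    """Analytic gradient of color_loss with respect to each pixel channel."""
    n = len(pixels)
    return [[2 * (p[c] - target[c]) / n for c in range(3)] for p in pixels]

def toy_denoise_step(pixels, strength=0.1):
    """Hypothetical stand-in for a model update: smooths pixels toward their mean."""
    mean = [sum(p[c] for p in pixels) / len(pixels) for c in range(3)]
    return [[p[c] + strength * (mean[c] - p[c]) for c in range(3)] for p in pixels]

def guided_sample(steps=50, n_pixels=16, target=(0.0, 0.0, 1.0),
                  guidance_scale=2.0, seed=0):
    """Run the toy de-noising loop, applying the custom loss gradient each step."""
    rng = random.Random(seed)
    pixels = [[rng.random() for _ in range(3)] for _ in range(n_pixels)]
    for _ in range(steps):
        pixels = toy_denoise_step(pixels)
        grad = color_loss_grad(pixels, target)
        # Guidance: move each pixel against the gradient of the custom loss.
        pixels = [[p[c] - guidance_scale * g[c] for c in range(3)]
                  for p, g in zip(pixels, grad)]
    return pixels

if __name__ == "__main__":
    blue = (0.0, 0.0, 1.0)
    print("guided loss:  ", color_loss(guided_sample(target=blue), blue))
    print("unguided loss:", color_loss(guided_sample(target=blue, guidance_scale=0.0), blue))
```

With `guidance_scale=0.0` the loop is just the stand-in de-noiser, so comparing the two losses shows how much the extra term steers the result. A color loss is easy because it is differentiable and cheap to write down.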

It's quite difficult to define a function that detects if an individual has 5 fingers or not. That's the real issue.

_Nat_|2 years ago

The comment I'd responded to seemed to assume that Stable Diffusion picks the sex of a person according to some internal odds that could be modified.

My point was that it doesn't actually think like that. For example, prompting Stable Diffusion for a picture of a doctor doesn't necessarily get it to draw a human at all, much less a doctor of a pre-determined sex; instead, Stable Diffusion de-noises the image until a result emerges, and that result would (ideally) contain a doctor of whatever sex it happened to come up with.

That said, you're right that we can add more code to try to guide things.

We could even brute-force it by re-generating images over and over, or tweaking them after generation, until they match exactly what we wanted. (Realistically, something like branch-and-bound would probably be preferable to blind guess-and-check.)
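The blind guess-and-check version is roughly the loop below. This is only a sketch: `toy_generate` is a hypothetical stand-in that returns a dict of attributes instead of an image, and `accept` stands in for whatever check decides that an output matches (which, for real images, is the hard part).

```python
import random

def toy_generate(rng):
    """Hypothetical stand-in for an image generator; returns attributes, not pixels."""
    return {
        "subject": rng.choice(["doctor", "landscape", "cat"]),
        "fingers": rng.choice([4, 5, 5, 5, 6]),
    }

def regenerate_until(accept, generate=toy_generate, max_attempts=1000, seed=0):
    """Re-generate samples until accept(sample) is true or attempts run out.

    Returns (sample, attempts_used); sample is None if nothing was accepted.
    """
    rng = random.Random(seed)
    for attempt in range(1, max_attempts + 1):
        sample = generate(rng)
        if accept(sample):
            return sample, attempt
    return None, max_attempts

if __name__ == "__main__":
    wanted = lambda s: s["subject"] == "doctor" and s["fingers"] == 5
    sample, attempts = regenerate_until(wanted)
    print(sample, "after", attempts, "attempt(s)")
```

The cost grows with how rare an acceptable output is, which is why a search that prunes bad partial results early would likely beat this.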