top | item 36409140

ewjt | 2 years ago

Can you elaborate on “properly tweaked”? When I use one of the Stable Diffusion and AUTOMATIC1111 templates on runpod.io, the results are absolutely worthless.

This is using some of the popular prompts you can find on sites like prompthero that show amazing examples.

It’s been serious expectation vs. reality disappointment for me and so I just pay the MidJourney or DALL-E fees.

kouteiheika|2 years ago

> Can you elaborate on “properly tweaked”?

In a nutshell:

1. Use a good checkpoint. Vanilla stable diffusion is relatively bad. There are plenty of good ones on civitai. Here's mine: https://civitai.com/models/94176

2. Use a good negative prompt with good textual inversions. (e.g. "ng_deepnegative_v1_75t", "verybadimagenegative_v1.3", etc.; you can download those from civitai too) Even if you have a good checkpoint this is essential to get good results.

3. Use a better sampling method instead of the default one. (e.g. I like to use "DPM++ SDE Karras")

There are more tricks to get even better output (e.g. controlnet is amazing), but these are the basics.
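The three tips above map directly onto settings in the AUTOMATIC1111 web UI, which also exposes them through its `/sdapi/v1/txt2img` HTTP API (available when the UI is started with `--api`). A minimal sketch of such a request; the checkpoint name `myCustomCheckpoint` is a placeholder, the embedding names are the ones from the comment (the files must already sit in the embeddings folder), and the sampler name reflects older A1111 versions where sampler and scheduler were a single string:

```python
import json
import urllib.request

def build_txt2img_payload(prompt: str) -> dict:
    """Assemble an AUTOMATIC1111 /sdapi/v1/txt2img request that applies
    the three tips: custom checkpoint, negative embeddings, better sampler."""
    return {
        "prompt": prompt,
        # Tip 2: negative prompt referencing textual-inversion embeddings.
        "negative_prompt": "ng_deepnegative_v1_75t, verybadimagenegative_v1.3",
        # Tip 3: a better sampling method than the default.
        "sampler_name": "DPM++ SDE Karras",
        "steps": 25,
        "cfg_scale": 7,
        "width": 512,
        "height": 512,
        # Tip 1: switch away from the vanilla checkpoint for this request
        # (placeholder name; use whatever checkpoint you downloaded).
        "override_settings": {"sd_model_checkpoint": "myCustomCheckpoint"},
    }

def submit(payload: dict, base_url: str = "http://127.0.0.1:7860") -> bytes:
    """POST the payload to a locally running web UI started with --api."""
    req = urllib.request.Request(
        base_url + "/sdapi/v1/txt2img",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

payload = build_txt2img_payload("a lighthouse at dusk, dramatic lighting")
```

The same fields appear in the UI itself, so the payload doubles as a checklist of what to change from the defaults.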

renewiltord|2 years ago

Thank you. I assume there's some community somewhere where people discuss this stuff. Do you know where that is? Or did you just learn this from disparate sources?

Lerc|2 years ago

What kind of (and how much) data did you use to train your checkpoint?

I'd like to have a go at making one myself targeted towards single objects (be it a car, spaceship, dinner plate, apple, octopus, etc.). Most checkpoints lean very heavily towards people and portraits.

orbital-decay|2 years ago

Are you using txt2img with the vanilla model? SD's actual value is in the large array of higher-order input methods and tooling; as a tradeoff, it requires more knowledge. Similarly to 3D CGI, it's a highly technical area. You don't just enter the prompt with it.

You can finetune it on your own material, or choose one of the hundreds of public finetuned models. You can guide it in a precise manner with a sketch or by extracting a pose from a photo using controlnets or any other method. You can influence the colors. You can explicitly separate prompt parts so the tokens don't leak into each other. You can use it as a photobashing tool with a plugin to popular image editing software. Things like ComfyUI enable extremely complicated pipelines as well. etc etc etc
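One concrete example of the "separate prompt parts" point: the AUTOMATIC1111 UI supports a `BREAK` keyword that splits a prompt into independently encoded chunks, so descriptors in one part can't bleed into another. A rough sketch of that chunking idea; the budget of 75 matches CLIP's per-chunk token limit, but the whitespace tokenizer here is only a stand-in for the real CLIP tokenizer:

```python
def split_prompt(prompt: str, chunk_budget: int = 75) -> list[list[str]]:
    """Split a prompt on the BREAK keyword, then pack each part into
    chunks of at most `chunk_budget` (stand-in whitespace) tokens.
    Each chunk is encoded separately, so tokens in one part cannot
    leak into another."""
    chunks = []
    for part in prompt.split("BREAK"):
        tokens = part.split()
        # A part longer than the budget spills into additional chunks.
        for i in range(0, len(tokens), chunk_budget):
            chunks.append(tokens[i:i + chunk_budget])
    return chunks

# "red dress" and "blue background" land in separate chunks,
# so "blue" can't bleed into the dress.
chunks = split_prompt("a woman in a red dress BREAK blue background, bokeh")
```

The real UI averages or concatenates the per-chunk embeddings downstream; the point here is only the isolation boundary that `BREAK` creates.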

nomand|2 years ago

Is there a coherent resource (not a scattered 'just google it' series of guides from all over the place) that encapsulates some of the concepts and workflows you're describing? What would be the best learning site/resource for arriving at understanding how to integrate and manipulate SD with precision like that? Thanks

bavell|2 years ago

ComfyUI is a nice complement to A1111, the node-based editor is great for prototyping and saving workflows.
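Those saved ComfyUI workflows are JSON graphs: each node has a `class_type` and `inputs`, and node-to-node links are `[node_id, output_index]` pairs. A toy sketch of that format with a link checker; the node class names match ComfyUI built-ins, but the graph itself is a hypothetical minimal txt2img chain and `model.safetensors` is a placeholder:

```python
# Skeleton of a ComfyUI "API format" workflow: a dict of numbered nodes.
# Values that are [node_id, output_index] lists reference other nodes.
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "model.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",
          "inputs": {"clip": ["1", 1], "text": "a lighthouse at dusk"}},
    "3": {"class_type": "CLIPTextEncode",
          "inputs": {"clip": ["1", 1], "text": "blurry, lowres"}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 512, "height": 512, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0],
                     "negative": ["3", 0], "latent_image": ["4", 0],
                     "seed": 42, "steps": 20, "cfg": 7.0,
                     "sampler_name": "dpmpp_sde", "scheduler": "karras",
                     "denoise": 1.0}},
}

def check_links(graph: dict) -> bool:
    """Verify every [node_id, output_index] reference points at an
    existing node in the graph."""
    for node in graph.values():
        for value in node["inputs"].values():
            if isinstance(value, list):
                ref_id, _ = value
                if ref_id not in graph:
                    return False
    return True
```

Because the whole pipeline is plain data like this, workflows can be diffed, version-controlled, and shared, which is much of what makes ComfyUI good for prototyping.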

famouswaffles|2 years ago

You're not going to get even close to Midjourney or even Bing quality on SD without finetuning. It's that simple. When you do finetune, it will be restricted to that aesthetic and you won't get the same prompt understanding or adherence.

For all the promise of control and customization SD boasts, Midjourney beats it hands down in sheer quality. There's a reason like 99% of ai art comic creators stick to Midjourney despite the control handicap.

orbital-decay|2 years ago

Yet you are posting this in a thread where GP provided actual examples of the opposite. Look for another comment above/below, there are MJ-generated samples which are comparable but also less coherent than the result from a much smaller SD model. And in case of MJ hallucinations cannot be fixed. MJ is good but it isn't magic, it just provides quick results with little experience required; prompt understanding is still poor, and will stay poor until it's paired with a good LLM.

None of the existing models gives actually passable production-quality results, be it MJ, SD, or anything else. It will be quite some time before they get out of the uncanny valley.

> There's a reason like 99% of ai art comic creators stick to Midjourney

They aren't. MJ is mostly used by people without experience, think a journalist who needs a picture for an article. Which is great and it's what makes them good money.

As a matter of fact (I work with artists), for all the surface-visible hate AI art gets in the artist community, many actual artists are using it more and more to automate certain mundane parts of their job to save time, and this is not MJ or Dall-E.

chankstein38|2 years ago

I feel like people shouldn't talk in definitives if their message is just going to demonstrate they have no idea what they're talking about.

SV_BubbleTime|2 years ago

You load a model and have 6 sliders instead of one… it’s not exactly “fine tuning”.

If you want the power, it's there. But even nearly bone-stock SD in auto1111 is going to get to any of these examples easily.

Show me the civitai equivalent for MJ or Dalle2. It doesn’t exist.

zirgs|2 years ago

Midjourney has a ridiculously restrictive keyword filter. You should have mentioned that.

Also I see nothing wrong with using different models for different purposes.

capybara_2020|2 years ago

First off, are you using a custom model or the default SD model? The default model is not the greatest. Have you tried controlnet?

But yes, SD can be a bit of a pain to use. Think of it like this: SD = Linux, Midjourney = Windows/MacOS. SD is more powerful and user-controllable, but that also means it has a steeper learning curve.