Show HN: Stock Photos Using Stable Diffusion

[+] nostromo|3 years ago|reply

Some of these are a bit nightmarish!

https://replicate.com/api/models/stability-ai/stable-diffusi...

I love it. But whoever entered "spider salad" and "cockroach salad" previously so it would show up when I searched for "salad" -- I'm mad at you.

[+] ericmcer|3 years ago|reply

Yeah I would love a university to use this on their website. https://replicate.com/api/models/stability-ai/stable-diffusi...

[+] jarrenae|3 years ago|reply

In V2 we're planning to add a voting system and additional filtering/tagging to solve for a lot of these unusual/nightmarish summoned images.

I for one am sorry for your cockroach salad jump-scare, but of course, you know summoning from beyond is tricky business.

[+] lyjackal|3 years ago|reply

I searched for “happy” and I tend to agree with the nightmarish look. Pretty much all of the results looked like they would love to use their happy teeth to eat you. Consistently hitting the uncanny valley.

[+] mrtksn|3 years ago|reply

Sometimes can be cute: https://dropover.cloud/d6b973#841f5cc5-f92f-44fe-93ec-9e9b9d...

The uncanny valley is real but I guess these issues can be fixed with a "face sanity engine".

"Limbs sanity engine" can help too: https://dropover.cloud/8bf440#ce54ae49-bb16-4ce9-97d1-71b3b8...

[+] q7xvh97o2pDhNrh|3 years ago|reply

Gives a whole new meaning to "debugging."

[+] apsdsm|3 years ago|reply

This might just be the best stock photo site for YouTube creepypasta videos.

[+] roganp|3 years ago|reply

Oh yes. Some are very creepy / hilarious. Awesome just the same.

[+] selcuka|3 years ago|reply

No wonder they call it ghostly.

[+] thefilmore|3 years ago|reply

I'm seeing images with a dreamstime[0] watermark when searching "man in suit":

https://replicate.com/api/models/stability-ai/stable-diffusi...

[0]: https://www.dreamstime.com

[+] an1sotropy|3 years ago|reply

Nice find! So it was trained with dreamstime images.

Do the output images come with licensing and copyright information, so that dreamstime can be compensated for downstream commercial use?

What a legal mess.

[+] an1sotropy|3 years ago|reply

Found some more with "team discussion in well-lit office" (trying to think of most corporate possible phrasing)

https://replicate.com/api/models/stability-ai/stable-diffusi...

"team discussion in large office"

https://replicate.com/api/models/stability-ai/stable-diffusi...

[+] rany_|3 years ago|reply

I'm surprised it didn't mangle their watermark. It's extremely clear!

[+] CameronBanga|3 years ago|reply

Having a button in the search bar that's a blue circle that says Photo, etc, and then not having it start the generation process when clicked feels odd to me. Took me about 30 seconds to realize I had to hit the enter key. Would likely feel weirder on mobile.

[+] jarrenae|3 years ago|reply

Agreed. On mobile we have an added "Search" button appear, but that's on my list of improvements to make.

[+] yamtaddle|3 years ago|reply

UX suggestion: example search already performed on the landing page. You can fake it a bit so it's not actually hitting your search logic (and incurring that cost) every time. Just so when you arrive you see the sort of thing a search might return.

[EDIT] Actually instead of dropping straight into the actual search-result UI, how about scrunching the header up a tad more (there's already a bunch of incomplete-looking space under it) and a row of example images with example searches that might bring them up:

    [ Image ]       [ Image ]      [ Image ]
    "Cats playing    "The moon,    "Statue of
     baseball"        made of       liberty
                      cheese"       driving a car"

[+] an1sotropy|3 years ago|reply

None of the text-to-image tools seem to really understand 3D geometry, so I feel safe for now. Look at examples for icosahedron [1] vs dodecahedron [2] vs octahedron [3]. None of the images were actually geometrically correct - is that quibbling? Maybe, but sometimes for some audience words actually mean something, not just some vague evocation of the angular aesthetic of something. Has someone delineated the words that will not appear in a stock photography prompt? If there was some feedback like "I'm confident in this" to "I'm guessing here, user beware", it would be a lot more usable.

[1] https://replicate.com/api/models/stability-ai/stable-diffusi...

[2] https://replicate.com/api/models/stability-ai/stable-diffusi...

[3] https://replicate.com/api/models/stability-ai/stable-diffusi...

[+] jarrenae|3 years ago|reply

That's one of the things I've found deeply interesting about the current generation of tools: there's little (if any) comprehension going on. It's really just trying to "enhance" a blur/bit of noise to make the image it was told to make.

And I'm not sure I completely know what you mean, but we are planning to add voting and tagging to improve filtering for images.

[+] londons_explore|3 years ago|reply

You saw this, right?

https://dreamfusion3d.github.io/

That's the same type of diffusion model used here, and without any further training, it is constrained to generate something that is consistent from all angles when viewed in 3d.

[+] barbazoo|3 years ago|reply

Not sure I understand how to use this. I searched for "monkey on car" and these are the "categories" I get:

"a dead monkey", "a monkey dancing", "a dead monkey" (again), "a ca"

[+] roganp|3 years ago|reply

They are offering you a previously generated image. You need to click the button at the bottom of the page to get an original rendering "from beyond".

[+] jarrenae|3 years ago|reply

Also we'll have to add reporting for specific search terms. We do have a NSFW filter on by default, but there are often things that skirt around the rules and are hard to filter for.

[+] gus_massa|3 years ago|reply

They take too long to generate, but there is no clear indication of that. You should add a spinning mouse or other thing that shows that the server is working. (A robot painting a canvas would be nice, but you need someone that can make nice drawings. An hourglass or a spinning circle is good enough.)

[+] jarrenae|3 years ago|reply

Agreed. That's already one of the things I have on the list for v2, "make image summoning more obvious/loading" and also we'll improve the button location for "Summoning new images" because it's likely that users won't want to scroll to the bottom just to generate new images.

[+] knicholes|3 years ago|reply

Dall-E 2 does something great: Show prompts and examples of images that those prompts generate. This educates your consumer to be able to get more of what they want while they wait. It kind of tickles the desire for mastery.

[+] smeej|3 years ago|reply

It's fascinating how much AI struggles to mimic signs and text. With as much as we enter text into computers, my instinct was to think this should be really easy for computers, but they don't actually receive and process the abstraction of writing like we do, do they?

We use shapes to indicate sounds and sequences to make words, but the computer is ultimately just getting 1 or 0, on or off. It doesn't seem that it does have the associations we use intuitively because of how humans interact with language.

[+] wodenokoto|3 years ago|reply

This is awesome. I see this coming built into PowerPoint.

Cellist eating a donut is super freakish!

https://replicate.com/api/models/stability-ai/stable-diffusi...

[+] MikeDelta|3 years ago|reply

Looks like a donut eating a cellist.

[+] dillondoyle|3 years ago|reply

The suggested search results are amazing in such a ridiculous way.

"paper" produced "a man reading a newspaper while riding a walrus"

"a wolf reading a newspaper"

"Trapped inside infinity"

and I got to say, the walrus readers look passable at a glance when shrunk to low res

https://replicate.com/api/models/stability-ai/stable-diffusi...

[+] onwardly|3 years ago|reply

I imagine that is a short term problem.

[+] rany_|3 years ago|reply

It still has trouble understanding sentences; it feels to me that it just generates images based on keywords and not the meaning of my sentence.

For example, I tried "attractive woman disgusted by an ugly bystander" and the generated images show a disgusted woman with no "ugly bystander".

Similar situation with "man angry at a squirrel seeks revenge" (generated image shows an angry squirrel with no man in the image, when the man was the one supposed to be angry..)

[+] andybak|3 years ago|reply

This is the biggest difference between SD and Dall-E (and Imagen) to my mind. SD can produce stunning results but it tends to treat prompts as "word salad" rather than a grammatical instruction.

[+] krick|3 years ago|reply

Not sure how to evaluate that. Maybe it's kinda fun, but… I mean, generating crappy images from text isn't exactly new by now. It may be "an early version" (and this is exactly why I struggle to evaluate that — obviously, we shouldn't be too judgemental of "an early version"), but it surely isn't "a truly functional stock photo platform" yet. I mean, by far. "By a light-year" kind of far.

[+] jarrenae|3 years ago|reply

This is definitely a fair assessment. I think a lot of the "wow" factor is just seeing the generated images in the first place.

In truth I think a lot of value will be added as we start improving filtering, once users are able to vote on "usable" or "unusable" images or request variations of an existing photo.

I've genuinely used it for 3-4 photos where I would have previously used Unsplash, and I'm optimistic that I can get that number to steadily trend upwards.

I don't expect this to erase any of the existing stock photo tools on the market, though I do think this will add some new value to the space. Honestly my goal was "will my mom be able to use this?"

Hope that helps clarify the goal a bit more, and I do really appreciate the feedback!

[+] Kye|3 years ago|reply

stargate sg-1 at a furry convention:

https://replicate.com/api/models/stability-ai/stable-diffusi...

https://replicate.com/api/models/stability-ai/stable-diffusi... (cool suit)

https://replicate.com/api/models/stability-ai/stable-diffusi...

[+] agluszak|3 years ago|reply

Usability note: please add a clickable "search" button.

[+] ericmcer|3 years ago|reply

Whoa this is cool and I would def use a more refined version of it. The images with people are a little bit... freaky but objects and animals look fine.

I wonder if this exists inside of Squarespace or Wordpress. I imagine the ability to generate quality license free stock photos would be a huge selling point for them.

[+] jarrenae|3 years ago|reply

We're going to add voting to help empower users to sort between better/worse summoned images. And an API tool for devs to leverage is planned as well.
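One way vote-based sorting like this could work is to rank each image by the lower bound of a Wilson score interval on its share of "usable" votes, so an image with a single lucky vote does not outrank one with many consistent votes. This is only a sketch under assumptions: the thread does not describe the site's actual voting design, and the function name, image ids, and vote counts below are hypothetical.

```python
import math

def wilson_lower_bound(usable_votes: int, total_votes: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the 'usable' share.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if total_votes == 0:
        return 0.0  # no evidence yet, so rank at the bottom
    p_hat = usable_votes / total_votes
    denom = 1 + z * z / total_votes
    centre = p_hat + z * z / (2 * total_votes)
    margin = z * math.sqrt((p_hat * (1 - p_hat) + z * z / (4 * total_votes)) / total_votes)
    return (centre - margin) / denom

# Hypothetical vote tallies: image id -> (usable votes, total votes)
votes = {
    "img_a": (1, 1),     # one vote, 100% usable
    "img_b": (90, 100),  # many votes, 90% usable
    "img_c": (3, 10),    # mostly unusable
}

# Sort best-first by the conservative estimate, not by the raw ratio.
ranked = sorted(votes, key=lambda k: wilson_lower_bound(*votes[k]), reverse=True)
print(ranked)
```

With a raw ratio, img_a would come first on the strength of a single vote; the lower bound instead favors img_b, which has far more evidence behind it.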

[+] pimlottc|3 years ago|reply

The animals definitely do not look fine to me, all the results for "cat" I saw were pretty squarely in the uncanny valley.

[+] bscphil|3 years ago|reply

It's sort of interesting, given the undeniable power that these new AI techniques have, just how limited the output is at the moment. Only 512x512 images.

I tried a specific query - "man running from a tiger" - and none of the provided images were even close. Seems to be a common problem.

[+] jtxt|3 years ago|reply

I really like this idea! Related results work fairly well. Tons of potential here!

Ideas: Allow voting for prompts. Allow voting for results. (But try to prevent the rich get richer effect... https://medium.com/hacking-and-gonzo/how-hacker-news-ranking...) Allow requesting more results for a given prompt.

bug: When there is an error, make it so "back" goes to before the error, instead of before I went to the website perhaps?
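The rich-get-richer concern above is often handled with time decay, so that older items need ever more votes to hold their position. A minimal sketch, assuming the commonly cited approximation of Hacker News ranking, score = (votes - 1) / (age_hours + 2)^gravity with gravity around 1.8. The function name and sample prompts are hypothetical, and nothing here reflects how the site in question actually ranks results.

```python
def gravity_score(votes: int, age_hours: float, gravity: float = 1.8) -> float:
    """Time-decayed score: votes count for less as the item gets older."""
    return (votes - 1) / (age_hours + 2) ** gravity

# Hypothetical prompts: name -> (votes, age in hours)
prompts = {
    "old favourite": (100, 48.0),  # many votes, two days old
    "fresh entry": (10, 1.0),      # few votes, one hour old
}

# Sort best-first by decayed score rather than by raw vote count.
ranked_prompts = sorted(
    prompts,
    key=lambda name: gravity_score(*prompts[name]),
    reverse=True,
)
print(ranked_prompts)
```

A larger gravity value makes scores fall off faster with age, which gives new prompts and results more room to surface.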

[+] switchstance|3 years ago|reply

I am so thankful we got out of the stock business when we did.

AI generated photos, videos, music and animations are here, and I believe it's only a matter of time before they replace a large percentage of the stock websites/companies.

[+] jarrenae|3 years ago|reply

That's sort of the reason we started building this. I think there will absolutely always be room for paid, high quality stock photos, but "content" at the speed of thought is here, and I'm excited to see how the space evolves.

[+] kaetemi|3 years ago|reply

The suggested tags when searching "anime girl" are just a bit creepy.

[+] wheresmycraisin|3 years ago|reply

Omg I cannot wait for human faces to become non-freaky with this technology. People pay real money to sites like Getty or Adobe (the former of which is owned by a corp that you may or may not find politically compatible with your beliefs) to fill their landing pages. And for specific categories, for example "happy asian couple", there's only a few models to choose from so it becomes repetitive fast.

[+] jmcphers|3 years ago|reply

If you need a non-freaky human face generated by AI, look no further than: https://thispersondoesnotexist.com/

[+] jarrenae|3 years ago|reply

I can't wait either. We're going to add follow-up solutions to upres, expand, and improve facial features. Additionally, we're aiming to improve search terminology on the back end to start providing more relevant results for exactly those sorts of searches.

105 comments