
Show HN: Stock Photos Using Stable Diffusion

190 points | jarrenae | 3 years ago | ghostlystock.com

Hi HN, this is an early version of what we’re imagining as a truly functional stock photo platform using Stable Diffusion.

We’re doing our best to hide the customization prompts on the back end so users are able to quickly search for pre-existing generated photos, or create new ones that would ideally work as well.

If we keep going with it, in future versions we’d like to add voting, better tags, and more varied prompts, or maybe whatever you recommend!

105 comments

[+] nostromo|3 years ago|reply
[+] jarrenae|3 years ago|reply
In V2 we're planning to add a voting system and additional filtering/tagging to solve for a lot of these unusual/nightmarish summoned images.

I for one am sorry for your cockroach salad jump-scare, but of course, you know summoning from beyond is tricky business.

[+] lyjackal|3 years ago|reply
I searched for “happy” and I tend to agree with the nightmarish look. Pretty much all of the results looked like they would love to use their happy teeth to eat you. Consistently hitting the uncanny valley.
[+] apsdsm|3 years ago|reply
This might just be the best stock photo site for YouTube creepypasta videos.
[+] roganp|3 years ago|reply
Oh yes. Some are very creepy / hilarious. Awesome just the same.
[+] selcuka|3 years ago|reply
No wonder they call it ghostly.
[+] thefilmore|3 years ago|reply
[+] an1sotropy|3 years ago|reply
Nice find! So it was trained with dreamstime images.

Do the output images come with licensing and copyright information, so that dreamstime can be compensated for downstream commercial use?

What a legal mess.

[+] rany_|3 years ago|reply
I'm surprised it didn't mangle their watermark. It's extremely clear!
[+] CameronBanga|3 years ago|reply
Having a button in the search bar that's a blue circle that says Photo, etc, and then not having it start the generation process when clicked feels odd to me. Took me about 30 seconds to realize I had to hit the enter key. Would likely feel weirder on mobile.
[+] jarrenae|3 years ago|reply
Agreed. On mobile we have a "Search" button appear, but adding one everywhere is on my list of improvements to make.
[+] yamtaddle|3 years ago|reply
UX suggestion: example search already performed on the landing page. You can fake it a bit so it's not actually hitting your search logic (and incurring that cost) every time. Just so when you arrive you see the sort of thing a search might return.

[EDIT] Actually instead of dropping straight into the actual search-result UI, how about scrunching the header up a tad more (there's already a bunch of incomplete-looking space under it) and a row of example images with example searches that might bring them up:

    [ Image ]       [ Image ]      [ Image ]
    "Cats playing    "The moon,    "Statue of
     baseball"        made of       liberty
                      cheese"       driving a car"
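
For what it's worth, a minimal sketch of the "fake it" part, assuming a small hand-picked list of pre-generated examples. The `EXAMPLES` list, the image paths, and `landing_examples` are all hypothetical names for illustration, not anything from the actual site. The landing page picks from a static list and never touches the search or generation backend:

```python
import random
from typing import Optional

# Hypothetical pre-generated examples; the paths are placeholders, not real assets.
EXAMPLES = [
    {"prompt": "Cats playing baseball", "image_url": "/static/examples/cats-baseball.jpg"},
    {"prompt": "The moon, made of cheese", "image_url": "/static/examples/cheese-moon.jpg"},
    {"prompt": "Statue of liberty driving a car", "image_url": "/static/examples/liberty-car.jpg"},
    {"prompt": "A wolf reading a newspaper", "image_url": "/static/examples/wolf-newspaper.jpg"},
]


def landing_examples(count: int = 3, seed: Optional[int] = None) -> list:
    """Return up to `count` static examples without calling any search logic."""
    rng = random.Random(seed)
    return rng.sample(EXAMPLES, k=min(count, len(EXAMPLES)))


row = landing_examples(3)
```

Rotating a random subset keeps the page from looking stale, and the cost stays at zero because nothing is generated per visit.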
[+] an1sotropy|3 years ago|reply
None of the text-to-image tools seem to really understand 3D geometry, so I feel safe for now. Look at examples for icosahedron [1] vs dodecahedron [2] vs octahedron [3]. None of the images were actually geometrically correct - is that quibbling? Maybe, but sometimes, for some audiences, words actually mean something, not just some vague evocation of the angular aesthetic of something. Has someone delineated the words that will not appear in a stock photography prompt? If there were some feedback ranging from "I'm confident in this" to "I'm guessing here, user beware", it would be a lot more usable.

[1] https://replicate.com/api/models/stability-ai/stable-diffusi...

[2] https://replicate.com/api/models/stability-ai/stable-diffusi...

[3] https://replicate.com/api/models/stability-ai/stable-diffusi...

[+] jarrenae|3 years ago|reply
That's one of the things I've found deeply interesting about the current generation of tools: there's little (if any) comprehension going on, it's really just trying to "enhance" a blur/bit of noise to make the image it was told to make.

And I'm not sure I completely know what you mean, but we are planning to add voting and tagging to improve filtering for images.

[+] londons_explore|3 years ago|reply
You saw this, right?

https://dreamfusion3d.github.io/

That's the same type of diffusion model used here, and without any further training, it is constrained to generate something that is consistent from all angles when viewed in 3d.

[+] barbazoo|3 years ago|reply
Not sure I understand how to use this. I searched for "monkey on car" and these are the "categories" I get:

"a dead monkey", "a monkey dancing", "a dead monkey" (again), "a ca"

[+] roganp|3 years ago|reply
They are offering you a previously generated image. Need to click the button at bottom of page to get an original rendering "from beyond"
[+] jarrenae|3 years ago|reply
Also we'll have to add reporting for specific search terms. We do have a NSFW filter on by default, but there are often things that skirt around the rules while being hard to filter for.
[+] gus_massa|3 years ago|reply
They take too long to generate, but there is no clear indication of that. You should add a spinning mouse or other thing that shows that the server is working. (A robot painting a canvas would be nice, but you need someone that can make nice drawings. An hourglass or a spinning circle is good enough.)
[+] jarrenae|3 years ago|reply
Agreed. That's already one of the things I have on the list for v2, "make image summoning more obvious/loading" and also we'll improve the button location for "Summoning new images" because it's likely that users won't want to scroll to the bottom just to generate new images.
[+] knicholes|3 years ago|reply
Dall-E 2 does something great: Show prompts and examples of images that those prompts generate. This educates your consumer to be able to get more of what they want while they wait. It kind of tickles the desire for mastery.
[+] smeej|3 years ago|reply
It's fascinating how much AI struggles to mimic signs and text. With as much as we enter text into computers, my instinct was to think this should be really easy for computers, but they don't actually receive and process the abstraction of writing like we do, do they?

We use shapes to indicate sounds and sequences to make words, but the computer is ultimately just getting 1 or 0, on or off. It doesn't seem to have the associations we use intuitively because of how humans interact with language.

[+] dillondoyle|3 years ago|reply
The suggested search results are amazing in such a ridiculous way.

"paper" produced "a man reading a newspaper while riding a walrus"

"a wolf reading a newspaper"

"Trapped inside infinity"

And I've got to say, the walrus readers look passable at a glance when shrunk to low res

https://replicate.com/api/models/stability-ai/stable-diffusi...

https://replicate.com/api/models/stability-ai/stable-diffusi...

[+] onwardly|3 years ago|reply
I imagine that is a short term problem.
[+] rany_|3 years ago|reply
It still has trouble understanding sentences; it feels to me like it just generates images based on keywords and not the meaning of my sentence.

For example, I tried "attractive woman disgusted by an ugly bystander" and the generated images show a disgusted woman with no "ugly bystander".

Similar situation with "man angry at a squirrel seeks revenge" (the generated image shows an angry squirrel with no man in the image, when the man was the one supposed to be angry).

[+] andybak|3 years ago|reply
This is the biggest difference between SD and Dall-E (and Imagen) to my mind. SD can produce stunning results but it tends to treat prompts as "word salad" rather than a grammatical instruction.
[+] krick|3 years ago|reply
Not sure how to evaluate that. Maybe it's kinda fun, but… I mean, generating crappy images from text isn't exactly new by now. It may be "an early version" (and this is exactly why I struggle to evaluate that — obviously, we shouldn't be too judgemental of "an early version"), but it surely isn't "a truly functional stock photo platform" yet. I mean, by far. "By a light-year" kind of far.
[+] jarrenae|3 years ago|reply
This is definitely a fair assessment. I think a lot of the "wow" factor is just seeing the generated images in the first place.

In truth I think a lot of value will be added as we start improving filtering, once users are able to vote on "usable" or "unusable" images, or request variations of an existing photo.

I've genuinely used it for 3-4 photos where I would have previously used Unsplash, and I'm optimistic that I can get that number to steadily trend upwards.

I don't expect this to erase any of the existing stock photo tools on the market, though I do think this will add some new value to the space. Honestly my goal was "will my mom be able to use this?"

Hope that helps clarify the goal a bit more, and I do really appreciate the feedback!

[+] agluszak|3 years ago|reply
Usability note: please add a clickable "search" button.
[+] ericmcer|3 years ago|reply
Whoa this is cool and I would def use a more refined version of it. The images with people are a little bit... freaky but objects and animals look fine.

I wonder if this exists inside of Squarespace or WordPress. I imagine the ability to generate quality license-free stock photos would be a huge selling point for them.

[+] jarrenae|3 years ago|reply
We're going to add voting to help empower users to sort between better/worse summoned images. And an API tool for devs to leverage is planned as well.
[+] pimlottc|3 years ago|reply
The animals definitely do not look fine to me, all the results for "cat" I saw were pretty squarely in the uncanny valley.
[+] bscphil|3 years ago|reply
It's sort of interesting, given the undeniable power that these new AI techniques have, just how limited the output is at the moment. Only 512x512 images.

I tried a specific query - "man running from a tiger" - and none of the provided images were even close. Seems to be a common problem.

[+] jtxt|3 years ago|reply
I really like this idea! Related results work fairly well. Tons of potential here!

Ideas: Allow voting for prompts. Allow voting for results. (But try to prevent the rich get richer effect... https://medium.com/hacking-and-gonzo/how-hacker-news-ranking...) Allow requesting more results for a given prompt.
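
A minimal sketch of one way to damp the rich-get-richer effect, assuming the commonly cited form of the Hacker News ranking formula, `(votes - 1) / (age_hours + 2) ** gravity` with gravity around 1.8. The image records and field names here are made up for illustration:

```python
def decayed_score(votes: int, age_hours: float, gravity: float = 1.8) -> float:
    """Time-decayed score: older items need many more votes to keep their rank."""
    return (votes - 1) / (age_hours + 2) ** gravity


# Hypothetical records; ids, vote counts, and ages are invented for the example.
images = [
    {"id": "old_popular", "votes": 200, "age_hours": 72.0},
    {"id": "new_promising", "votes": 12, "age_hours": 1.0},
]

ranked = sorted(
    images,
    key=lambda img: decayed_score(img["votes"], img["age_hours"]),
    reverse=True,
)
# With these numbers the newer image sorts first despite having far fewer votes.
```

A higher `gravity` makes old results fall off faster. For a stock photo library that may be too aggressive, since good images stay useful, so the exponent would need tuning or a floor.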

Bug: when there is an error, "back" should go to the page before the error, instead of to before I opened the website.

[+] switchstance|3 years ago|reply
I am so thankful we got out of the stock business when we did.

AI generated photos, videos, music and animations are here, and I believe it's only a matter of time before they replace a large percentage of the stock websites/companies.

[+] jarrenae|3 years ago|reply
That's sort of the reason we started building this. I think there will absolutely always be room for paid, high quality stock photos, but "content" at the speed of thought is here, and I'm excited to see how the space evolves.
[+] kaetemi|3 years ago|reply
The suggested tags when searching "anime girl" are just a bit creepy.
[+] wheresmycraisin|3 years ago|reply
Omg I cannot wait for human faces to become non-freaky with this technology. People pay real money to sites like Getty or Adobe (the former of which is owned by a corp that you may or may not find politically compatible with your beliefs) to fill their landing pages. And for specific categories, for example "happy asian couple", there are only a few models to choose from so it becomes repetitive fast.
[+] jarrenae|3 years ago|reply
I can't wait either. We're going to add follow-up solutions to upres, expand, and improve facial features. Additionally, we're aiming to improve search terminology on the back end to start providing more relevant results for exactly those sorts of searches.