Last year we added CLIP-based image search to https://immich.app/ and even though I have a pretty good understanding of how it works, it still blows my mind damn near every day. It's the closest thing to magic I've ever seen.
Happy immich user here! I once took a cute photo of our baby chewing on a whisk, and actually finding the correct photo in a huge unsorted, untagged pile of photos by simply searching for "whisk" was a mind-blowing experience! It is an amazingly powerful tool!
Native app. Doesn't require a network connection (great for privacy).
> Queryable is a Core ML model that runs locally on your device. Leveraging OpenAI CLIP's model encoding technology to connect images and text, you can search your iPhone photo album using any natural language input. Most importantly, it is completely offline, so your album privacy will not be revealed to anyone. And, it is open-source: GitHub
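Under the hood, this kind of search reduces to embedding the text query with CLIP's text encoder and ranking precomputed image embeddings by cosine similarity. A minimal NumPy sketch of that ranking step, assuming the photo embeddings were computed offline (the names, shapes, and random "embeddings" here are illustrative, not Queryable's or immich's actual code):

```python
import numpy as np

def cosine_rank(query_vec: np.ndarray, image_vecs: np.ndarray, top_k: int = 5):
    """Rank images by cosine similarity to a text-query embedding.

    query_vec:  (d,) embedding from CLIP's text encoder
    image_vecs: (n, d) precomputed embeddings from CLIP's image encoder
    Returns indices of the top_k most similar images, best first.
    """
    q = query_vec / np.linalg.norm(query_vec)
    imgs = image_vecs / np.linalg.norm(image_vecs, axis=1, keepdims=True)
    sims = imgs @ q                     # (n,) cosine similarities
    return np.argsort(-sims)[:top_k]

# Toy demo: random 512-d vectors stand in for real CLIP output.
rng = np.random.default_rng(0)
library = rng.normal(size=(1000, 512))
query = library[42] + 0.01 * rng.normal(size=512)  # near-duplicate of image 42
print(cosine_rank(query, library)[0])              # → 42
```

In a real app the (n, d) matrix is built once per photo library and only the cheap dot-product ranking runs per query, which is why this works offline on a phone.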
After creating Queryable, I also developed an app called MemeSearch, which searches for memes on Reddit (https://apps.apple.com/us/app/memesearch-reddit-meme-finder/...). Although it's completely free, it hasn't been downloaded by many users. I thought nobody wanted it, so I'm glad to see there are still some people who share a similar taste.
Gives me an idea for a meme search service I can run locally to search through all the images on my computer to find a specific meme. (I tend to know I downloaded a funny one, and then when I want to share it with someone I can never find it.)
Huh, are the image vector embeddings implicitly doing OCR as well? Because it seems like the meme search is pulling from the text as well as images, though it's not entirely clear.
CLIP does not have explicit OCR support, but it does, somewhat coincidentally, have a slight understanding of text. This is explained by training captions that contain (some of) the text appearing in the image.
These hacks/side projects are amazing! I feel we will see a lot of creativity as tools to build data intensive AI applications become easier.
We built and open sourced Indexify (https://github.com/tensorlakeai/indexify) to make it easy to build resilient pipelines that combine data with many different models and transformations, for applications that rely on embeddings or any other metadata extracted by models from videos, photos, and documents!
I didn't know about SigLIP, which the author mentioned on the blog; need to add this to our library :) I also found it incredible that he generated the crawler with Claude! This is the type of boilerplate I hope we won't have to write in the future.
This is awesome! We made similar functionality (plus more) available through an API. If anyone is interested to try it out and share feedback, please message me and I’ll hook you up.
I steered a friend towards Paperless (and away from an LLM solution) as a way of searching/accessing GBs of architectural PDFs recently - so far, it’s apparently working well for them.
Hi @rmdes,
Sagar here from Joyspace AI. I recently made a Show HN post [0] about a document search engine.
We can do this very easily for you. We can provide search output with context that you can then feed to an LLM to extract events. Let me know if you are interested.
You can get in touch with me at sagar at joyspace dot ai.
We're almost getting back to the dot-com era of the 2000s with some of these "public cloud" company demos. Enough frenzy that if your app really starts grinding compute cycles, you can quickly DDoS yourself with server costs. Even at $0.001/request [1], if you get 10,000 HN readers who each make 100 requests on average, you suddenly get a $1,000 server bill from somebody. Those used to be on /. all the time circa 2000.
If few convert, and most just tell their friends to try your cool demo, you can suddenly have 100,000 reddit users making 200+ requests on average every day cause your free demo's so cool. And suddenly you're mostly trying to figure out how to scrounge up server costs to cover the free parts.
Frankly, seems like the entire industry's probably going to have a lot of the same optimizations pretty soon. "How do we stop delivering such enormous JPGs with every Amazon/eBay click?" and similar.
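The back-of-the-envelope math above is easy to sanity-check in a couple of lines (the per-request figure is the cited a16z estimate, not a measured cost):

```python
# Hypothetical demo-traffic bill: 10,000 readers x 100 requests each,
# at an assumed $0.001 of compute per request.
readers = 10_000
requests_each = 100
cost_per_request = 0.001  # USD, rounded down from a16z's ~$0.0014

bill = readers * requests_each * cost_per_request
print(f"${bill:,.0f}")  # → $1,000
```

Swap in the viral-Reddit numbers (100,000 users at 200 requests/day) and the same formula gives $20,000 per day, which is the scenario the comment is warning about.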
> I imagine that we will see this tech rolled into all the various photo apps shortly.
Yeah, Google Photos and Apple Photos can both search for pictures given a description of what you're looking for. In my experience both work very well (e.g. search for "cars" in your pics and it'll find all your cars over the years if, like me, you take pictures with your cars a lot :) ).
atif089|1 year ago
Amongst all the WhatsApp media on my phone, I would like to get a list of all the videos and photos with my family in them and then delete the rest.
Is something like this possible with immich?
rovr138|1 year ago
I run it on my iPhone.
yreg|1 year ago
https://findthatmeme.com/blog/2023/01/08/image-stacks-and-ip...
https://news.ycombinator.com/item?id=34315782
thesz|1 year ago
[1] https://www.reddit.com/r/AskReddit/comments/jooo5/reddit_ori...
om8|1 year ago
At my previous job, the ML department created an internal tool where you could search through city panoramas (like Google Street View) using text.
It could find you, in a second, all the potholes, overfilled dumpsters, and other ugly (and beautiful) things you wanted.
harper|1 year ago
FWIW, my recent blog is me trying to do this more.
mft_|1 year ago
https://github.com/paperless-ngx/paperless-ngx
[0] https://news.ycombinator.com/item?id=39980902
[1] Slightly old article, so I lowered the $/request on compute a bit, from $0.0014 to $0.001. https://a16z.com/navigating-the-high-cost-of-ai-compute/
pksebben|1 year ago
I clicked through to your site 'cause I dig your angle, and I saw the bit about the Kindle. Ouch, dude. Money sure ain't everything, but holy crap.
You have my condolences. Keep building awesome shit, please.
edit: followup question - do you still have it?