As a serial DIYer, I respect the engineering depth here, especially the custom vector index, but I disagree on the self-hosted ML approach. The innovation in embeddings is just too fast to keep up with locally without constant refactoring. You can actually see the trade-off in the "girl drinking water" example where one result is a clear hallucination.
warangal|3 months ago
For point about "girl drinking water", "girl" is the person/tagged name , "drinking water" is just re-ranking all of "girl"s photos ! (Rather than finding all photos of a (generic) girl drinking water) .
I have been more focussed on making indexing pipeline more peformant by reducing copies, speeding up bottleneck portions by writing in Nim. Fusion of semantic features with meta-data is more interesting and challenging part, in comparison to choosing an embedding model !