top | item 28552035

(no title)

cyrusthegreat | 4 years ago

Hi everyone!

Over the years, I've found myself building hacky solutions to serve and manage my embeddings. I’m excited to share Embeddinghub, an open-source vector database for ML embeddings. It is built with four goals in mind:

Store embeddings durably and with high availability

Allow for approximate nearest neighbor operations

Enable other operations like partitioning, sub-indices, and averaging

Manage versioning, access control, and rollbacks painlessly

It's still in the early stages, and before we committed more dev time to it we wanted to get your feedback. Let us know what you think and what you'd like to see!

Repo: https://github.com/featureform/embeddinghub

Docs: https://docs.featureform.com/

What's an Embedding? The Definitive Guide to Embeddings: https://www.featureform.com/post/the-definitive-guide-to-emb...

discuss

order

ypcx|4 years ago

In the "Definitive Guide to Embeddings", in the figure "An illustration of One Hot Encoding", the "One Hot Encoding" table doesn't make any sense whatsoever. Am I wrong?

make3|4 years ago

no you're right ahahah wth are these

JPKab|4 years ago

Holy shit, this looks amazing!

I see you've got examples for NLP use cases in your docs. Can't wait to read them. Embeddings are a constant source of complexity when I'm trying to move certain operations to Lambda, this looks like it would speed the initializations up big time.

localhost|4 years ago

Curious about how your solution is different / better than nmslib which I've tried in the past?

cyrusthegreat|4 years ago

We actually use HNSWLIB by NMSLIB on the backend. NMSLIB is solving the approximate nearest neighbor problem, not the storage problem. It’s not a database, it’s an index. We handle everything needed to turn their index into a full fledged database with a data science workflow around it (versioning, monitoring, etc.)