top | item 36694549

kacperlukawski | 2 years ago

If you need semantic search locally then it's fine, but serving an embedding model might still be challenging. And if you want to expose it publicly, your laptop might not be enough.

hn_20591249 | 2 years ago

I've hosted embedding models on AWS Lambda (fair enough, that's still a vendor, but it's one vendor vs. three). If you try an LLM with 1B+ parameters you will struggle, but if the difference between a lightweight BERT-like transformer and an LLM is only a few percentage points of quality, why bother getting your credit card out?

Edit: another thought: skip Lambda entirely, run the embedding job on the server as a background process, and use an on-disk vector store (LanceDB).
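That skip-Lambda pattern can be sketched roughly as below. LanceDB gives you the on-disk store out of the box; here the idea is shown with plain numpy instead (persist vectors to disk, brute-force cosine search at query time), and the `embed` function is a toy stand-in for a real BERT-like model:

```python
import numpy as np

VOCAB: dict[str, int] = {}  # toy vocabulary, grown on first sight
DIM = 32

def embed(text: str) -> np.ndarray:
    """Stand-in embedder: bag-of-words over a shared toy vocabulary.
    A real setup would call a small BERT-like model here instead."""
    vec = np.zeros(DIM)
    for tok in text.lower().split():
        vec[VOCAB.setdefault(tok, len(VOCAB)) % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Background job on the server: embed documents, persist vectors to disk.
docs = ["install the server", "semantic search basics", "vectors on disk"]
np.save("vectors.npy", np.stack([embed(d) for d in docs]))

# Query time: load the on-disk store and do a brute-force cosine search.
index = np.load("vectors.npy")
scores = index @ embed("how do I install this")
best = docs[int(np.argmax(scores))]
```

With LanceDB you would replace the `np.save`/`np.load` pair with a table in a `lancedb.connect(...)` database and let it handle the search, but the shape of the pipeline stays the same.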

binarymax | 2 years ago

Shameless plug: I built Mighty Inference Server to solve this problem. Fast embeddings with a minimal footprint, and better BEIR and MTEB scores using the lightning-fast, small E5 V2 models. Scales linearly on CPU; no GPU needed.

https://max.io

llogiq | 2 years ago

The initial version of this actually used Mighty, but I didn't find any free tier available, so I switched to Cohere to keep the $0 price tag.

bootsmann | 2 years ago

You serve the embedding model in a Lambda and then run something like FAISS in the backend.
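A rough sketch of that split, under some assumptions: the handler signature follows the standard AWS Lambda Python convention, `toy_embed` is a hypothetical stand-in for a real model, and the backend's exact inner-product search is shown with numpy (FAISS's `IndexFlatIP` does the same thing, just much faster at scale):

```python
import json
import numpy as np

DIM = 8

def toy_embed(text: str) -> list[float]:
    # Hypothetical stand-in for the real embedding model inside the Lambda.
    vec = np.zeros(DIM)
    for i, ch in enumerate(text.lower()):
        vec[i % DIM] += ord(ch) / 100.0
    norm = np.linalg.norm(vec)
    return (vec / norm).tolist() if norm else vec.tolist()

def lambda_handler(event, context):
    """Embedding service: POST {"text": ...} -> {"vector": [...]}."""
    body = json.loads(event["body"])
    return {"statusCode": 200,
            "body": json.dumps({"vector": toy_embed(body["text"])})}

# Backend side: what FAISS's IndexFlatIP does, shown with numpy.
corpus = ["alpha", "beta", "gamma"]
corpus_vectors = np.stack([np.array(toy_embed(t)) for t in corpus])

# Invoke the "Lambda" locally, then search the index with its vector.
resp = lambda_handler({"body": json.dumps({"text": "alpha"})}, None)
query = np.array(json.loads(resp["body"])["vector"])
nearest = int(np.argmax(corpus_vectors @ query))  # index of best match
```

The point of the split is that the stateless embedding step scales on demand in the Lambda, while the index lives on a long-running backend where it can be mmapped or kept in RAM.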