Show HN: Model2vec-Rs – Fast Static Text Embeddings in Rust
60 points| Tananon | 9 months ago |github.com
Main Features:
- Rust-native inference: Load any Model2Vec model from Hugging Face or your local path with StaticModel::from_pretrained(...).
- Tiny footprint: The crate itself is only ~1.7 MB, with embedding models between 7 and 30 MB.
Performance:
We benchmarked single-threaded on a CPU:
- Python: ~4650 embeddings/sec
- Rust: ~8000 embeddings/sec (~1.7× speedup)
First open-source project in Rust for us, so would be great to get some feedback!
gthompson512|9 months ago
Edit: it seems like it just splits the input into sentences, which is a weird thing to do given that in English only ~95% agreement is even possible on what a sentence is.
```
// Process in batches
for batch in sentences.chunks(batch_size) {
    // Truncate each sentence to max_length * median_token_length chars
    let truncated: Vec<&str> = batch
        .iter()
        .map(|text| {
            if let Some(max_tok) = max_length {
                Self::truncate_str(text, max_tok, self.median_token_length)
            } else {
                text.as_str()
            }
        })
        .collect();
```
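For anyone reading along, here is a rough, self-contained sketch of what that excerpt appears to do: cap each input at `max_length * median_token_length` characters, then process the inputs in fixed-size batches. The body of `truncate_str` and the helper `truncated_batches` are my own assumptions for illustration, not the crate's actual code.

```rust
/// Hypothetical stand-in for the crate's `truncate_str`: keep at most
/// `max_tokens * median_token_length` characters of `text`.
fn truncate_str(text: &str, max_tokens: usize, median_token_length: usize) -> &str {
    let max_chars = max_tokens.saturating_mul(median_token_length);
    // `char_indices` yields byte offsets that sit on character boundaries,
    // so slicing at that offset is safe for multi-byte UTF-8 text.
    match text.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &text[..byte_idx],
        None => text, // already short enough
    }
}

/// Hypothetical helper: truncate every input, grouped into batches of `batch_size`.
fn truncated_batches<'a>(
    sentences: &'a [String],
    max_length: Option<usize>,
    median_token_length: usize,
    batch_size: usize,
) -> Vec<Vec<&'a str>> {
    sentences
        .chunks(batch_size.max(1)) // `chunks` panics on a size of 0
        .map(|batch| {
            batch
                .iter()
                .map(|text| match max_length {
                    Some(max_tok) => truncate_str(text, max_tok, median_token_length),
                    None => text.as_str(),
                })
                .collect()
        })
        .collect()
}
```

Under this reading, the character cap is only a cheap pre-filter before tokenization, and each element of `sentences` is whatever string the caller passed in.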
gthompson512|9 months ago
Edit: That wasn't intended to be mean, although it may come off that way, but what is this supposed to be for? I have texts of more than 8k tokens that need to be embedded, and I test things regularly.
noahbp|9 months ago
For someone looking to build a large embedding search, fast static embeddings seem like a good deal, but almost too good to be true. What quality tradeoff are you seeing with these models versus embedding models with attention mechanisms?
Tananon|9 months ago
There's definitely a quality trade-off. We have extensive benchmarks here: https://github.com/MinishLab/model2vec/blob/main/results/REA.... potion-base-32M reaches ~92% of the performance of MiniLM while being much faster (about 70x faster on CPU). It depends a bit on your constraints: if you have limited hardware and need very high throughput, these models still let you produce decent-quality embeddings. Of course, an attention-based model will be better, but also more expensive.
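To make the speed side of that trade-off concrete: a static embedding model, roughly speaking, looks up one fixed vector per token and pools them, with no attention layers in between. The toy sketch below illustrates that idea only. `ToyStaticModel`, the whitespace tokenizer, and the two-dimensional vectors are all made up for illustration and are not Model2Vec's implementation.

```rust
use std::collections::HashMap;

/// Toy static embedder (hypothetical): one fixed vector per token, no attention.
struct ToyStaticModel {
    vectors: HashMap<String, Vec<f32>>,
    dim: usize,
}

impl ToyStaticModel {
    /// Build the lookup table from (token, vector) pairs of length `dim`.
    fn new(dim: usize, entries: &[(&str, Vec<f32>)]) -> Self {
        let vectors = entries
            .iter()
            .map(|(tok, v)| (tok.to_string(), v.clone()))
            .collect();
        Self { vectors, dim }
    }

    /// Embed `text` by averaging the vectors of its known tokens.
    /// Unknown tokens are skipped; text with no known tokens maps to zeros.
    fn encode(&self, text: &str) -> Vec<f32> {
        let mut sum = vec![0.0f32; self.dim];
        let mut count = 0usize;
        // Assumption: lowercase whitespace tokenization instead of a real tokenizer.
        for tok in text.split_whitespace() {
            if let Some(v) = self.vectors.get(tok.to_lowercase().as_str()) {
                for (s, x) in sum.iter_mut().zip(v.iter()) {
                    *s += *x;
                }
                count += 1;
            }
        }
        if count > 0 {
            for s in sum.iter_mut() {
                *s /= count as f32;
            }
        }
        sum
    }
}
```

Each token costs one hash lookup and one vector add, which is why throughput is high. The same token also always gets the same vector regardless of context, which is where the quality gap to attention-based models comes from.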
echelon|9 months ago
We've been using Candle and Cudarc and having a fairly good time of it. We've built a real-time drawing app on a custom LCM stack, and Rust makes it feel rock solid. Python is way too flimsy for something like this.
The more the Rust ML ecosystem grows, the better. It's a little bit fledgling right now, so every little bit counts.
If llama.cpp had instead been llama.rs, I feel like we would have had a runaway success.
We'll be checking this out! Kudos, and keep it up!
Tananon|9 months ago
Havoc|9 months ago
Tananon|9 months ago
badmonster|9 months ago
Tananon|9 months ago