adeptima|10 months ago
However, if you need full-text search similar to Apache Lucene, my go-to options are based on Tantivy.
Tantivy https://github.com/quickwit-oss/tantivy
Asian-language support, BM25 scoring, a natural query language, and JSON field indexing are all must-have features for me.
Quickwit - https://github.com/quickwit-oss/quickwit - https://quickwit.io/docs/get-started/quickstart
ParadeDB - https://github.com/paradedb/paradedb
I'm still looking for a systematic approach to hybrid search (combining full-text search with embedding vectors).
Any thoughts on up-to-date hybrid search experience are greatly appreciated.
jitl|10 months ago
https://quickwit.io/blog/quickwit-joins-datadog#the-journey-...
iambateman|10 months ago
The latest version is stable and fast enough that I think this won't be an issue for a while. It's the kind of thing that does what it needs to do, at least for me.
But I totally agree that the project is at risk, given the acquisition.
kk3|10 months ago
I haven't tried those features, but I did try Meilisearch a while back, and I found Typesense to index much faster (indexing was a bottleneck for my particular use case) and to have many more features for controlling search and ranking. That said, my use case was not typical for search, and I'm sure Meilisearch has come a long way since then, so this is not to speak poorly of Meilisearch; Typesense is just another great option.
Kerollmops|10 months ago
The main advantage of Meilisearch is that the content is written to disk. Rebooting an instance is instant, and that's quite useful when booting from a snapshot or upgrading to a smaller or larger machine. We think disk-first is a great approach because the user doesn't have to fear reindexing when restarting the program.
That's where Meilisearch's dumpless upgrade is excellent: all the content you've previously indexed is still written to disk and slightly modified to be compatible with the latest engine version. This differs from Typesense, where upgrades necessitate reindexing the documents in memory. I don't know about embeddings. Do you have to query OpenAI again when upgrading? Meilisearch keeps the embeddings on disk to avoid those costs and skip the indexing time.
[1]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1... [2]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1...
irevoire|10 months ago
I tried to explain to them in an issue that, in this state, it was pretty much useless because one search strategy or the other would always give you awful results, but they basically said « some other engines are doing that as well, so we won't try to improve it », plus a ton of justification, instead of just admitting that this strategy is bad.
jimmydoe|10 months ago
inertiatic|10 months ago
Start off with ES or Vespa, probably. ES is not hard at all to get started with, IMO.
Try RRF - see how far that gets you for your use case. If it's not where you want to be, it's time to get thinking about what you're trying to do. Maybe a score multiplication gets you where you want to be - you can do that in Vespa, I think, but in ES you have to hack around the inability to express exactly that.
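For what it's worth, RRF (Reciprocal Rank Fusion) is simple enough to sketch in a few lines. This is a minimal, engine-agnostic illustration, not ES or Vespa's implementation; the doc ids and the constant k=60 (the value from the original RRF paper) are just example choices:

```python
from collections import defaultdict

def rrf(rankings, k=60):
    """Merge several ranked lists of doc ids with Reciprocal Rank Fusion.
    Each doc scores sum(1 / (k + rank)) over the lists it appears in."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["a", "b", "c"]     # ranking from full-text search
vector_hits = ["c", "a", "d"]   # ranking from embedding search
print(rrf([bm25_hits, vector_hits]))  # ['a', 'c', 'b', 'd']
```

Docs that rank well in both lists ("a" and "c" here) float to the top without any score normalization, which is why RRF is a common first step for hybrid search.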
andreer|10 months ago
[deleted]
navaed01|10 months ago
Kerollmops|10 months ago
Epicism|10 months ago
It’s based on the DataFusion engine, has vector indexing and BM25 indexing, and has Python and Rust bindings.
Kerollmops|10 months ago
You know that Meilisearch is the way to go, right? Tantivy, even though I love the product, doesn't support vector search. Meilisearch's hybrid search is stunningly good. You can try it on our demo [1].
[1]: https://wheretowatch.meilisearch.com/
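To make the hybrid part concrete: Meilisearch exposes it through a `hybrid` option on the search request, where `semanticRatio` blends keyword relevance (0.0) and vector relevance (1.0). A minimal sketch of the request body, assuming an index named `movies` and an embedder configured under the name `default` (both are illustrative, not from the thread):

```python
import json

# Body for POST /indexes/movies/search on a Meilisearch instance.
# semanticRatio = 0.5 weights full-text and semantic results equally.
payload = {
    "q": "feel-good space adventure",
    "hybrid": {"semanticRatio": 0.5, "embedder": "default"},
    "limit": 10,
}
body = json.dumps(payload)
```

Send `body` with an HTTP client of your choice; the same structure works from any of the official SDKs.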
oulipo|10 months ago