adeptima|10 months ago
However, if you need full-text search similar to Apache Lucene, my go-to options are based on Tantivy.
Tantivy https://github.com/quickwit-oss/tantivy
Asian-language support, BM25 scoring, a natural query language, and JSON field indexing are all must-have features for me.
Quickwit - https://github.com/quickwit-oss/quickwit - https://quickwit.io/docs/get-started/quickstart
ParadeDB - https://github.com/paradedb/paradedb
I'm still looking for a systematic approach to hybrid search (combining full-text search with embedding vectors).
Any thoughts on up-to-date hybrid search experience are greatly appreciated.
jitl|10 months ago
https://quickwit.io/blog/quickwit-joins-datadog#the-journey-...
iambateman|10 months ago
The latest version is stable and fast enough that I think this won't be an issue for a while. It's the kind of thing that does what it needs to do, at least for me.
But I totally agree that the project is at risk, given the acquisition.
kk3|10 months ago
I haven't tried those features, but I did try Meilisearch a while back, and I found Typesense to index much faster (indexing was a bottleneck for my particular use case) and to have many more features for controlling search and ranking. That said, my use case was not typical for search, and I'm sure Meilisearch has come a long way since then, so this is not to speak poorly of Meilisearch; Typesense is just another great option.
Kerollmops|10 months ago
The main advantage of Meilisearch is that the content is written to disk. Rebooting an instance is instant, and that's quite useful when booting from a snapshot or upgrading to a smaller or larger machine. We think disk-first is a great approach because the user doesn't have to fear reindexing when restarting the program.
That's where Meilisearch's dumpless upgrade is excellent: all the content you've previously indexed is still written to disk and slightly modified to be compatible with the latest engine version. This differs from Typesense, where upgrades necessitate reindexing the documents in memory. I don't know about embeddings. Do you have to query OpenAI again when upgrading? Meilisearch keeps the embeddings on disk to avoid those costs and skip the indexing time.
[1]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1... [2]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1...
irevoire|10 months ago
I tried to explain to them in an issue that, in this state, it was pretty much useless because one search strategy or the other would always give you awful results, but they basically said « some other engines are doing that as well, so we won't try to improve it », plus a ton of justification, instead of just admitting that this strategy is bad.
jimmydoe|10 months ago
inertiatic|10 months ago
Start off with ES or Vespa, probably. ES is not hard at all to get started with, IMO.
Try RRF - see how far that gets you for your use case. If it's not where you want to be, it's time to get thinking about what you're trying to do. Maybe a score multiplication gets you where you want to be - you can do that in Vespa, I think, but in ES you have to hack around the inability to express exactly that.
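For what it's worth, RRF (Reciprocal Rank Fusion) is simple enough to sketch in a few lines. This is a minimal, engine-agnostic illustration, not ES or Vespa's implementation; the doc ids and the constant k=60 (the value from the original RRF paper) are just example choices:

```python
from collections import defaultdict

def rrf(rankings, k=60):
    """Merge several ranked lists of doc ids with Reciprocal Rank Fusion.
    Each doc scores sum(1 / (k + rank)) over the lists it appears in."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["a", "b", "c"]     # ranking from full-text search
vector_hits = ["c", "a", "d"]   # ranking from embedding search
print(rrf([bm25_hits, vector_hits]))  # ['a', 'c', 'b', 'd']
```

Docs that rank well in both lists ("a" and "c" here) float to the top without any score normalization, which is why RRF is a common first step for hybrid search.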
andreer|10 months ago
[deleted]
navaed01|10 months ago
Kerollmops|10 months ago
Epicism|10 months ago
It’s based on the DataFusion engine, has vector indexing and BM25 indexing, and has Python and Rust bindings.
Kerollmops|10 months ago
You know that Meilisearch is the way to go, right? Tantivy, even though I love the product, doesn't support vector search. Meilisearch's hybrid search is stunningly good. You can try it on our demo [1].
[1]: https://wheretowatch.meilisearch.com/
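To make the hybrid part concrete: Meilisearch exposes it through a `hybrid` option on the search request, where `semanticRatio` blends keyword relevance (0.0) and vector relevance (1.0). A minimal sketch of the request body, assuming an index named `movies` and an embedder configured under the name `default` (both are illustrative, not from the thread):

```python
import json

# Body for POST /indexes/movies/search on a Meilisearch instance.
# semanticRatio = 0.5 weights full-text and semantic results equally.
payload = {
    "q": "feel-good space adventure",
    "hybrid": {"semanticRatio": 0.5, "embedder": "default"},
    "limit": 10,
}
body = json.dumps(payload)
```

Send `body` with an HTTP client of your choice; the same structure works from any of the official SDKs.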
oulipo|10 months ago