top | item 37205430

jabo | 2 years ago

I work on Typesense [1] - historically considered an open source alternative to Algolia.

We then launched vector search in Jan 2023, and just last week we launched the ability to generate embeddings from within Typesense.

You just need to send JSON data, and Typesense can generate embeddings for it using OpenAI, the PaLM API, or built-in models like S-BERT, E-5, etc. (running on a GPU if you prefer) [2]
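To make that concrete, here's a rough sketch of a collection schema with an auto-embedded field, as you'd define it via the Python client (typesense-python). The collection name, field names, and model name below are illustrative assumptions; check the vector search docs [2] for the models Typesense actually ships.

```python
# Hypothetical schema: the `embed` block asks Typesense to generate the
# `embedding` field automatically from `title` using a built-in model.
# (Collection name, field names, and model name are assumptions.)
schema = {
    "name": "articles",
    "fields": [
        {"name": "title", "type": "string"},
        {
            "name": "embedding",
            "type": "float[]",
            "embed": {
                "from": ["title"],
                "model_config": {"model_name": "ts/e5-small"},
            },
        },
    ],
}

# With a configured client, this would be created along the lines of:
# client.collections.create(schema)
```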

You can then do a hybrid (keyword + semantic) search by just sending the search keywords to Typesense, and Typesense will automatically generate embeddings for you internally and return a ranked list of keyword results weaved with semantic results (using Rank Fusion).
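For anyone curious what "Rank Fusion" means here: a common variant is Reciprocal Rank Fusion (RRF), where each result list contributes a score based on a document's rank, and the two scores are summed. This is a minimal illustrative sketch of RRF, not necessarily the exact weighting Typesense uses internally.

```python
# Reciprocal Rank Fusion: merge a keyword-ranked list and a
# semantic-ranked list of document ids into one fused ranking.
def reciprocal_rank_fusion(keyword_ids, semantic_ids, k=60):
    scores = {}
    for ranked in (keyword_ids, semantic_ids):
        for rank, doc_id in enumerate(ranked):
            # Each list contributes 1 / (k + rank + 1); documents that
            # rank well in either list float to the top of the fusion.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

fused = reciprocal_rank_fusion(["a", "b", "c"], ["c", "a", "d"])
# "a" wins (ranked highly in both lists), then "c", then "b", then "d"
```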

You can also combine filtering, faceting, typo tolerance, etc - the things Typesense already had - with semantic search.
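As a sketch of what that combination might look like in a search request: `q`, `query_by`, `filter_by`, and `facet_by` are standard Typesense search parameters, but the field values below are hypothetical and assume a collection with `title`, `embedding`, and `category` fields.

```python
# Illustrative hybrid-search request: listing both a text field and the
# auto-embedded vector field in `query_by` makes it keyword + semantic,
# while `filter_by` / `facet_by` work as they always have.
search_params = {
    "q": "sea",
    "query_by": "title,embedding",   # keyword field + vector field
    "filter_by": "category:=nature", # hypothetical filter
    "facet_by": "category",          # hypothetical facet
}

# With a configured client, roughly:
# client.collections["articles"].documents.search(search_params)
```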

For context, we serve over 1.3B searches per month on Typesense Cloud [3]

[1] https://github.com/typesense/typesense

[2] https://typesense.org/docs/0.25.0/api/vector-search.html

[3] https://cloud.typesense.org


Dachande663 | 2 years ago

We store a couple million documents in Typesense and the vector store is performing great so far (average search time is a fraction of overall RAG time). Didn't realise you'd added support for creating the embeddings automatically; great news!

ZoomerCretin | 2 years ago

This is very difficult for me to understand. Can you explain like I'm an undergrad? What exactly does this mean? What is an embedding? What is the difference between keyword and semantic search?

jabo | 2 years ago

Here's an example of semantic search:

Let's say your dataset has the words "Oceans are blue" in it.

With keyword search, if someone searches for "Ocean", they'll see that record, since it's a close match. But if they search for "sea", that record won't be returned.

This is where semantic search comes in. It can automatically deduce semantic / conceptual relationships between words and return a record with "Ocean" even if the search term is "sea", because the two words are conceptually related.

The way semantic search works under the hood is with these things called embeddings, which are just a big array of floating-point numbers for each record. It's an alternative way to represent words, in an N-dimensional space created by a machine learning model. Here's more information about embeddings: https://typesense.org/docs/0.25.0/api/vector-search.html#wha...
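A toy example of the idea: the 3-dimensional vectors below are hand-picked for illustration (real models like S-BERT or E-5 produce vectors with hundreds of dimensions), but the principle is the same: conceptually related words end up close together, as measured by something like cosine similarity.

```python
import math

# Hand-picked toy "embeddings" -- not output from any real model.
embeddings = {
    "ocean":   [0.9, 0.8, 0.1],
    "sea":     [0.8, 0.9, 0.2],
    "bicycle": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sim_sea = cosine_similarity(embeddings["ocean"], embeddings["sea"])
sim_bike = cosine_similarity(embeddings["ocean"], embeddings["bicycle"])
# "sea" is far closer to "ocean" than "bicycle" is, so a semantic search
# for "sea" would surface the "Oceans are blue" record.
```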

With the latest release, you essentially don't have to worry about embeddings (except maybe picking one of the model names to use and experimenting), and Typesense will do the semantic search for you by generating embeddings automatically.

mrjn | 2 years ago

We use Typesense for vector search as well, for Struct.ai in production, and it works amazingly well.

I'm surprised the original post doesn't benchmark Typesense.