jabo | 2 years ago
We then launched vector search in Jan 2023, and just last week we launched the ability to generate embeddings from within Typesense.
You just need to send JSON data, and Typesense can generate embeddings for it using OpenAI, the PaLM API, or built-in models like S-BERT, E-5, etc. (running on a GPU if you prefer) [2].
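To make that a bit more concrete, here's a rough sketch of what a collection schema with an auto-embedding field could look like, written as a plain Python dict. The collection name `notes` and the field names `text` and `embedding` are made up for illustration, and the `embed` / `model_config` shape and the `ts/e5-small` model name are as I recall them from the docs in [2], so treat them as assumptions and check there for the exact options.

```python
import json

# Hypothetical collection: "notes", with a plain text field and an
# embedding field that Typesense would populate from that text.
schema = {
    "name": "notes",
    "fields": [
        {"name": "text", "type": "string"},
        {
            "name": "embedding",
            "type": "float[]",
            # Assumed shape: which field(s) to embed and which model to use.
            "embed": {
                "from": ["text"],
                "model_config": {"model_name": "ts/e5-small"},
            },
        },
    ],
}

# The documents you index would then be plain JSON, with no vectors in them.
document = {"text": "Oceans are blue"}

schema_json = json.dumps(schema, indent=2)
document_json = json.dumps(document)
print(schema_json)
print(document_json)
```

The point is only that the documents carry no vectors; the embedding field is declared once in the schema.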
You can then do a hybrid (keyword + semantic) search by just sending the search keywords to Typesense. Typesense will automatically generate embeddings for the query internally and return a single ranked list in which keyword results are interleaved with semantic results (using Rank Fusion).
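If you're wondering how two result lists get merged into one, here's a minimal sketch of reciprocal rank fusion, one common form of rank fusion. This is a generic illustration, not necessarily the exact formula or weights Typesense uses internally; the function name, the document ids, and the constant `k=60` are all assumptions for the example.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of ids into one list.

    Each id earns 1 / (k + rank) from every list it appears in,
    so ids that rank well in more than one list rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical results for the same query from the two search modes.
keyword_hits = ["doc_ocean", "doc_river"]
semantic_hits = ["doc_sea", "doc_ocean", "doc_lake"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

Here `doc_ocean` ends up first because it appears in both lists, while ids found by only one mode follow in order of their rank in that list.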
You can also combine filtering, faceting, typo tolerance, etc - the things Typesense already had - with semantic search.
For context, we serve over 1.3B searches per month on Typesense Cloud [3]
[1] https://github.com/typesense/typesense
[2] https://typesense.org/docs/0.25.0/api/vector-search.html
Dachande663|2 years ago
ZoomerCretin|2 years ago
jabo|2 years ago
Let's say your dataset has the words "Oceans are blue" in it.
With keyword search, if someone searches for "Ocean", they'll see that record, since it's a close match. But if they search for "sea" then that record won't be returned.
This is where semantic search comes in. It can automatically deduce semantic / conceptual relationships between words and return a record with "Ocean" even if the search term is "sea", because the two words are conceptually related.
Under the hood, semantic search uses these things called embeddings: each record gets a big array of floating point numbers. It's an alternate way to represent words, as points in an N-dimensional space created by a machine learning model, where related concepts end up close to each other. Here's more information about embeddings: https://typesense.org/docs/0.25.0/api/vector-search.html#wha...
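Here's a toy sketch of that idea. The three-dimensional vectors below are made up by hand purely for illustration (real models produce hundreds of dimensions and you would never write them yourself), and cosine similarity is just one common way to measure closeness, assumed here for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-written toy "embeddings"; not the output of any real model.
toy_embeddings = {
    "ocean": [0.9, 0.1, 0.0],
    "sea": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}

def most_similar(word, embeddings):
    """Return the other word whose vector is closest to `word`'s vector."""
    query = embeddings[word]
    candidates = (w for w in embeddings if w != word)
    return max(candidates, key=lambda w: cosine_similarity(query, embeddings[w]))

print(most_similar("sea", toy_embeddings))
```

With these toy numbers, "sea" lands next to "ocean" and far from "invoice", which is the property a semantic search relies on.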
With the latest release, you essentially don't have to worry about embeddings (except maybe picking one of the model names to experiment with), and Typesense will do the semantic search for you by generating embeddings automatically.
mrjn|2 years ago
I'm surprised the original post doesn't benchmark Typesense.