What's a common approach for keeping the index up to date? A live ETL pipeline from the database to the search engine doesn't sound simple. Another method, once the existing data has been loaded, is to send every write to both the database and the search engine whenever a user performs a CRUD operation. But that's a lot of work too if you don't already have an HTTP API and are mostly serving server-side-rendered HTML.
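The dual-write idea above can be sketched in a few lines. This is only an illustration, not any real library's API: `db` and `searchClient` are hypothetical stand-ins for your database and search-engine clients.

```javascript
// Dual-write: persist to the database first (it stays the source of
// truth), then mirror the change into the search index.
// Both client objects here are hypothetical stand-ins.
async function createBook(db, searchClient, book) {
  const saved = await db.insert('books', book);
  try {
    await searchClient.index('books', saved);
  } catch (err) {
    // If indexing fails, the index can be reconciled later from the DB,
    // e.g. by a periodic batch job that re-syncs recently changed rows.
    console.error('index update failed, will re-sync later:', err);
  }
  return saved;
}
```

Note the failure mode this hides: without a reconciliation job (or a change-data-capture pipeline), a failed index write silently leaves the index stale, which is exactly why dual writes are more work than they first appear.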
Apart from being written in Rust, MeiliSearch (https://github.com/meilisearch/meilisearch) differs mainly in its use of a bucket sort to rank the documents retrieved from the index.
Both MeiliSearch and Typesense use an inverted index with a Levenshtein automaton to handle typos, but they differ when it comes to sorting documents:
- Typesense uses a default_sorting_field on each document, which means that before indexing your documents you need to compute a relevancy score for Typesense to be able to sort them based on your needs (https://typesense.org/docs/0.11.1/guide/#ranking-relevance).
- MeiliSearch, on the other hand, uses a bucket sort, which means there is a default relevancy algorithm based on the proximity of words in the documents, the fields in which the words are found, and the number of typos (https://docs.meilisearch.com/guides/advanced_guides/ranking....). You can still add your own custom rules if you want to alter the default search behavior.
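To make the Typesense side of that comparison concrete, here is roughly what a collection schema with a precomputed ranking field looks like. The field names are illustrative; check the linked Typesense docs for the exact schema format of the version you run.

```javascript
// A Typesense-style collection schema: default_sorting_field must be a
// numeric field present on every document, and its value (a relevancy
// or popularity score) has to be computed before indexing.
const booksSchema = {
  name: 'books',
  fields: [
    { name: 'title', type: 'string' },
    { name: 'publication_year', type: 'int32' },
    { name: 'popularity', type: 'int32' }  // precomputed ranking score
  ],
  default_sorting_field: 'popularity'
};
```

This is the practical difference the parent comment describes: Typesense asks you to supply the ranking signal up front, whereas MeiliSearch applies its built-in rules (word proximity, field, typo count) by default.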
I have played around with Lucene a lot, and Typesense seems to be a very close match to its feature set, apart from the REST interface on top.
Was the decision not to use the mature Lucene platform a technical one? The memory and hardware requirements of Lucene are quite small, even if Elastic and Solr leave a very different impression.
Glad to see a solution positioning itself as a bit leaner than Solr/Elastic, though; they really are a bit heavy for many occasions.
Yes, for typo correction plus instant search, Lucene is definitely not fast enough on large datasets. There are also some limitations with fuzzy searching when you also want to sort/rank documents at the same time. Lucene is also a very generic, mature library aimed at a wider set of use cases.
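For readers unfamiliar with the typo-tolerance mechanism this thread keeps mentioning: it boils down to Levenshtein edit distance, the number of single-character insertions, deletions, and substitutions between two strings. A naive dynamic-programming sketch follows; real engines compile the query term into a Levenshtein automaton and walk the index with it instead of comparing terms pairwise, which is where the speed differences come from.

```javascript
// Levenshtein edit distance via dynamic programming: dp[i][j] holds the
// distance between the first i chars of a and the first j chars of b.
function levenshtein(a, b) {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) =>
      i === 0 ? j : j === 0 ? i : 0)
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,        // deletion
        dp[i][j - 1] + 1,        // insertion
        dp[i - 1][j - 1] + cost  // substitution
      );
    }
  }
  return dp[a.length][b.length];
}
```

For example, "hary" is within distance 1 of "harry", so a one-typo-tolerant search would still match it.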
I'm new to search libraries (frameworks?) but have been looking for something to use for a huge data dump I'm working with.
Storing everything in memory seems fast, but it also seems like it'd be quite the resource hog on a server -- is that a normal approach to take?
It's reassuring that the examples and documentation all revolve around books (my data set is actually ~55 million books), but since theirs seems to be quite a small subset of that, I worry about how well this scales, and I don't know enough about search libraries to evaluate that.
Is there a good place to start learning about what kinds of situations Typesense works best in (besides needing Levenshtein-based search), versus what kinds of situations it wouldn't work well in (and what other libraries would work better there)?
Typesense's primary focus is speed and developer convenience. It makes the assumption (which is true perhaps 99% of the time) that memory is cheap enough for indexing most datasets, especially given the development time saved and the benefits of a solid search user experience.
Other libraries like Elastic offer more customization but also have a steeper learning curve.
Talking about fastest time to market, this is the biggest factor, more than setting up Elastic, which, annoying as it is, is still faster than creating the UI.
There is a bug in the demo search box on your home page: if no search results are found (whether due to an empty string or no results for the search term), it displays "undefined result. Page 1 of NaN".
Looks great. One of Algolia's strongest features is InstantSearch for vanilla JS, React, Vue, Angular, iOS, and Android. Hopefully there can be this level of support for Typesense.
LrnByTeach | 6 years ago:
> when 1 million Hacker News titles are indexed along with their points, Typesense consumes 165 MB of memory. The same size of that data on disk in JSON format is 88 MB.

I like the compact filter_by, sort_by with qualifiers:

    let searchParameters = {
      'q'         : 'harry',
      'query_by'  : 'title',
      'filter_by' : 'publication_year:<1998',
      'sort_by'   : 'publication_year:desc'
    }
ng7j5d9 | 6 years ago:
Does it do normalization as part of the typo search (in case of missing/incorrect accent marks, etc.)?
Does it do stemming at all, for English or other languages? (I.e., I search for "run" and you show me documents containing "running", or the other way around.)
Any support for Chinese text (which typically doesn't have whitespace between words)?
karterk | 6 years ago:
While it does not support stemming, fuzzy prefix matching largely covers that in practice and is often more useful.
No typo or fuzzy correction for Chinese text yet.
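A toy illustration of why prefix matching approximates stemming for queries like the one above: the query term is matched as a prefix of indexed terms, so "run" finds "running" without any stemmer. Note it is one-directional (a query for "running" will not find "run"), which is the trade-off versus true stemming.

```javascript
// Prefix matching: a query term hits any indexed term that starts
// with it. Real engines do this against a sorted term dictionary or
// trie rather than a linear scan.
function prefixMatch(query, terms) {
  return terms.filter(t => t.startsWith(query));
}
```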
veeralpatel979 | 6 years ago:
I think mentioning any of them would be okay.