Manticore 6.0.0 – a faster alternative to Elasticsearch in C++

193 points| snikolaev | 3 years ago |manticoresearch.com | reply

69 comments

[+] boyter|3 years ago|reply
I highly recommend this software if its capabilities match what you need. It is very fast in terms of both indexing and search speed. It’s relatively simple to set up and start working against, and I have found it very reliable.

It works best when you have a SQL data store you want to index against, but with the real time index you can treat it more like elastic and other searches. However for that first use case of SQL, I don’t know of anything else that comes close to being as easy to use.

Simply point it at your database, give it a query to pull what you want to index and you are done. I suspect that this covers about 90% of use cases out there.
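
As a rough illustration of that workflow (a toy sketch, not Manticore's actual implementation or configuration syntax): a SQL query selects the rows to index, and a tiny inverted index is built from the result. The `articles` table, the helper names and the sample rows are all made up for the example.

```python
import sqlite3
from collections import defaultdict


def build_index(conn, query):
    """Run `query` (must return id, text rows) and build a toy inverted index."""
    index = defaultdict(set)
    for doc_id, text in conn.execute(query):
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, terms):
    """Return sorted ids of documents containing every word in `terms`."""
    words = terms.lower().split()
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


# Hypothetical data store standing in for "your database".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO articles (id, body) VALUES (?, ?)",
    [(1, "fast full text search"), (2, "text search in sql"), (3, "slow sql scan")],
)

# The query decides what gets indexed, as described above.
index = build_index(conn, "SELECT id, body FROM articles")
print(search(index, "text search"))
print(search(index, "sql"))
```

A real engine adds tokenization, ranking and on-disk storage on top of this, but the "query in, searchable index out" shape is the same.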

If you need more than what the DB's native indexing is giving you, give it a try.

[+] james_in_the_uk|3 years ago|reply
I used Manticore to implement full text search on a legal journal’s website. It has worked brilliantly for several years now. Recommended.
[+] mcronce|3 years ago|reply
This sounds... Legitimately like a perfect match for a use case I have. Thanks for the write up.
[+] zmmmmm|3 years ago|reply
> works best when you have a SQL data store you want to index against, but with the real time index you can treat it more like elastic and other searches

This sounds like a bit of a killer use case but even explicitly searching the documentation I can't find more than tease level information about it. They seem to be hyper focused on just presenting it as a replacement for ElasticSearch.

[+] rzzzt|3 years ago|reply
"We are excited to announce the addition of telemetry in this release. [...] This feature can be easily turned off in the settings if desired."

How did they know in the previous 5 major versions which part of the product to improve?

[+] snikolaev|3 years ago|reply
We didn't, hence we may have inadvertently improved the wrong areas. Although we have been receiving feedback from the community, it was never backed by concrete data and thus, we had to rely on our best guess.
[+] 1vuio0pswjnm7|3 years ago|reply
They are "excited"? No end-user wants telemetry. They want software that works reliably. And when it doesn't work, they want customer service. Rarely do they get either.

If end-users wanted "telemetry", then they would have asked for it in previous versions.

Even once telemetry is added, there is still no reason to enable it by default. End-users that want/need to send data to the company, e.g., to back up their claims about deficiencies in the software, can easily enable it.

Show us the contractual restrictions that limit how the data sent by the end-user can be used. Show us the guarantee or enforceable promise that this data collection will result in improvements that end-users want (versus only benefitting the company in some undisclosed way(s)).

Why would anyone want to send behavioural data to a company with no enforceable promise of a benefit and no way to monitor how the data is used?

[+] mooreds|3 years ago|reply
I looked for an up to date elasticsearch compatibility chart but was unable to find one.

I found an article from 2022 that did a compare/contrast but I wanted a feature by feature breakdown.

[+] caseyf|3 years ago|reply
I've been running Manticore (previously SphinxSearch) on a faceted search heavy site with a million MAUs for 15 years. I'd definitely use it again for another project.

If the data that you want to search is entirely contained in a SQL database, it's an uncomplicated and powerful solution, definitely check it out. If not, Manticore may still be a nice solution for you, but I can't speak to that.

[+] pQd|3 years ago|reply
coincidentally - we've been using first sphinx and then manticore for over 15 years as well. in our case it's fed each night with XML generated by Java code from data stored in MySQL databases. we index over 294M pseudo documents.

it's been rock solid for all those years.

[+] synergy20|3 years ago|reply
how does it compare to another c++ elasticsearch alternative that was on HN a few days ago: https://github.com/typesense/typesense
[+] snikolaev|3 years ago|reply
From Typesense's site: "Typesense is an in-memory datastore", "If your dataset size is 1GB, you'd need between 2GB - 3GB RAM to hold the whole index in memory."

Manticore is different in terms of this, especially the Manticore columnar storage which doesn't require a significant portion of the data set to be stored in memory. This allows, for example, for a 1TB data set to be served on a standard server with some 32GB of RAM.
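
A back-of-the-envelope version of that comparison, assuming the 2x to 3x RAM-to-dataset ratio quoted above scales linearly to larger datasets (that is an assumption for illustration, not a measured result; the function name is made up):

```python
def in_memory_ram_gb(dataset_gb, low_factor=2.0, high_factor=3.0):
    """RAM range (GB) if the whole index must fit in memory at 2x-3x dataset size."""
    return dataset_gb * low_factor, dataset_gb * high_factor


dataset_gb = 1000  # roughly 1TB, the example size mentioned above
low, high = in_memory_ram_gb(dataset_gb)
print(f"in-memory estimate: {low:.0f}-{high:.0f} GB RAM")
print(f"compared with a 32 GB server: {low / 32:.1f}x-{high / 32:.1f}x more RAM")
```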

[+] idoubtit|3 years ago|reply
In my opinion, the main difference is the history: this search engine has been used for 2 decades in various production sites.

Manticore is an open-source fork of Sphinx Search, which released its first version in 2001. The fork started after the latter went from open source to proprietary, at the end of 2017. The engine is stable and battle-tested. IIRC, Craigslist uses Sphinx.

[+] AlexAltea|3 years ago|reply
Related: Meilisearch v1.0.0 release two days ago: https://news.ycombinator.com/item?id=34707727

I have been following these two libraries (Manticore and Meilisearch) very closely. Their simplicity, portability and performance gains over Elasticsearch are impressive.

Since two days ago, I am creating Python bindings for the core search engine of each of these two libraries, starting with https://github.com/AlexAltea/milli-py. Getting extreme performance, but as an embedded/self-contained package (basically same goals as SQLite).

[+] arein3|3 years ago|reply
Regarding performance, I hope it's not the same as Grafana's Loki.

Grafana Loki advertises lower resource requirements, but it's just a disk storage system. Any query will read everything from disk.

Elasticsearch has big RAM requirements if you create a lot of indexes, of course. You can't have something quicker than indexes, and you can't have lower resource requirements without having fewer indexes.

[+] mardix|3 years ago|reply
Loving it. I'm interested in milli-py.

A cool feature would be automatic backup to S3, or loading from S3.

[+] canadiantim|3 years ago|reply
That looks awesome, kudos! I've been looking for a way to do local-first high-quality FTS.
[+] m3affan|3 years ago|reply
I wonder how long-lasting the support will be for such libraries.
[+] ollybee|3 years ago|reply
Give Xapian a go also.
[+] antman|3 years ago|reply
github 404 fyi
[+] quijoteuniv|3 years ago|reply
What are people looking for in alternatives to Elasticsearch? I have been toying in Docker with a version of Elasticsearch, FSCrawler and Workplace Search to give a company better access to their knowledge base. They have Exchange, manuals, emails, images and video, GitHub and other stuff… Do these alternatives have connectors too? Any experience with this?
[+] snikolaev|3 years ago|reply
There are several reasons why some people prefer alternatives to Elasticsearch, including:

* License preference: Some people prefer true open-source licenses as opposed to the license that Elasticsearch has switched to.

* Performance and resource consumption: For some, performance and resource consumption are significant factors in their choice of a search engine.

* SQL vs JSON DSL: Some people prefer using SQL over Elasticsearch's JSON domain-specific language.

* Maintenance: Some believe that maintaining Elasticsearch can become challenging when the data collection becomes large enough.

That's what I've heard from those who preferred Manticore over Elasticsearch.
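
To make the SQL vs JSON DSL point concrete, here is a small sketch of roughly the same full-text query in both styles. The `articles` table and `title` field are hypothetical, and the two engines' default matching semantics differ, so treat these as comparable rather than strictly equivalent.

```python
import json

# Elasticsearch-style JSON DSL: a full-text match on a hypothetical "title" field.
json_dsl_query = {"query": {"match": {"title": "search engine"}}}

# Manticore-style SQL: similar intent via MATCH(); "articles" is a made-up table.
sql_query = "SELECT * FROM articles WHERE MATCH('@title search engine')"

print(json.dumps(json_dsl_query, indent=2))
print(sql_query)
```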

[+] davewritescode|3 years ago|reply
Anytime I see an alternative to Elasticsearch on HN, my first thought is how much of a shame it is to use something other than Lucene for text search because of just how powerful it really is.

Elasticsearch is a pain to tune and partition, and the JVM brings a whole set of operational issues but what's the point of better read/write performance when the actual search performance is worse?

I guess this makes sense for use cases where you care more about speed than the quality of results.

[+] snikolaev|3 years ago|reply
If the search performance is worse, then it may make no sense. Regarding Manticore, we conducted relevance tests and found it to be on par with Elasticsearch. In fact, the objective tests [1] showed that Manticore can even provide better relevance results than Elasticsearch, when using almost default settings. You can view the relevant pull request in the BEIR information retrieval benchmark [2].

[1] https://docs.google.com/spreadsheets/d/1_ZyYkPJ_K0st9FJBrjbZ...

[2] https://github.com/beir-cellar/beir/pull/92

[+] idoubtit|3 years ago|reply
Do you have any sources for the claim that Lucene is the best search engine, so much more "powerful" that it's a "shame" to use anything else, and that every other engine's "actual search performance is worse" than Lucene's?

This is a very strong claim, and without strong arguments, it's a ridiculous claim.

[+] xeraa|3 years ago|reply
> You can now execute Elasticsearch-compatible insert and replace JSON queries, which enables the use of Manticore with tools such as Logstash and Filebeat

Looking at the docs I could only see _create and _doc but not _bulk endpoint support. How will that work with Logstash and Filebeat?
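
For context, this is roughly what a client sends to Elasticsearch's `_bulk` endpoint: newline-delimited JSON, with an action line followed by a document line. The sketch below only builds such a body; the index name and documents are made up, and whether Manticore accepts this format is exactly the open question here.

```python
import json


def build_bulk_body(index_name, docs):
    """Build an Elasticsearch-style _bulk request body (NDJSON) for (id, doc) pairs."""
    lines = []
    for doc_id, doc in docs:
        # Action line: what to do and where; the document itself follows on the next line.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


body = build_bulk_body("logs", [("1", {"message": "started"}), ("2", {"message": "stopped"})])
print(body, end="")
```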

[+] juxtaposicion|3 years ago|reply
How does this compare to Quickwit or other Tantivy-powered engines?
[+] remram|3 years ago|reply
I think quickwit is more tailored towards querying large indexes sitting on S3 than fast queries from a local or in-memory index.

lnx might be similar, I'm not sure. It's very new and I had a bad experience trying it out.

[+] Multrex|3 years ago|reply
Any chance to use it with Graylog instead of Elasticsearch or Opensearch?
[+] snikolaev|3 years ago|reply
Good idea, but we'd need to check. We have little experience with Graylog, so I'm not even sure if you can easily replace Elasticsearch/OpenSearch with something else in it.
[+] comrad|3 years ago|reply
Just because it is in C++ it is supposed to be faster? I highly doubt that.
[+] Shorel|3 years ago|reply
No, just because it's in C++ does not mean it is automatically faster.

However, with good enough algorithms and judicious coding and memory management, the possibility exists.

[+] janmo|3 years ago|reply
It is, and it's also the way C/C++ makes you write code.

Languages such as Java or PHP make you lazy and you end up using the string variable type a lot. It is extremely inefficient.