Manticore 6.0.0 – a faster alternative to Elasticsearch in C++

[+] boyter|3 years ago|reply

I highly recommend this software if its capabilities fall into what you need. It is very fast both in terms of indexing speed and search. It’s relatively simple to set up and start working against, and I have found it very reliable.

It works best when you have a SQL data store you want to index against, but with the real time index you can treat it more like elastic and other searches. However for that first use case of SQL, I don’t know of anything else that comes close to being as easy to use.

Simply point it at your database, give it a query to pull what you want to index and you are done. I suspect that this covers about 90% of use cases out there.
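
As a rough illustration of that pull model (a toy sketch, not Manticore's implementation): the engine runs one SQL query against the database and indexes whatever rows come back. The `docs` table, its columns, and the `build_index` and `search` helpers below are all hypothetical, and SQLite stands in for the real database.

```python
import sqlite3
from collections import defaultdict


def build_index(conn, query):
    """Run `query` (must return id, text rows) and build a toy inverted index."""
    index = defaultdict(set)
    for doc_id, text in conn.execute(query):
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, term):
    """Return the sorted ids of documents containing `term`."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical source database with a single `docs` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO docs (id, body) VALUES (?, ?)",
    [(1, "fast full text search"), (2, "SQL data store"), (3, "Fast indexing")],
)

# "Give it a query to pull what you want to index."
index = build_index(conn, "SELECT id, body FROM docs")
print(search(index, "fast"))  # [1, 3]
```

A real engine adds tokenization rules, ranking and on-disk storage on top of this, but the workflow is the same: one query defines what gets indexed.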

If you need more than what the DB's native indexing is giving you, give it a try.

[+] james_in_the_uk|3 years ago|reply

I used Manticore to implement full text search on a legal journal’s website. It has worked brilliantly for several years now. Recommended.

[+] mcronce|3 years ago|reply

This sounds... Legitimately like a perfect match for a use case I have. Thanks for the write up.

[+] zmmmmm|3 years ago|reply

> works best when you have a SQL data store you want to index against, but with the real time index you can treat it more like elastic and other searches

This sounds like a bit of a killer use case but even explicitly searching the documentation I can't find more than tease level information about it. They seem to be hyper focused on just presenting it as a replacement for ElasticSearch.

[+] entropyie|3 years ago|reply

See also this lightweight alternative to ES: https://github.com/zinclabs/zinc

[+] remram|3 years ago|reply

That looks great. I've been needing a multi-platform low-resource full text search engine/library.

[+] rzzzt|3 years ago|reply

"We are excited to announce the addition of telemetry in this release. [...] This feature can be easily turned off in the settings if desired."

How did they know in the previous 5 major versions which part of the product to improve?

[+] snikolaev|3 years ago|reply

We didn't, hence we may have inadvertently improved the wrong areas. Although we have been receiving feedback from the community, it was never backed by concrete data and thus, we had to rely on our best guess.

[+] 1vuio0pswjnm7|3 years ago|reply

They are "excited"? No end-user wants telemetry. They want software that works reliably. And when it doesn't work, they want customer service. Rarely do they get either.

If end-users wanted "telemetry", then they would have asked for it in previous versions.

Even once telemetry is added, there is still no reason to enable it by default. End-users that want/need to send data to the company, e.g., to back up their claims about deficiencies in the software, can easily enable it.

Show us the contractual restrictions that limit how the data sent by the end-user can be used. Show us the guarantee or enforceable promise that this data collection will result in improvements that end-users want (versus only benefitting the company in some undisclosed way(s)).

Why would anyone want to send behavioural data to a company with no enforceable promise of a benefit and no way to monitor how the data is used?

[+] dang|3 years ago|reply

[+] mooreds|3 years ago|reply

I looked for an up-to-date Elasticsearch compatibility chart but was unable to find one.

I found an article from 2022 that did a compare/contrast but I wanted a feature by feature breakdown.

[+] caseyf|3 years ago|reply

I've been running Manticore (previously SphinxSearch) on a faceted search heavy site with a million MAUs for 15 years. I'd definitely use it again for another project.

If the data that you want to search is entirely contained in a SQL database, it's an uncomplicated and powerful solution, definitely check it out. If not, Manticore may still be a nice solution for you, but I can't speak to that.

[+] pQd|3 years ago|reply

coincidentally - we've been using first sphinx and then manticore for over 15 years as well. in our case it's fed each night with XML generated by Java code from data stored in MySQL databases. we index over 294M pseudo documents.

it's been rock solid for all those years.

[+] synergy20|3 years ago|reply

how does it compare to another c++ elasticsearch alternative that was on HN a few days ago: https://github.com/typesense/typesense

[+] snikolaev|3 years ago|reply

From Typesense's site: "Typesense is an in-memory datastore", "If your dataset size is 1GB, you'd need between 2GB - 3GB RAM to hold the whole index in memory."

Manticore is different in terms of this, especially the Manticore columnar storage which doesn't require a significant portion of the data set to be stored in memory. This allows, for example, for a 1TB data set to be served on a standard server with some 32GB of RAM.
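
A back-of-the-envelope version of that comparison, using only the figures quoted above. The 2x to 3x ratio is Typesense's own statement; extrapolating it linearly to a 1 TB data set is an assumption made here for illustration, and `in_memory_ram_gb` is a hypothetical helper.

```python
def in_memory_ram_gb(dataset_gb, low_factor=2, high_factor=3):
    """RAM range for an engine that needs 2x-3x the dataset size in memory."""
    return dataset_gb * low_factor, dataset_gb * high_factor


dataset_gb = 1024  # the 1 TB data set from the comment
low, high = in_memory_ram_gb(dataset_gb)
print(f"in-memory engine: {low}-{high} GB RAM")
print("columnar storage example from the comment: ~32 GB RAM")
```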

[+] idoubtit|3 years ago|reply

In my opinion, the main difference is the history: this search engine has been used for 2 decades in various production sites.

Manticore is an open-source fork of Sphinx Search, which released its first version in 2001. The fork started after the latter went from open source to proprietary, at the end of 2017. The engine is stable and battle-tested. IIRC, Craigslist uses Sphinx.

[+] AlexAltea|3 years ago|reply

Related: Meilisearch v1.0.0 was released two days ago: https://news.ycombinator.com/item?id=34707727

I have been following these two libraries (Manticore and Meilisearch) very closely. Their simplicity, portability and performance gains over Elasticsearch are impressive.

For the past two days, I have been creating Python bindings for the core search engine of each of these two libraries, starting with https://github.com/AlexAltea/milli-py. The goal is extreme performance, but as an embedded/self-contained package (basically the same goals as SQLite).

[+] arein3|3 years ago|reply

Regarding performance, I hope it's not the same as Grafana's Loki.

Grafana Loki advertises lower resource requirements, but it's just a disk storage system. Any query will read everything from disk.

Elasticsearch has big RAM requirements if you create a lot of indexes, of course. You can't have something quicker than indexes, and you can't have lower resource requirements without having fewer indexes.

[+] mardix|3 years ago|reply

Loving it. I'm interested in milli-py.

A cool feature would be automatic backup to S3, or loading from S3.

[+] canadiantim|3 years ago|reply

That looks awesome, kudos! I've been looking for a way to do local-first high-quality FTS.

[+] m3affan|3 years ago|reply

I wonder how long-lasting the support for such libraries will be.

[+] ollybee|3 years ago|reply

Give Xapian a go also.

[+] antman|3 years ago|reply

github 404 fyi

[+] quijoteuniv|3 years ago|reply

What are people looking for in alternatives to Elasticsearch? I have been toying in Docker with a version of Elasticsearch, FSCrawler and Workplace Search to give a company better access to their knowledge base. They have Exchange, manuals, emails, images & video, GitHub and other stuff… Do these alternatives have connectors too? Any experience with this?

[+] snikolaev|3 years ago|reply

There are several reasons why some people prefer alternatives to Elasticsearch, including:

* License preference: Some people prefer true open-source licenses as opposed to the license that Elasticsearch has switched to.

* Performance and resource consumption: For some, performance and resource consumption are significant factors in their choice of a search engine.

* SQL vs JSON DSL: Some people prefer using SQL over Elasticsearch's JSON domain-specific language.

* Maintenance: Some believe that maintaining Elasticsearch can become challenging when the data collection becomes large enough.

That's what I've heard from those who preferred Manticore over Elasticsearch.
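
To make the SQL vs JSON DSL point concrete, here is the same full-text query written both ways. This is only a sketch: the `articles` index and `title` field are hypothetical, the SQL uses a `MATCH()` full-text clause of the kind Manticore supports, and the JSON body uses Elasticsearch's standard `match` query.

```python
import json

# Hypothetical index and field names, for illustration only.
INDEX = "articles"
FIELD = "title"
TERM = "manticore"

# SQL style: a single statement using a MATCH() full-text clause.
sql_query = f"SELECT * FROM {INDEX} WHERE MATCH('@{FIELD} {TERM}') LIMIT 10"

# JSON DSL style: a nested request body sent to the index's _search endpoint.
json_query = {"query": {"match": {FIELD: TERM}}, "size": 10}

print(sql_query)
print(json.dumps(json_query, indent=2))
```

Neither form is inherently better; the SQL version tends to be shorter for simple queries, while the JSON body is easier to build programmatically.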

[+] davewritescode|3 years ago|reply

Anytime I see an alternative to Elasticsearch on HN my first thought is how much of a shame it is to use something other than Lucene for text search because of just how powerful it really is.

Elasticsearch is a pain to tune and partition, and the JVM brings a whole set of operational issues but what's the point of better read/write performance when the actual search performance is worse?

I guess this makes sense for use cases where you care more about speed than the quality of results.

[+] snikolaev|3 years ago|reply

If the search performance is worse, there may be no point. Regarding Manticore, we conducted relevance tests and found it to be on par with Elasticsearch. In fact, the objective tests [1] showed that Manticore can even provide better relevance results than Elasticsearch when using almost default settings. You can view the relevant pull request in the BEIR information retrieval benchmark [2].

[1] https://docs.google.com/spreadsheets/d/1_ZyYkPJ_K0st9FJBrjbZ...

[2] https://github.com/beir-cellar/beir/pull/92

[+] idoubtit|3 years ago|reply

Do you have any sources for your claim that Lucene is the best search engine because it is much more "powerful", to the point that it's a "shame" to use anything else, and that every other engine's "actual search performance is worse" than Lucene's?

This is a very strong claim, and without strong arguments, it's a ridiculous claim.

[+] trallnag|3 years ago|reply

Time to rewrite it in Rust

[+] xeraa|3 years ago|reply

> You can now execute Elasticsearch-compatible insert and replace JSON queries, which enables the use of Manticore with tools such as Logstash and Filebeat

Looking at the docs I could only see _create and _doc but not _bulk endpoint support. How will that work with Logstash and Filebeat?

[+] snikolaev|3 years ago|reply

Please take a look at the 'Elasticsearch' tab on this page https://manual.manticoresearch.com/Data_creation_and_modific...
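
For context on the question: Logstash and Filebeat ship documents through Elasticsearch's `_bulk` API, whose request body is newline-delimited JSON with an action line followed by a document line. Below is a minimal sketch of building such a body. The `logs` index name, the sample documents and the `bulk_body` helper are made up for illustration; whether a given Manticore version accepts this request is covered by the manual page linked above, not by this snippet.

```python
import json


def bulk_body(index, docs):
    """Build an Elasticsearch-style _bulk body: one action line per document line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


body = bulk_body("logs", [{"message": "started"}, {"message": "stopped"}])
print(body, end="")
```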

[+] juxtaposicion|3 years ago|reply

How does this compare to Quickwit or other Tantivy-powered engines?

[+] remram|3 years ago|reply

I think quickwit is more tailored towards querying large indexes sitting on S3 than fast queries from a local or in-memory index.

lnx might be similar, I'm not sure. It's very new and I had a bad experience trying it out.

[+] riku_iki|3 years ago|reply

Does it have distributed partitioning like ES?

[+] snikolaev|3 years ago|reply

Yes it does https://manual.manticoresearch.com/Searching/Distributed_sea...

[+] Multrex|3 years ago|reply

Any chance to use it with Graylog instead of Elasticsearch or Opensearch?

[+] snikolaev|3 years ago|reply

Good idea, but we'd need to check. We have little experience with Graylog, so I'm not even sure if you can easily replace Elasticsearch/OpenSearch with something else in it.

[+] comrad|3 years ago|reply

Just because it is in C++ it is supposed to be faster? I highly doubt that.

[+] Minor49er|3 years ago|reply

I don't see anywhere that they claim it's faster simply because it's written in C++. They do mention that they use C++ to add low-level optimizations that make queries faster and the memory footprint smaller, but any claims about performance in the readme are linked to benchmarks that back them up:

https://github.com/manticoresoftware/manticoresearch/

https://db-benchmarks.com/test-taxi/#manticore-search-vs-ela...

[+] snikolaev|3 years ago|reply

It is unlikely that it is because of C++, however, we have conducted extensive benchmarking (which, by the way, is fully open-source and can be easily reproduced if desired). You can find more information about this at https://manticoresearch.com/blog/manticore-alternative-to-el....

[+] Shorel|3 years ago|reply

No, just because it's in C++ does not mean it is automatically faster.

However, with good enough algorithms and judicious coding and memory management, the possibility exists.

[+] alphanullmeric|3 years ago|reply

Someone tell the rust people

[+] janmo|3 years ago|reply

It is, and it is also the way C/C++ makes you write code.

Languages such as Java or PHP make you lazy and you end up using the string variable type a lot. It is extremely inefficient.
