
jkb79 | 2 years ago

It's an opinionated blog post published on arXiv, masquerading as research.

IMHO, it's a gigantic self-own and doesn't promote Lucene well. For example, it shows them getting only 10 QPS out of a system with 1 TB of memory and 96 vCPUs (after 4 warmup runs).

The HNSW implementation in Lucene is fair, and within the same order of magnitude as others. But to get comparable performance, you must merge all immutable segments into a single segment, which all Lucene-oriented benchmarks do, but which is not that realistic for many production workloads where docs are updated/added in near real-time.
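Why segment count matters can be sketched in a few lines. This is a minimal Python illustration, assuming each segment carries its own independent vector index; a brute-force scan stands in for a real HNSW graph, and the function names are hypothetical, not Lucene's API. A query has to fan out to every segment, and the per-segment top-k lists are then merged into a global top-k:

```python
import heapq
from typing import List, Tuple

Vector = List[float]
Segment = List[Tuple[int, Vector]]  # (doc_id, vector) pairs in one immutable segment


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def search_segment(segment: Segment, query: Vector, k: int) -> List[Tuple[float, int]]:
    # Stand-in for one per-segment HNSW lookup: an exact brute-force scan.
    return heapq.nlargest(k, ((dot(vec, query), doc_id) for doc_id, vec in segment))


def search_all_segments(segments: List[Segment], query: Vector, k: int):
    # Each segment has its own index, so the query runs once per segment...
    per_segment = [search_segment(seg, query, k) for seg in segments]
    # ...and the per-segment top-k lists are merged into one global top-k.
    merged = heapq.nlargest(k, (hit for hits in per_segment for hit in hits))
    return merged, len(per_segment)  # hits, number of per-segment lookups
```

With an exact stand-in, one segment and four segments return the same hits; only the number of lookups changes. With real HNSW graphs each lookup pays its own traversal cost (and each is approximate), so many small segments typically cost more per query than one force-merged segment, which is the trade-off the benchmarks sidestep.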


chii | 2 years ago

> but which is not that realistic for many production workloads where docs are updated/added in near real-time.

It really depends on how real-time you need the search to be, though.

What I've seen is a green/blue Lucene index setup. Updates go to one index (let's say blue), while searches run against the other (green). Segment merging happens periodically on blue (or, smarter, after some known combination of elapsed time and number of updates), and then the indexes are switched. Depending on how often new documents come in and how "real-time" you need search to be, this may be sufficient.
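A rough sketch of that green/blue pattern in Python, under loud assumptions: `Index` is a hypothetical in-memory stand-in for a Lucene index (it only tracks stored docs and a segment count), and the swap trigger (N pending updates or M seconds, whichever comes first) is one made-up policy for the "time and updates combined" idea, not anything Lucene ships.

```python
import threading
import time
from typing import Dict, Optional


class Index:
    """Hypothetical stand-in for a Lucene index: stored docs plus a segment count."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}
        self.segments = 0

    def add(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text
        self.segments += 1  # assumption: every update flushes a new small segment

    def force_merge(self) -> None:
        self.segments = 1 if self.docs else 0  # collapse everything into one segment

    def copy(self) -> "Index":
        clone = Index()
        clone.docs = dict(self.docs)
        clone.segments = self.segments
        return clone


class BlueGreenIndex:
    """Writes go to the staging (blue) index; searches read the serving (green) one."""

    def __init__(self, max_pending: int, max_age_s: float) -> None:
        self._lock = threading.Lock()
        self.serving = Index()
        self.staging = Index()
        self.pending = 0
        self.max_pending = max_pending
        self.max_age_s = max_age_s
        self.last_swap = time.monotonic()

    def update(self, doc_id: str, text: str) -> None:
        with self._lock:
            self.staging.add(doc_id, text)
            self.pending += 1

    def search(self, doc_id: str) -> Optional[str]:
        # Readers only ever see the serving index, so they never hit unmerged segments.
        return self.serving.docs.get(doc_id)

    def needs_swap(self, now: float) -> bool:
        # Hypothetical policy: swap after enough updates OR enough elapsed time.
        return self.pending >= self.max_pending or (
            self.pending > 0 and now - self.last_swap >= self.max_age_s
        )

    def merge_and_swap(self, now: float) -> None:
        with self._lock:
            self.staging.force_merge()          # merge blue down to one segment
            self.serving = self.staging         # point searches at the merged index
            self.staging = self.serving.copy()  # start a fresh write side from it
            self.pending = 0
            self.last_swap = now
```

The staleness window is bounded by the swap policy: a document written to blue becomes searchable only after the next merge-and-swap, which is exactly the "how real-time do you need it" trade-off.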