top | item 42497271

(no title)

generall | 1 year ago

IVF, unfortunately, is barely compatible with filtered search. It have to rely on post-filtering and retrieve more and more candidates if the result set is not big enough. If the query is in some form correlated with the filter, this approach quickly degrades to brute-force.

Surprised that the article doesn't mention filtering use-case at all.

discuss

VoVAllen|1 year ago

Hi, I'm the author of the article. I actually think the opposite of what you mentioned. IVF is more suitable for prefiltering compared to HNSW. For prefiltering in HNSW, there is a certain requirement for the filtering rate—it can't be too low, or the entire graph may become disconnected. For instance, with the commonly used parameter m=16, each node can have at most 16 neighbors. If the filtering rate is below 5%, it can directly result in no neighbors meeting the condition, causing the algorithm to fail entirely. This is why the academic community has proposed alternatives like ACRON[1]. On the other hand, IVF doesn't have this problem at all—you can check whether a candidate meets the condition before calculating distances.

[1] https://arxiv.org/abs/2403.04871

generall|1 year ago

In IVF you can start checking conditions only in the final bucket. There are no guarantees if the bucket has any acceptable value, and there are no procedures to find the bucket which has acceptable values before scanning it.