"The obstacle is that until now, off-the-shelf vector databases could not index a dataset larger than memory, because both the full-resolution vectors and the index (edge list) needed to be kept in memory during index construction. Larger datasets could be split into segments, but this means that at query time they need to search each segment separately, then combine the results, turning an O(log N) search per segment into O(N) overall."How is a log N search over S segments O(N)?
jbellis|1 year ago
HammadB|1 year ago
To be more correct it’s O(N/C log C) where C is the capacity of a segment. In this case you can ignore 1/C and log C as constant. So now sure, you actually just have O(N). But this is not super useful as it says that a segmented hnsw approach and brute force approach are the same - when this is really not the case in practice.
Also O(N log N) > O(N) so I’m not sure why we would ever do anything with segmentation according to that analysis if it were correct.
unknown|1 year ago
[deleted]
thfuran|1 year ago